Computer Vision

CSE471

Prof. Makarand Tapaswi + Prof. Charu Sharma • Spring 2025-26 • 4 credits

Fill in the Blanks

Formulas, keywords, theorem statements.

Marr's one-line definition of vision is: 'to know ____ , by looking.'

The Three Rs of CV are Reorganisation, ____, and Reconstruction.

Smoothing kernels sum to 1; derivative (edge) kernels sum to ____.

The Gaussian filter is separable, so its 2D application with a width-$K$ kernel costs $O(2K)$ per pixel instead of $O(____)$.
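Separability can be checked numerically. The sketch below is a minimal pure-Python illustration, not course code: the helper names (`gaussian_1d`, `conv_rows`, `separable_blur`, `full_2d_blur`) and the zero-padding choice are assumptions made for the example. It runs a row pass followed by a column pass and compares the result with direct 2D filtering using the outer-product kernel.

```python
import math

def gaussian_1d(radius, sigma):
    """Normalised 1D Gaussian taps for offsets -radius..radius."""
    taps = [math.exp(-(i * i) / (2.0 * sigma * sigma))
            for i in range(-radius, radius + 1)]
    total = sum(taps)
    return [t / total for t in taps]

def conv_rows(img, taps):
    """Filter every row with zero padding; output has the input's size."""
    r = len(taps) // 2
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for k, t in enumerate(taps):
                xx = x + k - r
                if 0 <= xx < w:
                    acc += t * img[y][xx]
            out[y][x] = acc
    return out

def transpose(img):
    return [list(col) for col in zip(*img)]

def separable_blur(img, taps):
    """Row pass, then column pass (done as a row pass on the transpose)."""
    rows_done = conv_rows(img, taps)
    return transpose(conv_rows(transpose(rows_done), taps))

def full_2d_blur(img, taps):
    """Reference: direct 2D filtering with the outer-product kernel."""
    r = len(taps) // 2
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for j, tj in enumerate(taps):
                for i, ti in enumerate(taps):
                    yy, xx = y + j - r, x + i - r
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += tj * ti * img[yy][xx]
            out[y][x] = acc
    return out

# Arbitrary 4x5 test image, used only for the comparison.
image = [[float((3 * y + 7 * x) % 11) for x in range(5)] for y in range(4)]
taps = gaussian_1d(radius=1, sigma=1.0)
max_diff = max(abs(a - b)
               for row_a, row_b in zip(separable_blur(image, taps),
                                       full_2d_blur(image, taps))
               for a, b in zip(row_a, row_b))
print("largest difference between the two results:", max_diff)
```

The two outputs agree up to floating-point rounding; comparing the loop nests of the two functions is a useful way to reason about the per-pixel cost.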

The Hough transform parameterises lines in polar form as $\rho = x \cos\theta + y \sin\theta$ to avoid the ____ problem of slope-intercept form.
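The voting step behind the polar form can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the course's implementation: the function name `hough_peak`, the 1-degree angular resolution, and the `rho_step` bin width are all choices made for the example. Each point votes for every $(\theta, \rho)$ bin consistent with $\rho = x \cos\theta + y \sin\theta$, and collinear points pile up in one bin.

```python
import math
from collections import Counter

def hough_peak(points, n_theta=180, rho_step=0.1):
    """Return (theta in degrees, rho, votes) of the strongest accumulator bin."""
    votes = Counter()
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta          # theta sampled in [0, pi)
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(t, round(rho / rho_step))] += 1  # quantise rho into bins
    (t, r), count = votes.most_common(1)[0]
    return math.degrees(math.pi * t / n_theta), r * rho_step, count

# Five points on the line x + y = 4 (chosen only for illustration).
pts = [(0, 4), (1, 3), (2, 2), (3, 1), (4, 0)]
angle, rho, count = hough_peak(pts)
print(angle, rho, count)
```

For these points the peak lies at the normal direction of the line, with $\rho$ equal to the line's distance from the origin up to the bin width.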

The logistic-regression SGD update is $w \leftarrow w - \eta\,(____)\,x$ .

Kaiming init draws $w \sim \mathcal{N}(0, \, ____ \, / n_{\text{in}})$ to compensate for ReLU's half-zeroing.

F1 = 2PR/____ (harmonic mean of Precision and Recall).

Conv-layer parameters = $F \cdot (C_{\text{in}} \cdot ____ + 1)$ .

Receptive field of $L$ stacked $3 \times 3$ stride-1 convs = ____.

ResNet's residual gradient is $\partial y/\partial x = \partial F/\partial x + ____$; the missing term is what prevents the gradient from vanishing.

GIoU adds a penalty term proportional to the area of ____ that lies outside (A ∪ B).

Focal loss multiplies cross-entropy by $(1 - p_t)^{\gamma}$ with $\gamma \approx$ ____.
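The effect of the modulating factor can be seen with a small numeric sketch. The function names are hypothetical and `gamma` is left as a required argument; the values passed below are arbitrary illustrations, not the typical setting the question asks for.

```python
import math

def cross_entropy(p_t):
    """Cross-entropy for the probability p_t assigned to the true class."""
    return -math.log(p_t)

def focal_loss(p_t, gamma):
    """Cross-entropy scaled by the modulating factor (1 - p_t) ** gamma."""
    return (1.0 - p_t) ** gamma * cross_entropy(p_t)

easy, hard = 0.9, 0.1  # well-classified vs. badly classified example
for gamma in (0.0, 1.0):  # illustrative values only
    print(gamma, focal_loss(easy, gamma), focal_loss(hard, gamma))
```

With `gamma = 0` the factor equals 1 and the loss reduces to plain cross-entropy; any positive `gamma` shrinks the loss of the easy example far more than that of the hard one.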

Dice can be rewritten in terms of IoU as Dice = ____.

PCKh@0.5 normalises the distance threshold by ____.

PointNet's aggregation function is ____, chosen because it is symmetric (permutation invariant).

In 3DGS, the covariance is parameterised as Σ = ____.

Scaled dot-product attention divides QKᵀ by ____ to keep softmax in its non-saturated regime.

ViT-B/16 has approximately ____ million parameters.

BYOL prevents collapse via ____ on the target network plus a predictor head on the online network.

MAE masks ____% of patches, far more than BERT's 15%, because images are spatially redundant.

RoPE rotates the $(q_{2i}, q_{2i+1})$ pair by an angle proportional to ____.

In PaliGemma's Prefix-LM masking, image and prompt tokens use ____ attention while the answer suffix uses causal attention.

I3D inflates a 2D K×K filter into a 3D K×K×K filter by replicating along time and dividing by ____.