Computer Vision
CSE471
Fill in the Blanks
Formulas, keywords, theorem statements.
Marr's one-line definition of vision is: 'to know ____ , by looking.'
The Three Rs of CV are Reorganisation, ____, and Reconstruction.
Smoothing kernels sum to 1; derivative (edge) kernels sum to ____.
The Gaussian filter is SEPARABLE → applying a k×k kernel costs O(2k) per pixel instead of O(____).
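Separability can be checked numerically. The sketch below compares a direct 2D filter against a row pass followed by a column pass. The helper names (`gauss1d`, `filter2d`, `separable_filter`), the 5-tap size, sigma = 1.0 and the synthetic image are all illustrative assumptions, and borders are simply cropped rather than padded.

```python
import math

def gauss1d(k, sigma):
    """Normalised 1D Gaussian taps of odd length k (illustrative helper)."""
    r = k // 2
    g = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-r, r + 1)]
    s = sum(g)
    return [v / s for v in g]

def filter2d(img, kernel):
    """Direct 2D correlation over the valid region only (no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(img), len(img[0])
    return [[sum(kernel[a][b] * img[y + a][x + b]
                 for a in range(kh) for b in range(kw))
             for x in range(w - kw + 1)]
            for y in range(h - kh + 1)]

def separable_filter(img, g):
    """Row pass with g as a 1 x k kernel, then column pass with g as k x 1."""
    rows = filter2d(img, [g])
    return filter2d(rows, [[v] for v in g])

g = gauss1d(5, 1.0)
kernel2d = [[a * b for b in g] for a in g]  # outer product g * g^T
img = [[(3 * y + 7 * x) % 11 for x in range(8)] for y in range(8)]  # toy image

full = filter2d(img, kernel2d)
sep = separable_filter(img, g)

# Largest absolute difference between the two results (floating-point noise only).
max_diff = max(abs(p - q) for rf, rs in zip(full, sep) for p, q in zip(rf, rs))
print(max_diff)
```

Counting the multiplications inside `filter2d` for each route gives the two costs the question compares.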
The Hough transform parameterises lines in polar form as \rho = x\cos\theta + y\sin\theta to avoid the ____ problem of slope-intercept form.
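A minimal sketch of Hough voting in the normal form ρ = x cos θ + y sin θ may help here. Each point votes for every (θ, ρ) cell it could lie on, and the most-voted cell is taken as the line. The function name `hough_peak`, the 1-degree θ steps, the ρ bin width of 0.25 and the sample points are assumptions chosen for illustration.

```python
import math
from collections import Counter

def hough_peak(points, rho_step=0.25):
    """Return (theta in degrees, rho at bin centre, votes) for the strongest line."""
    votes = Counter()
    for x, y in points:
        for theta_deg in range(180):            # theta sampled at 0..179 degrees
            t = math.radians(theta_deg)
            rho = x * math.cos(t) + y * math.sin(t)
            votes[(theta_deg, round(rho / rho_step))] += 1
    (theta_deg, rho_bin), count = votes.most_common(1)[0]
    return theta_deg, rho_bin * rho_step, count

# Five collinear sample points on y = x + 2.
points = [(x, x + 2) for x in range(0, 50, 10)]
print(hough_peak(points))
```

Both accumulator axes are bounded here: θ is sampled over a fixed range and |ρ| cannot exceed the image diagonal.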
The logistic-regression SGD update is w \leftarrow w - \eta\,(____)\,x.
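An answer to the update rule above can be checked numerically, since for one example the update direction is the gradient of the negative log-likelihood. The sketch below assumes a sigmoid model with labels in {0, 1}. The function names (`nll`, `numeric_grad`, `sgd_step`) and the sample values are illustrative, not taken from the sheet.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def nll(w, x, y):
    """Negative log-likelihood of one example (x, y), with y in {0, 1}."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def numeric_grad(w, x, y, eps=1e-6):
    """Central-difference estimate of d nll / d w, one coordinate at a time."""
    grad = []
    for i in range(len(w)):
        hi = w[:i] + [w[i] + eps] + w[i + 1:]
        lo = w[:i] + [w[i] - eps] + w[i + 1:]
        grad.append((nll(hi, x, y) - nll(lo, x, y)) / (2 * eps))
    return grad

def sgd_step(w, x, y, eta):
    """One SGD update w <- w - eta * gradient, using the numeric gradient."""
    g = numeric_grad(w, x, y)
    return [wi - eta * gi for wi, gi in zip(w, g)]

w = [0.2, -0.4, 0.1]   # sample weights (assumed)
x = [1.0, 2.0, -1.5]   # sample input (assumed)
y = 1                  # sample label (assumed)

grad = numeric_grad(w, x, y)
w_new = sgd_step(w, x, y, eta=0.1)
print(grad)
print(nll(w, x, y), nll(w_new, x, y))
```

Dividing each entry of `grad` by the matching entry of `x` gives the same scalar every time, which is the quantity the blank asks for, evaluated on this example.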
Kaiming init draws w \sim \mathcal{N}(0, \, ____ \, / n_{\text{in}}) to compensate for ReLU's half-zeroing.
F1 = 2PR/____ (harmonic mean of Precision and Recall).
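To check an answer against numbers, F1 can be computed directly as a harmonic mean with the standard library. This is a minimal sketch. The function names and the confusion counts are illustrative, and it assumes at least one predicted positive and one actual positive so that neither division is by zero.

```python
from statistics import harmonic_mean

def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

def f1_score(tp, fp, fn):
    """F1 as the harmonic mean of precision and recall."""
    p, r = precision_recall(tp, fp, fn)
    return harmonic_mean([p, r])

# Sample counts (assumed): 8 hits, 2 false alarms, 8 misses.
p, r = precision_recall(tp=8, fp=2, fn=8)
print(p, r, f1_score(tp=8, fp=2, fn=8))
```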
Conv-layer parameters = F \cdot (C_{\text{in}} \cdot ____ + 1).
Receptive field of L stacked stride-1 K×K convs = ____.
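The receptive field can also be computed layer by layer, which gives a way to test a closed-form answer. In this sketch, `receptive_field` and its `(kernel_size, stride)` layer description are illustrative assumptions. Dilation and padding are ignored.

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride) pairs, ordered input to output."""
    rf, jump = 1, 1           # one input pixel; adjacent outputs are 1 input pixel apart
    for k, s in layers:
        rf += (k - 1) * jump  # each layer widens the field by (k - 1) steps of size jump
        jump *= s             # striding spreads later outputs further apart
    return rf

# Stacks of 3x3, stride-1 convs of increasing depth.
for depth in range(1, 5):
    print(depth, receptive_field([(3, 1)] * depth))
```

With every stride equal to 1, `jump` never changes, so each layer adds the same amount and the printed values follow the pattern the blank asks for.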
ResNet's residual gradient is \partial y/\partial x = \partial F/\partial x + ____ — the term that prevents vanishing.
GIoU adds a penalty term proportional to the area of ____ that lies outside (A ∪ B).
Focal loss multiplies cross-entropy by (1 − p_t)^γ with γ ≈ ____.
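The effect of the modulating factor is easy to see numerically. The sketch below implements only what the line above states, cross-entropy multiplied by (1 − p_t)^γ, with no class-balancing weight. The function names and the sample γ values (0, 1, 5) are illustrative assumptions.

```python
import math

def cross_entropy(p_t):
    """Cross-entropy for the true class, given its predicted probability p_t."""
    return -math.log(p_t)

def focal_loss(p_t, gamma):
    """Cross-entropy scaled by the modulating factor (1 - p_t) ** gamma."""
    return (1.0 - p_t) ** gamma * cross_entropy(p_t)

# Ratio focal / cross-entropy for hard (low p_t) and easy (high p_t) examples.
for p_t in (0.1, 0.5, 0.9, 0.99):
    ratios = [focal_loss(p_t, g) / cross_entropy(p_t) for g in (0.0, 1.0, 5.0)]
    print(p_t, [round(v, 6) for v in ratios])
```

With γ = 0 the ratio is 1, which is plain cross-entropy. Larger γ shrinks the loss on well-classified examples far more than on hard ones.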
Dice can be rewritten in terms of IoU as Dice = ____.
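A candidate formula can be tested by computing both scores from the same pair of masks. This is a minimal sketch in which masks are represented as sets of pixel coordinates. The function names and the two sample masks are illustrative assumptions, and both masks are assumed non-empty.

```python
def iou(a, b):
    """Intersection over union of two pixel sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def dice(a, b):
    """Dice coefficient: twice the overlap divided by the sum of the two sizes."""
    a, b = set(a), set(b)
    return 2 * len(a & b) / (len(a) + len(b))

# Sample masks (assumed): a 2x2 prediction and a 3-pixel vertical ground truth.
pred = {(0, 0), (0, 1), (1, 0), (1, 1)}
truth = {(0, 1), (1, 1), (2, 1)}
print(iou(pred, truth), dice(pred, truth))
```

Plugging the printed IoU into a proposed expression should reproduce the printed Dice value.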
PCKh@0.5 normalises the distance threshold by ____.
PointNet's aggregation function is ____, chosen because it is symmetric (permutation invariant).
In 3DGS, the covariance is parameterised as Σ = ____.
Scaled dot-product attention divides QKᵀ by ____ to keep softmax in its non-saturated regime.
ViT-B/16 has approximately ____ million parameters.
BYOL prevents collapse via ____ on the target network plus a predictor head on the online network.
MAE masks ____% of patches, far more than BERT's 15%, because images are spatially redundant.
RoPE rotates the (q_{2i}, q_{2i+1}) pair by an angle proportional to ____.
In PaliGemma's Prefix-LM masking, image and prompt tokens use ____ attention while the answer suffix uses causal attention.
I3D inflates a 2D K×K filter into a 3D K×K×K filter by replicating along time and dividing by ____.