
Error in depth gradient loss?

JulienGaubil opened this issue 1 month ago · 0 comments

Hi Mohamed, thanks for your great work!

I think there is a mistake, inherited from DINOv2, in the gradient loss used for depth probing in your project on probing the 3D awareness of vision models. It is located at: https://github.com/mbanani/probe3d/blob/c52d00b069d949b2f00c544d4991716df68d5233/evals/utils/losses.py#L89

I believe the loss is meant to downsample and take gradients along the height and width dimensions, which are the last two dimensions. However, in the current version of the code the downsampling is applied to the first two dimensions (batch and single-channel depth dimension):

```python
depth_pr_downscaled = [depth_pr] + [
    depth_pr[:: 2 * i, :: 2 * i] for i in range(1, 4)
]
depth_gt_downscaled = [depth_gt] + [
    depth_gt[:: 2 * i, :: 2 * i] for i in range(1, 4)
]
```

(and so on for the rest of the function). The depth tensors have shape (batch, channel=1, height, width), so I believe the correct downsampling and gradient computations should instead look like:

```python
depth_pr_downscaled = [depth_pr] + [
    depth_pr[..., :: 2 * i, :: 2 * i] for i in range(1, 4)
]
depth_gt_downscaled = [depth_gt] + [
    depth_gt[..., :: 2 * i, :: 2 * i] for i in range(1, 4)
]
```
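As a quick sanity check of the indexing difference, here is a minimal sketch (using numpy in place of torch, since the slicing semantics are identical; the shape is the one assumed above, and the tensor here is just a hypothetical stand-in):

```python
import numpy as np

# Hypothetical stand-in for a depth tensor of shape (batch, channel=1, height, width).
depth = np.zeros((8, 1, 64, 64))

# Current code (i = 1 case): strides hit the first two dims (batch, channel),
# so the spatial resolution is never reduced.
buggy = depth[::2, ::2]
print(buggy.shape)   # (4, 1, 64, 64): batch halved, H and W untouched

# Proposed fix: the ellipsis pins the strides to the last two (spatial) dims.
fixed = depth[..., ::2, ::2]
print(fixed.shape)   # (8, 1, 32, 32): H and W halved as intended
```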

I went back to DINOv2's codebase, and the snippet matches their implementation. It is possible they rearranged their input tensors so that height and width were the first dims, although that would be unconventional: https://github.com/facebookresearch/dinov2/blob/main/dinov2/eval/depth/models/losses/gradientloss.py

JulienGaubil avatar Nov 03 '25 14:11 JulienGaubil