Question about binning for depth prediction (DINOv2 + DPT Head)

Open smj007 opened this issue 9 months ago • 0 comments

Hi,

Thanks for your work; it's truly remarkable. I noticed that when training or running inference with the DPT head, binning is disabled by default.

https://github.com/facebookresearch/dinov2/blob/6a6261546c3357f2c243a60cfafa6607f84efcb7/dinov2/eval/depth/models/decode_heads/decode_head.py#L157: self.classify is set to False. With that default setting I get reasonable results:

image

I want to get a probability distribution for depth over 256 bins. To do this, I set

self.classify = True
self.n_bins = 256

With these settings, all my logits become 1 after normalization, and I get a black image for the rendered depth.

image

Since logit has only one channel, the code below divides every spatial value in logit by itself, which yields 1 everywhere: https://github.com/facebookresearch/dinov2/blob/6a6261546c3357f2c243a60cfafa6607f84efcb7/dinov2/eval/depth/models/decode_heads/decode_head.py#L170

As a result, in the contraction (sum of products) step below, all values in output are the same: https://github.com/facebookresearch/dinov2/blob/6a6261546c3357f2c243a60cfafa6607f84efcb7/dinov2/eval/depth/models/decode_heads/decode_head.py#L177
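To make the behaviour above concrete, here is a minimal sketch in plain Python (lists instead of tensors) of a normalization over the channel axis followed by the sum-of-products contraction, for a single pixel. The helper names `normalize_linear` and `expected_depth`, the `eps` value, the toy bin centres, and the broadcasting of a single channel against all bins are assumptions made for illustration; this is not the repository's code.

```python
# Hypothetical helpers that mimic, for one pixel, a channel-wise
# normalization followed by a contraction with the bin centres.

def normalize_linear(logits, eps=0.1):
    """ReLU, add eps (assumed value), then divide by the sum over channels."""
    shifted = [max(v, 0.0) + eps for v in logits]
    total = sum(shifted)
    return [v / total for v in shifted]


def expected_depth(weights, bins):
    """Sum of products between per-bin weights and bin centres."""
    if len(weights) == 1:
        # Assumption: a single channel is broadcast against every bin,
        # which matches the constant output reported above.
        weights = weights * len(bins)
    return sum(w * b for w, b in zip(weights, bins))


bins = [1.0, 2.0, 3.0, 4.0]  # toy bin centres (4 bins instead of 256)

# Case 1: one output channel. Each value is divided by itself, so the
# weight is 1 regardless of the raw logit.
one_channel_a = normalize_linear([0.3])
one_channel_b = normalize_linear([7.5])
depth_a = expected_depth(one_channel_a, bins)
depth_b = expected_depth(one_channel_b, bins)

# Case 2: one channel per bin. The weights form a distribution over bins,
# and the expected depth depends on the logits.
probs_near = normalize_linear([3.0, 0.0, 0.0, 0.0])
probs_far = normalize_linear([0.0, 0.0, 1.0, 3.0])
depth_near = expected_depth(probs_near, bins)
depth_far = expected_depth(probs_far, bins)

print(depth_a, depth_b)       # identical for any single-channel logit
print(depth_near, depth_far)  # differ, since the weights differ per pixel
```

In this sketch the single-channel case always produces the same depth, while the per-bin case does not. If the same holds for the real head, the layer that produces logit would need n_bins output channels for the normalization to yield a meaningful distribution.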

I am using the render_depth function (https://github.com/facebookresearch/dinov2/blob/main/notebooks/depth_estimation.ipynb) to visualize the depth maps.

Am I doing something wrong here, or are there other steps I need to take to fix this? Is it reasonable to expect a similar depth map when self.classify is True? Thank you!

smj007, Sep 25 '23 03:09