Depth-Anything: About Table 6 in the Original Paper

Thanks for your outstanding work! I really love it.

Regarding the transfer performance experiments in Table 6, I have some questions about the detailed training settings:

Which encoder is used for this experiment: ViT-S, ViT-B, or ViT-L?
Is the model in Table 6 fine-tuned directly from the original DINOv2 weights on each specific dataset, or is it first trained in a self-supervised manner on the whole 1.5M labeled and 62M unlabeled images? In other words, is any self-supervision involved in this experimental setting?

Thanks a lot for your kind reply and explanation.

Jan 24 '24 12:01 zhangzw12319

Thank you for your questions.

ViT-L is used.
The model is fine-tuned from DINOv2 weights, without further performing SSL on our 1.5M + 62M depth data.

Jan 25 '24 05:01 LiheYoung

Thanks for the information!

Jan 26 '24 03:01 zhangzw12319