Depth-Anything Metric depth option `force_keep_ar=True` produces slightly better results

Hi! When I was testing metric depth estimation on a few casual images (of size 512x512), I noticed that the option force_keep_ar (keep_aspect_ratio) is false by default, which is not consistent with ZoeDepth. I then set keep_aspect_ratio to True and found that many results were slightly sharper and clearer. Here are two examples.

[Example 1: input image | keep_ar=False | keep_ar=True]

[Example 2: input image | keep_ar=False | keep_ar=True]

I am wondering whether the default setting keep_aspect_ratio=False is intended, because it can significantly change the aspect ratio of input images whose sizes do not match 518:392. Is it recommended to set keep_aspect_ratio to True when testing on images of arbitrary size?
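To make the concern concrete, here is a rough sketch of how the resized resolution could be computed with and without the aspect ratio kept. This is not the repo's actual resize code: the function name `target_size`, the "fit inside the target" policy, and the rounding to a multiple of 14 are all assumptions for illustration; only the 518:392 target and the 512x512 input come from the discussion above.

```python
def target_size(width, height, target_w=518, target_h=392,
                keep_aspect_ratio=False, multiple_of=14):
    """Return the (width, height) an image would be resized to.

    Hypothetical helper: the policy and the multiple_of value are
    assumptions, not taken from the Depth-Anything code.
    """
    scale_w = target_w / width
    scale_h = target_h / height
    if keep_aspect_ratio:
        # Use one scale for both axes so the image is not stretched
        # (assumed policy: the image must fit inside the target box).
        scale_w = scale_h = min(scale_w, scale_h)

    def snap(x):
        # Round to the nearest multiple, never going below one multiple.
        return max(multiple_of, round(x / multiple_of) * multiple_of)

    return snap(width * scale_w), snap(height * scale_h)


stretched = target_size(512, 512, keep_aspect_ratio=False)  # (518, 392)
kept = target_size(512, 512, keep_aspect_ratio=True)        # (392, 392)
print("keep_ar=False:", stretched, "aspect", round(stretched[0] / stretched[1], 2))
print("keep_ar=True: ", kept, "aspect", round(kept[0] / kept[1], 2))
```

Under these assumptions, a square 512x512 input is stretched to an aspect ratio of about 1.32 when keep_aspect_ratio is False, while it stays square when it is True.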

Feb 07 '24 09:02 EasternJournalist

Do you mean the line below? It seems we have already set it as True. https://github.com/LiheYoung/Depth-Anything/blob/f419b7db90b26b2855280c4da484778c4fac759f/metric_depth/zoedepth/models/zoedepth/config_zoedepth.json#L50

Feb 07 '24 09:02 LiheYoung

Oh, I see. It seems that infer mode has force_keep_ar set to true, while eval mode does not. I was misled by the code in evaluate.py and depth_to_pointcloud.py, both of which use eval mode.

https://github.com/LiheYoung/Depth-Anything/blob/f419b7db90b26b2855280c4da484778c4fac759f/metric_depth/depth_to_pointcloud.py#L67

https://github.com/LiheYoung/Depth-Anything/blob/f419b7db90b26b2855280c4da484778c4fac759f/metric_depth/evaluate.py#L127

Thanks for your reply!

Feb 07 '24 10:02 EasternJournalist

Oh, I am a little confused now, haha. Are our current configurations the same as ZoeDepth's? I intended to leave the ZoeDepth configs unchanged, for the sake of a fair comparison.

Feb 07 '24 10:02 LiheYoung

Sorry that I didn't make it clear. I just checked the original repo. The evaluation configuration of ZoeDepth in the original repo is the same as Depth-Anything's (force_keep_ar=false in eval mode and true in infer mode). Their choice of mode for evaluation is "eval", the same as yours. So the comparison is fair, anyway : ).

When I said ZoeDepth's configuration was not consistent, I was using the model loaded from torchhub, which has force_keep_ar set to true. It might be initialized in infer mode.
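As a rough illustration of how the two modes can end up with different values, here is a simplified sketch of a per-mode config lookup. It is not the repo's actual config loader: `DEFAULTS`, `CONFIG`, and `resolve_config` are hypothetical names, and the assumption is simply that "infer" sets force_keep_ar explicitly while "eval" falls back to a default of False.

```python
# Assumed fallback when a mode does not set the option explicitly.
DEFAULTS = {"force_keep_ar": False}

# Simplified stand-in for a JSON config with per-mode sections.
CONFIG = {
    "infer": {"force_keep_ar": True},  # set explicitly in infer mode
    "eval": {},                        # not set, so the default applies
}


def resolve_config(mode):
    """Merge the defaults with the section for the given mode."""
    if mode not in CONFIG:
        raise ValueError(f"unknown mode: {mode!r}")
    merged = dict(DEFAULTS)
    merged.update(CONFIG[mode])
    return merged


for mode in ("infer", "eval"):
    print(mode, "->", resolve_config(mode)["force_keep_ar"])
```

With this layout, a script that builds the model in "eval" mode never sees force_keep_ar=True unless it overrides the option itself.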

So I guess the conclusion is that the comparison is undoubtedly fair: you both evaluate in eval mode with force_keep_ar=False. But it looks like force_keep_ar=True is a better choice for inference in practice, and it is also the default setting from torchhub. (This is not quantitatively tested. Maybe you can re-evaluate both methods in "infer" mode and probably report even better numbers? Or maybe not... I don't know why the eval and infer modes differ, but force_keep_ar=true does look better to me.)

Feb 07 '24 11:02 EasternJournalist

Thank you for your very clear and comprehensive explanations!

I also agree that for a totally wild image, keeping the original aspect ratio would be a better choice (as your examples show, and indeed our foundation Depth Anything models for relative MDE also keep the ratio).

ZoeDepth sets force_keep_ar=false when evaluating on standard benchmarks, perhaps because this brings slightly better results. If I remember correctly, I also found that resizing KITTI images to the same width and height gave even better results during evaluation than keeping the original aspect ratio. It is somewhat surprising and strange.

Anyway, thank you very much for your attention to our work and for your valuable information!

Feb 07 '24 12:02 LiheYoung