Depth-Anything Does anybody achieve the metric depth estimation on a custom dataset successfully?

Hi,

This post is to discuss how to achieve metric depth estimation on a custom dataset; in my case, the SCARED dataset. If anyone has successfully fine-tuned the model and achieved metric depth estimation, could you tell me which code you modified?

Feb 01 '24 17:02 MichaelWangGo

I did it on RealEstate10k. Unfortunately, ground-truth depth does not exist for that dataset, so quantitative evaluation is not possible, but judging anecdotally the results look pretty good.

Feb 03 '24 10:02 1ssb

@1ssb Have you seen results like this with the metric depth outdoor checkpoints? With all of my experimentation it seems like the sky predictions are not good. Although the relative depth predicts the sky and other "background" extremely well!

Feb 05 '24 21:02 Denny-kef

Sorry @Denny-kef, but I cannot help you with that. Make sure you are using the outdoor model and not the indoor one.

Skies are always difficult to capture correctly on an absolute scale, so I do not think your expectations can be too high for that. In the history of monocular depth estimation, relative scaling of distant backgrounds has also always been better than absolute scaling.

Feb 05 '24 21:02 1ssb

Hi @1ssb, thanks for getting back to me! I am using the outdoor checkpoints and am just wondering whether you (or anyone else) have seen similar results with the metric depth predictions.

Feb 05 '24 21:02 Denny-kef

That's interesting about the metric vs relative thing. I mean I could mask out the background using the relative depth network or a lightweight segmentation network but it seems like there should be a better way..

Feb 05 '24 22:02 Denny-kef

Oh, this is interesting, @Denny-kef. I took a look at your image, and it seems to have been captured with a fish-eye lens, or the image itself is a bit distorted: even to the eye, the clouds look much closer than they actually are, which is a very cool effect. I am not sure I have an exact answer, but it may well be an out-of-distribution (OOD) sample.

Feb 05 '24 22:02 1ssb

@Denny-kef I don't know if this will help with your task, but here is my personal interpretation. The authors report in the paper that they set the disparity value (the inverse of depth) to 0 for all pixels labeled as "sky" by a semantic segmentation model (see Section 3.1 of the paper on arXiv). I haven't looked at the implementation details in their code, but this can affect training in different ways:

- If a disparity value of 0 is treated as invalid and masked during training, the MDE model is never trained to decode the original disparity/depth value for the sky.
- Even if a disparity value of 0 is treated as "valid" and not masked during training, the MDE model sees every sky pixel with the same disparity value (0), so the features learned for sky pixels would not be expressive enough to decode other values later.
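To make the two cases above concrete, here is a minimal NumPy sketch. It is not the authors' training code: the function `masked_l1`, the plain L1 loss, and the toy arrays are assumptions chosen only to show how masking zero-disparity labels removes sky pixels from the loss.

```python
import numpy as np

def masked_l1(pred_disp, gt_disp, treat_zero_as_invalid=True):
    """Mean absolute disparity error (hypothetical helper, not from the repo).

    If treat_zero_as_invalid is True, pixels whose label is 0 (the "sky"
    convention described above) contribute nothing to the loss.
    """
    if treat_zero_as_invalid:
        valid = gt_disp > 0
    else:
        valid = np.ones_like(gt_disp, dtype=bool)
    if not valid.any():
        return 0.0
    return float(np.abs(pred_disp - gt_disp)[valid].mean())

# Toy 2x2 example: the top row is "sky" (label 0), the bottom row is not.
gt = np.array([[0.0, 0.0], [0.5, 0.25]])
pred = np.array([[0.3, 0.1], [0.5, 0.25]])  # wrong on the sky, exact elsewhere

loss_masked = masked_l1(pred, gt, treat_zero_as_invalid=True)
loss_unmasked = masked_l1(pred, gt, treat_zero_as_invalid=False)
```

In the masked case the sky errors are invisible to the loss, so nothing pushes the model toward any particular sky value. In the unmasked case the only target the sky ever receives is 0.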

That said, I think the features from their frozen encoder are really powerful for metric depth estimation, but I guess it would be very hard to use them to produce correct absolute-scale values for the sky. In relative depth estimation, the overall qualitative quality of the sky predictions could come from the semantic feature alignment done during training (see Section 3.3).

Feb 06 '24 10:02 pestrstr

The best solution that I found for the "background" issue with metric depth estimation predictions is this:

1. Retrieve the relative depth map as a secondary output from the metric depth estimation model.
2. Use that depth map (since it is much better at predicting the background) to generate a binary mask that eliminates things like the sky from the metric depth results.

Importantly, these operations don't add any significant time to the inference.
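The two steps above can be sketched roughly as follows. This is not the code used in the thread: it assumes the relative output is disparity-like (larger means closer, values near the minimum mean far away or sky), and the function name `mask_background` and the `threshold` value are hypothetical and would need tuning per dataset.

```python
import numpy as np

def mask_background(metric_depth, relative_depth, threshold=0.05):
    """Blank out far-background pixels in a metric depth map.

    Assumes relative_depth is disparity-like: after min-max normalization,
    values at or below `threshold` are treated as sky / far background.
    """
    rel = relative_depth.astype(np.float64)
    span = rel.max() - rel.min()
    if span > 0:
        rel_norm = (rel - rel.min()) / span
    else:
        # Constant map: nothing can be separated, so mask nothing.
        rel_norm = np.ones_like(rel)
    background = rel_norm <= threshold
    cleaned = metric_depth.astype(np.float64).copy()
    cleaned[background] = np.nan  # mark as "no depth" rather than a fake value
    return cleaned, background

# Toy example: the top row is sky, the bottom row is nearby geometry.
metric = np.array([[80.0, 75.0], [5.0, 3.0]])
relative = np.array([[0.0, 0.01], [0.6, 1.0]])
cleaned, background = mask_background(metric, relative)
```

Both operations are simple array comparisons, which fits the observation that they add no significant inference time.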

Feb 08 '24 20:02 loevlie

Bottom line: don't try to predict skies or reflections.

Feb 08 '24 22:02 1ssb

I was not trying to predict skies; I was trying to remove them from the output depth map so they don't show up in the point cloud. But yes, do not try to predict the depth of the sky or reflections!
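For readers wondering how a sky mask keeps those pixels out of the point cloud, here is a minimal back-projection sketch. It assumes a simple pinhole camera; the function `depth_to_points` and the intrinsics `fx`, `fy`, `cx`, `cy` are illustrative names, not part of the Depth-Anything code.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy, keep=None):
    """Back-project a metric depth map to an (N, 3) point array.

    Pixels with non-finite or non-positive depth are skipped, as are
    pixels where the optional boolean `keep` mask is False.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # u: column, v: row
    valid = np.isfinite(depth) & (depth > 0)
    if keep is not None:
        valid &= keep
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1)

# Toy 2x2 depth map: one pixel already masked as NaN (sky), one with zero depth.
depth = np.array([[np.nan, 2.0], [4.0, 0.0]])
points = depth_to_points(depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
```

Only the two pixels with usable depth end up in `points`, so masked sky pixels never appear in the cloud.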

Feb 08 '24 23:02 loevlie

Hi @loevlie, if you are trying to detect the sky and remove it, you can try our relative depth models. The output value 0 from these models can be considered as the sky (or extremely far). Alternatively, you can use a pre-trained semantic segmentation model to detect the sky.

Feb 09 '24 03:02 LiheYoung

Hi @LiheYoung, yes that works very well! Thank you!

Feb 09 '24 12:02 loevlie

@1ssb Have you seen results like this with the metric depth outdoor checkpoints? With all of my experimentation it seems like the sky predictions are not good. Although the relative depth predicts the sky and other "background" extremely well!

Could you kindly share the parameters you adjusted during fine-tuning? I've been getting poor performance in my experiments with another dataset, and I've been struggling to resolve the issue. The details of the problem are here: https://github.com/LiheYoung/Depth-Anything/issues/172#issue-2292062398

May 13 '24 07:05 xiaobh1519

@Denny-kef Hi Denny, would you be able to explain how you got the metric depth working and your output depth images? I've been trying to run the metric outdoor model on my custom dataset but have been running into a lot of issues. Any help would be greatly appreciated!

May 17 '24 21:05 andrewhbradley9

I did it on the RealEstate10k unfortunately Depth does not exist on the dataset as such so evaluation is not possible but in general if you take an anecdotal look, it's pretty good.

Hi @1ssb,

Are you able to train metric depth estimation on a dataset without depth maps (labels)? Could you please share more details about your training trial?

Jun 27 '24 08:06 shilpaullas97

Hi, I never trained the model; I fine-tuned at test time using a plug-and-play method on RealEstate10k. Keep in mind that RealEstate10k is not an RGB-D dataset, but you can triangulate points and use them as control points to rescale the predictions directly. It's not very neat, but it's the best you can do without retraining from scratch.
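The exact procedure was not shared in this thread, so the following is only one plausible reading of "rescale the predictions using control points". It assumes you already have sparse metric depths at a few pixels (for example from triangulated points) and fits a single global scale as the median ratio; `fit_scale` is a hypothetical helper.

```python
import numpy as np

def fit_scale(pred_depth, sparse_depth):
    """Estimate one global scale mapping predicted depth onto control points.

    sparse_depth holds metric depth at control pixels and NaN elsewhere.
    The median ratio is used because it tolerates a few bad control points.
    """
    valid = (
        np.isfinite(sparse_depth) & (sparse_depth > 0)
        & np.isfinite(pred_depth) & (pred_depth > 0)
    )
    if not valid.any():
        raise ValueError("no usable control points")
    return float(np.median(sparse_depth[valid] / pred_depth[valid]))

# Toy example: the prediction is off by a factor of 2 at both control points.
pred = np.array([[1.0, 2.0], [3.0, 4.0]])
sparse = np.array([[2.0, np.nan], [6.0, np.nan]])
scale = fit_scale(pred, sparse)
rescaled = pred * scale
```

A single scale is the simplest choice; a scale-and-shift least-squares fit is a common alternative when the prediction also has an offset.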

Best,
Subhransu

Jun 27 '24 08:06 1ssb

Hi @1ssb,

Could you please share more details about how you fine-tune at test time? Many thanks!

Jul 25 '24 20:07 callmeray
