Depth-Anything Small encoder for metric depth estimation?

Open Denny-kef opened this issue 1 year ago • 7 comments

What steps would you recommend for using the small Depth Anything relative depth estimation model with the metric depth estimation pipeline? Will it need to be retrained, or can I swap the encoders and have it still work reasonably well?

Denny-kef avatar Feb 08 '24 20:02 Denny-kef

I fine-tuned a metric depth model on a custom dataset. From my tinkering with the Depth Anything codebase:

To switch the encoder, you need to modify the build function linked below:

https://github.com/LiheYoung/Depth-Anything/blob/e7ef4b4b7a0afd8a05ce9564f04c1e5b68268516/metric_depth/zoedepth/models/base_models/depth_anything.py#L334

Replace

```python
depth_anything = DPT_DINOv2(encoder='vitl', out_channels=[256, 512, 1024, 1024], use_clstoken=False)
```

with

```python
depth_anything = DPT_DINOv2(encoder='vits', features=64, out_channels=[48, 96, 192, 384], use_clstoken=False)
```
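
If the build function also loads relative-depth weights right after constructing the model (as in the linked revision), the checkpoint path has to change to the small variant as well. A minimal sketch, assuming the checkpoint follows the `depth_anything_vitl14.pth` naming mentioned later in this thread:

```python
import torch

# Small-encoder constructor from the swap above.
depth_anything = DPT_DINOv2(encoder='vits', features=64,
                            out_channels=[48, 96, 192, 384],
                            use_clstoken=False)

# Assumed file name ('depth_anything_vits14.pth'), mirroring the vitl
# checkpoint naming -- verify it against your local ./checkpoints folder.
state_dict = torch.load('./checkpoints/depth_anything_vits14.pth',
                        map_location='cpu')
# strict=False tolerates key differences between the relative-depth
# checkpoint and the metric-depth wrapper around the model.
depth_anything.load_state_dict(state_dict, strict=False)
```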

On the dataset side:

You need to adapt your dataset to follow the preprocessing and augmentation done in https://github.com/LiheYoung/Depth-Anything/blob/e7ef4b4b7a0afd8a05ce9564f04c1e5b68268516/metric_depth/zoedepth/data/data_mono.py#L292
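
For orientation, here is a minimal sketch of a per-sample loader that mimics the format the linked preprocessing works with; the dict keys ('image', 'depth', 'focal') are assumptions based on the ZoeDepth data pipeline and should be checked against data_mono.py:

```python
import numpy as np
from PIL import Image

def load_sample(image_path, depth_path, focal):
    """Hypothetical per-sample loader; output keys are assumed, not verified."""
    # RGB as float32 in [0, 1], HWC layout.
    image = np.asarray(Image.open(image_path), dtype=np.float32) / 255.0
    # Depth GT as float32 in metres (apply custom scaling first if needed,
    # see the note below), with a trailing channel axis.
    depth = np.asarray(Image.open(depth_path), dtype=np.float32)[..., None]
    return {'image': image, 'depth': depth, 'focal': focal}
```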

If your depth GT needs custom scaling, as in the line linked below, apply that scaling and you should be good to go. https://github.com/LiheYoung/Depth-Anything/blob/e7ef4b4b7a0afd8a05ce9564f04c1e5b68268516/metric_depth/zoedepth/data/data_mono.py#L353
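
As a concrete (non-repo) example of such scaling: 16-bit PNG ground truth is often stored in 1/256 m units (KITTI convention) or in millimetres (NYU convention), so the conversion to metres would be:

```python
# depth_png: raw uint16 array read from the GT PNG (hypothetical variable).
# Pick the divisor that matches your GT encoding.
depth_gt = depth_png.astype(np.float32) / 256.0     # KITTI-style: 1/256 m
# depth_gt = depth_png.astype(np.float32) / 1000.0  # NYU-style: millimetres
```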

mvish7 avatar Feb 13 '24 09:02 mvish7

@mvish7 Can you just swap out the encoder without fine-tuning the model and have it still work?

Denny-kef avatar Feb 15 '24 15:02 Denny-kef

Hi, as the base Depth Anything model is trained for relative depth, just swapping the encoder won't produce metric depth. In my experience, within 3 epochs of fine-tuning the model had learned the depth scale of our custom dataset.

mvish7 avatar Feb 15 '24 15:02 mvish7

@mvish7 Hi, I want to ask how many RGB-depth pairs you used for fine-tuning. I have used 100 pairs of my own data and trained for 50 epochs, but the results don't seem accurate. I also want to know whether the --pretrained_resource="" argument in train_mono.py should point to depth_anything_metric_depth_outdoor.pt; if I use depth_anything_vitl14.pth, I get a state_dict mismatch error.

cosmosmosco avatar Feb 27 '24 06:02 cosmosmosco

Hi @cosmosmosco, the --pretrained_resource argument is for loading a pre-trained checkpoint (containing the entire model's parameters). Please just set it to an empty string when launching your training script.
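
For concreteness, a hedged example launch: train_mono.py and --pretrained_resource come from this thread, while -m zoedepth and the -d dataset flag follow the ZoeDepth training interface and are assumptions to verify against your configs:

```bash
# '-d kitti' is a placeholder; substitute your dataset config name.
python train_mono.py -m zoedepth -d kitti --pretrained_resource=""
```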

LiheYoung avatar Mar 17 '24 10:03 LiheYoung

Thank you. It really helps.

cosmosmosco avatar Mar 18 '24 08:03 cosmosmosco

@mvish7 @cosmosmosco I am trying to fine-tune the ZoeDepth + Depth Anything model on a custom outdoor dataset for which I have pixel-wise GT. Could you shed some light on data preparation, which config to use, and how best to run the training script, including any script changes and the specific arguments needed for a custom dataset? Thanks in advance!

abhishek0696 avatar Mar 25 '24 20:03 abhishek0696