Finetune on Smaller Dataset

Open nehamjain10 opened this issue 3 years ago • 7 comments

Hey. I wanted to finetune the DPT-Large model on my smaller dataset. Is the code for finetuning available or will it be released with the training code?

nehamjain10 avatar Jun 04 '21 13:06 nehamjain10

Yes, could you please release the training code? It would be very useful to the research community.

GopiRajuMatta avatar Jun 05 '21 04:06 GopiRajuMatta

Here is something I would like to share. These are from the DPT paper: [image] [image]. This is from the MiDaS paper: [image]

I think this model may not be a good fit for predicting absolute metric depth, because you do not know the scale and shift for any new image, and the authors used the average scale and shift across the training set. @ranftlr, please correct me if I am wrong.

Let me illustrate this with an example. Suppose I have two images.

Image one: the ground-truth depth ranges from 0 to 1 meter.
Image two: the ground-truth depth ranges from 0 to 10 meters.

Suppose the model is perfectly correct. For both images it still predicts values with a minimum of 0 and a maximum of 1.

You simply cannot recover absolute metric values, because you have no idea what the ground-truth scale and shift are for image one and image two.
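The ambiguity described above can be shown with a small NumPy sketch. The function name `normalize_01` and the synthetic scenes are made up for illustration and are not taken from the DPT or MiDaS code; the point is only that two scenes with different metric ranges become indistinguishable once each is rescaled to 0-1.

```python
import numpy as np

def normalize_01(depth):
    """Min-max normalize a depth map to [0, 1] (hypothetical helper).

    This discards the per-image scale and shift.
    """
    d_min, d_max = depth.min(), depth.max()
    return (depth - d_min) / (d_max - d_min)

rng = np.random.default_rng(0)
relative = rng.random((4, 4))
relative[0, 0] = 0.0    # pin the minimum so the range is exactly 0..1
relative[-1, -1] = 1.0  # pin the maximum

scene_one = relative * 1.0    # same geometry, 0 m to 1 m
scene_two = relative * 10.0   # same geometry, 0 m to 10 m

# After normalization the two scenes are identical, so a model that only
# sees normalized targets cannot tell which metric range an image had.
same = np.allclose(normalize_01(scene_one), normalize_01(scene_two))
print(same)
```

Recovering metric depth from such a prediction needs extra information per image, for example a few known depth values to fit a scale and shift.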

If anyone disagrees with me, please correct me.

angrysword avatar Jun 17 '21 18:06 angrysword

@angrysword I arrived at the same conclusion while trying to train the model on a new dataset. I hope they open-source the training pipeline, but for now I will have to use what I came up with.

yassineAlouini avatar Jan 20 '22 17:01 yassineAlouini

@yassineAlouini any luck? I need to do the same, so if you have done it, that would be a great help.
@angrysword, if I understand correctly, they map the training ground truth to 0-1 using [30], is that right?

gurkirt avatar Jan 29 '22 12:01 gurkirt

For normalization, I am using an internal scale factor of the camera to move from pixels to meters. Then, I use an estimation of shift and scale factors to align predictions with ground truth values.

Regarding the loss, I am using the one described in paper [12] and an additional spatial gradient term.

I hope this helps a bit.
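For readers unfamiliar with a spatial gradient term, here is a minimal NumPy sketch of one common form: the mean absolute finite difference of the residual between prediction and target. This is an assumption about what such a term can look like, not the loss actually used above or the exact formulation in [12]; the name `gradient_loss` and the `mask` argument are hypothetical.

```python
import numpy as np

def gradient_loss(pred, target, mask=None):
    """Mean absolute spatial gradient of the residual (illustrative sketch).

    pred, target: 2D arrays of the same shape.
    mask: optional 2D boolean array, True where the target is valid.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    if mask is None:
        mask = np.ones(target.shape, dtype=bool)
    mask = np.asarray(mask, dtype=bool)

    # Residual, zeroed where the target is invalid.
    diff = np.where(mask, pred - target, 0.0)

    # Finite differences, counted only where both neighbours are valid.
    grad_x = np.abs(diff[:, 1:] - diff[:, :-1]) * (mask[:, 1:] & mask[:, :-1])
    grad_y = np.abs(diff[1:, :] - diff[:-1, :]) * (mask[1:, :] & mask[:-1, :])

    n_valid = mask.sum()
    if n_valid == 0:
        return 0.0
    return float((grad_x.sum() + grad_y.sum()) / n_valid)

target = np.arange(16, dtype=np.float64).reshape(4, 4)
# A constant offset has zero gradient, so this term ignores it.
offset_loss = gradient_loss(target + 5.0, target)
```

Because the term depends only on differences between neighbouring pixels, it penalizes mismatched edges and slopes while leaving a global shift to the main loss.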

yassineAlouini avatar Jan 29 '22 14:01 yassineAlouini

@yassineAlouini, thank you for your reply.

I am new to the depth estimation problem. I have depth maps for a virtual environment, recorded using VTK (https://github.com/pablospe/render_depthmap_example/blob/main/main.py). The camera is a simple pinhole camera with a square image. The depth range is from 0 to 3 centimetres.

Do I just divide the depth map by 3 and that is it, or do I need to take a log or something similar before computing the loss?

Can you please explain how the alignment process would work here?

gurkirt avatar Feb 03 '22 12:02 gurkirt

@gurkirt since I haven't seen your data or used it, I guess the best way to know is by experimenting.

Regarding the alignment, it uses this code (check the compute_scale_and_shift method), which estimates the scale and shift from the ground-truth disparity.
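Since the linked code is not reproduced here, the following is a minimal NumPy sketch of the general idea: a closed-form least-squares fit of a scale and shift that map the prediction onto the target. It is an assumption about how such an alignment can be written, not a copy of the referenced `compute_scale_and_shift` method; the name `fit_scale_and_shift` and its signature are hypothetical.

```python
import numpy as np

def fit_scale_and_shift(prediction, target, mask=None):
    """Least-squares scale s and shift t minimizing
    sum(mask * (s * prediction + t - target)**2). Illustrative sketch.
    """
    prediction = np.asarray(prediction, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    if mask is None:
        mask = np.ones(prediction.shape)
    mask = np.asarray(mask, dtype=np.float64)

    # Entries of the 2x2 normal equations A @ [s, t] = b.
    a_00 = np.sum(mask * prediction * prediction)
    a_01 = np.sum(mask * prediction)
    a_11 = np.sum(mask)
    b_0 = np.sum(mask * prediction * target)
    b_1 = np.sum(mask * target)

    det = a_00 * a_11 - a_01 * a_01
    if np.isclose(det, 0.0):
        raise ValueError("degenerate input: cannot fit scale and shift")

    scale = (a_11 * b_0 - a_01 * b_1) / det
    shift = (-a_01 * b_0 + a_00 * b_1) / det
    return scale, shift

rng = np.random.default_rng(0)
pred = rng.random((8, 8))
gt = 2.5 * pred + 0.3            # synthetic target with known scale and shift
scale, shift = fit_scale_and_shift(pred, gt)
aligned = scale * pred + shift   # prediction expressed in the target's units
```

The fit needs ground truth for the image being aligned, which is why it helps during training and evaluation but does not by itself give metric depth for a new image.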

Finally, for the log of the depth, again, try with and without it and see if it improves your model.
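As a starting point for that experiment, here is a small NumPy sketch of the two target transforms being compared. The 3 cm maximum comes from the question above; the helper names and the `eps` value are hypothetical choices for illustration.

```python
import numpy as np

MAX_DEPTH_CM = 3.0  # depth range stated in the question: 0 to 3 cm

def linear_target(depth_cm):
    """Scale depth linearly to [0, 1] by dividing by the known maximum."""
    return depth_cm / MAX_DEPTH_CM

def log_target(depth_cm, eps=1e-3):
    """Log-depth target; eps (an assumed value) avoids log(0) at zero depth."""
    return np.log(depth_cm + eps)

depth = np.array([[0.0, 1.5], [2.25, 3.0]])
lin = linear_target(depth)   # evenly spaced in depth
log = log_target(depth)      # stretches small depths, compresses large ones
```

Training once with each target and comparing validation error on the same metric is the simplest way to see which one suits the data.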

Good luck!

yassineAlouini avatar Feb 03 '22 15:02 yassineAlouini