Thanks and is there a plan to release the training code?

Open jucic opened this issue 3 years ago • 18 comments

jucic avatar Apr 01 '21 06:04 jucic

I have the same question. Will you release the training code for reference?

lxtGH avatar Apr 02 '21 05:04 lxtGH

We plan to release the training code. I can't give an exact timeline at the moment, but I hope that we'll be able to do this within one or two months.

ranftlr avatar Apr 03 '21 09:04 ranftlr

Same here, I wish you would release the training code. I used the pretrained model (MiDaS) to predict depth, and it is very impressive. However, I am experiencing one problem: when an object is a little far away (2 or 3 meters), it can't tell the difference. I attach 2 pics for your reference: im2 im3

angrysword avatar May 09 '21 19:05 angrysword

@ranftlr Wonderful work! Also looking forward to the training code. May I ask how many GPUs you used for training and how long it took to train the model?

Tord-Zhang avatar May 12 '21 03:05 Tord-Zhang

@Tord-Zhang: We typically train on 4 Quadro 6000 cards that have 24 GB memory each. A complete run to produce the final model takes about 5 days to complete.

@angrysword: Sorry, I don't understand your question. Can you elaborate more on the problem that you are observing?

ranftlr avatar May 12 '21 15:05 ranftlr

@ranftlr Hi, thanks for your quick response. I am a little surprised by the training speed, since the MGDA training algorithm is used, in which a mini-batch from each dataset needs two forward passes in each iteration, and the dataset is also very large. Would a Quadro 6000 be faster than a Tesla V100? BTW, could I ask when the training code will be released? Thanks!

Tord-Zhang avatar May 13 '21 03:05 Tord-Zhang

We don't go through all the images in every "epoch". Since the sizes of individual datasets can differ by an order of magnitude, we use a resampling strategy that assembles mini-batches in equal parts (on average) from every dataset. This also plays well with the diversity of the individual datasets: the large datasets typically have a lot of similar frames, as the frames come from videos, whereas the smaller datasets tend to have a lot of uncorrelated images. Based on this, we perform a fixed number of total steps. We define an "epoch" as seeing 72000 samples and train for 2x 60 epochs: once for pre-training and once for the run over the full dataset. Please have a look at the MiDaS paper for more details (https://arxiv.org/abs/1907.01341).
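To make the resampling strategy concrete, here is a minimal sketch of a balanced sampler: pick a dataset uniformly first, then an image within it, so each mini-batch is (on average) drawn in equal parts from every dataset. The dataset names and sizes below are placeholders, not the actual MiDaS training mix.

```python
import random

# Hypothetical dataset sizes; the real MiDaS training mix differs.
DATASET_SIZES = {"ReDWeb": 3_600, "MegaDepth": 130_000, "WSVD": 1_500_000}

def sample_minibatch(batch_size, rng=None):
    """Assemble a mini-batch balanced across datasets on average:
    choose the dataset uniformly first, then an image index within it,
    so small datasets are not drowned out by large ones."""
    rng = rng or random.Random()
    names = list(DATASET_SIZES)
    batch = []
    for _ in range(batch_size):
        name = rng.choice(names)                   # uniform over datasets
        idx = rng.randrange(DATASET_SIZES[name])   # uniform within the dataset
        batch.append((name, idx))
    return batch
```

With this scheme an "epoch" is simply a fixed budget of samples (72000 in the paper) rather than a full pass over every image.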

I don't have a comparison to a V100, as I don't have any available.

There is still no exact ETA for the training code.

ranftlr avatar May 13 '21 10:05 ranftlr

@ranftlr Did you mean sampling an equal number of images from each dataset in every batch, or sampling different numbers of images but the same percentage? As for training, did you use DP or DDP with the MGDA algorithm? I am not sure whether MGDA supports DDP.

Tord-Zhang avatar May 14 '21 06:05 Tord-Zhang

@ranftlr And which version of BlendedMVS did you use: BlendedMVS, BlendedMVS+, or BlendedMVS++? High resolution or low resolution? I found that there are some unpleasant noise points in the ground truth of the low-resolution BlendedMVS; how did you deal with those points? Thanks.

Tord-Zhang avatar May 17 '21 09:05 Tord-Zhang

Hello @ranftlr, still no ETA for the training code? We're considering writing our own, but that would be undesirable, especially if you are planning to release your original code.

eliabruni avatar Jun 08 '21 17:06 eliabruni

Hi @ranftlr, thanks for your impressive work. I would also like to mention that we at KU Leuven's PSI lab are looking forward to the training code; otherwise we will need to write our own as well. If you could give an indication of when you'd release the code, we would have a better idea of how to organize our work.

soroushseifi avatar Jun 17 '21 15:06 soroushseifi

@ranftlr Hi, still no ETA for training code?

Tord-Zhang avatar Jul 20 '21 12:07 Tord-Zhang

@eliabruni @soroushseifi @Tord-Zhang

Any luck writing the training code? I am not sure how laborious it would be, but I am thinking about writing it. If no one has tried yet, does anyone want to help? I want to try fine-tuning the model on the NYU dataset, but on edges only, and see how well it can estimate depth on line images.

chrisdottel avatar Oct 11 '21 20:10 chrisdottel

@chrisdottel maybe you can have a look at https://github.com/open-mmlab/mmsegmentation/tree/master/configs/dpt (there is a training schedule in dpt_vit-b16_512x512_160k_ade20k.py)

eliabruni avatar Oct 12 '21 07:10 eliabruni

Did anyone manage to do transfer learning using the DPT-Large or DPT-Hybrid pretrained models on another depth dataset? Is the loss described in this paper a good one to use for this transfer learning?

vns_transfer_learning_loss
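For reference, the MiDaS paper linked above in this thread trains with a scale-and-shift-invariant loss. Below is a rough NumPy sketch of just the core idea (the function names are my own, and the paper's full loss additionally uses trimming and a multi-scale gradient-matching term):

```python
import numpy as np

def align_scale_shift(pred, target):
    """Least-squares scale s and shift t minimizing ||s*pred + t - target||^2."""
    A = np.stack([pred, np.ones_like(pred)], axis=1)   # (N, 2) design matrix
    (s, t), *_ = np.linalg.lstsq(A, target, rcond=None)
    return s, t

def ssi_mse(pred, target):
    """Scale-and-shift-invariant MSE: align the prediction to the target first,
    so predictions that are correct up to an affine transform incur no loss."""
    s, t = align_scale_shift(pred, target)
    return float(np.mean((s * pred + t - target) ** 2))
```

A prediction that equals the ground truth up to any scale and shift yields a loss of (numerically) zero, which is what makes the loss usable across datasets with incompatible depth conventions.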

Is unfreezing the last layer enough or should we unfreeze more?

Any pointers/tips are welcome, thanks in advance. :slightly_smiling_face:
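On the unfreezing question, here is a minimal PyTorch sketch of freezing everything except a head module. The attribute name `head` is hypothetical; DPT's decoder modules are named differently, so adapt it to the actual model.

```python
import torch.nn as nn

def freeze_backbone(model: nn.Module, head_attr: str = "head") -> nn.Module:
    """Freeze all parameters, then unfreeze only the named head module.
    `head_attr` is a placeholder attribute name, not DPT's actual naming."""
    for p in model.parameters():
        p.requires_grad = False
    for p in getattr(model, head_attr).parameters():
        p.requires_grad = True
    return model
```

Passing only the `requires_grad` parameters to the optimizer (e.g. via `filter(lambda p: p.requires_grad, model.parameters())`) then fine-tunes just the head; unfreezing additional decoder blocks is a matter of extending the second loop.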

yassineAlouini avatar Dec 23 '21 10:12 yassineAlouini

Hi, we have re-implemented the paper presented in this repository, and we have added a training script. Check it out here: https://github.com/antocad/FocusOnDepth

antocad avatar Feb 04 '22 11:02 antocad

@antocad Thanks for this. :ok_hand:

yassineAlouini avatar Feb 04 '22 15:02 yassineAlouini

So there is no plan to release the training code?

Tord-Zhang avatar Jun 27 '22 13:06 Tord-Zhang