ASNDepth

Training Memory Requirements

BrianPugh opened this issue 3 years ago • 9 comments

I'm trying to replicate these results; however, I'm running into memory limitations on hardware similar to what is reported in the paper.

  1. Did you use a minibatch of 8 per GPU (effective minibatch size of 32 across 4 GPUs) or a minibatch of 2 per GPU (effective minibatch size of 8 across 4 GPUs)?
  2. Did you freeze the backbone or any other sections of the network during training?
  3. In your Depth2normalLight implementation, did you iterate over random samples, or unroll them over an additional dimension in your tensors?
  4. In your Depth2normalLight implementation, did you bilinearly sample your guidance feature map at random locations, or did you simply select pixels from the discrete neighbor set?

Thanks!

BrianPugh · Jul 13 '21 15:07

Thanks for your interest.

  1. If using HRNet-48, the effective minibatch size is 8 across 4 GPUs (i.e., 2 per GPU).
  2. We don't freeze any module.
  3. We unroll them over an additional dimension in the tensors.
  4. We simply select pixels from the discrete neighbor set (see the sketch below).
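For concreteness, here is a minimal sketch of answers 3 and 4, assuming a PyTorch Depth2normalLight-style setup; the function name, the offset set, and the wrap-around border handling are my assumptions, not the authors' code:

```python
import torch

def gather_discrete_neighbors(feat, offsets):
    """feat: (B, C, H, W); offsets: K integer (dy, dx) pairs.
    Returns (B, K, C, H, W): neighbor features unrolled over a new K
    dimension, instead of looping over samples one at a time."""
    neighbors = []
    for dy, dx in offsets:
        # Discrete pixel selection: an integer shift, no bilinear interpolation.
        # torch.roll wraps around at the borders; real code would pad or clamp.
        neighbors.append(torch.roll(feat, shifts=(dy, dx), dims=(2, 3)))
    return torch.stack(neighbors, dim=1)

# usage: an 8-connected neighborhood on a sigmoid-activated 3-channel guidance map
offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
           (0, 1), (1, -1), (1, 0), (1, 1)]
guidance = torch.sigmoid(torch.randn(2, 3, 64, 64))
neigh = gather_discrete_neighbors(guidance, offsets)  # shape (2, 8, 3, 64, 64)
```

Once the neighbors sit on their own tensor dimension, the downstream normal and kernel computations can be vectorized over all K samples at once rather than iterated in Python.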

flamehaze1115 · Jul 15 '21 14:07

I've got another question: I see that you use 3 channels for your guidance feature with a sigmoid activation, which gives a maximum L2 distance of √3 ≈ 1.73 between two features. The first part of Equation 2 in your paper says you use a normalized Gaussian kernel to encode the distance into the range [0, 1]. However, since the max L2 distance is 1.73, the actual range is only [0.42, 1]. Is this correct?
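For reference, a quick numeric check of that range claim; the squared-exponential kernel form below is my assumption, and the exact lower bound depends on the kernel definition and sigma:

```python
import math

d_max = math.sqrt(3)  # max ||f_i - f_j|| for 3 sigmoid channels in [0, 1]^3
for sigma in (0.5, 1.0, 1.316):
    # assuming k(d) = exp(-d^2 / (2 * sigma^2))
    lower = math.exp(-d_max**2 / (2 * sigma**2))
    print(f"sigma={sigma}: kernel range [{lower:.3f}, 1]")
# sigma=0.5 gives [0.002, 1]; sigma≈1.316 reproduces the [0.42, 1] figure above
```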

BrianPugh · Jul 15 '21 17:07

Also, how/when do you start applying your learning rate decay, given that the backbone/depth-decoder trains for longer than the guidance-decoder?

BrianPugh · Jul 15 '21 18:07

Do you apply any regularization to the guidance output?

BrianPugh · Jul 15 '21 18:07

  1. We use 3 channels for better visualization. You can adopt more channels here, but in our experience, 3 channels are enough. The sigma of the Gaussian kernel is a hyperparameter; we simply adopt 0.5 here.
  2. After training only the depth branch for 20 epochs, we start to train the whole network with an initial lr of 0.00005 (see the sketch below). You may need to adjust the learning rate.
  3. No regularization is applied to the guidance output.
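A minimal sketch of that two-stage schedule, assuming PyTorch and a hypothetical `guidance_decoder` submodule name; the stage-1 learning rate and optimizer are not stated in the thread:

```python
import torch
import torch.nn as nn

WARMUP_EPOCHS = 20  # depth branch only for the first 20 epochs

def trainable_params(model: nn.Module, epoch: int):
    """Enable the guidance decoder only after the depth-only warm-up."""
    train_guidance = epoch >= WARMUP_EPOCHS
    for p in model.guidance_decoder.parameters():  # hypothetical attribute
        p.requires_grad_(train_guidance)
    return [p for p in model.parameters() if p.requires_grad]

# Stage 2: train the whole network with initial lr = 0.00005, e.g.
#   optimizer = torch.optim.Adam(trainable_params(model, epoch=20), lr=5e-5)
```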

flamehaze1115 · Jul 19 '21 14:07

Hmm, I'm unable to reproduce the results from your paper, and my guidance maps don't look like yours either. Have you encountered anything like this in your experiments? The estimated surface-normal edges also appear noisy instead of crisp.

[Attached images: guidance maps and estimated surface normals]

BrianPugh · Jul 23 '21 00:07

Hello. I plan to release the code in the near future.

flamehaze1115 · Jul 23 '21 01:07

Hello. Can you release the training/evaluation code, please?

sysoda · Mar 25 '22 12:03

@flamehaze1115 Any plans to release the training code?

phongnhhn92 · Feb 19 '24 06:02