ASNDepth

Training Memory Requirements

BrianPugh opened this issue 3 years ago • 9 comments

I'm trying to replicate these results; however, I'm running into memory limitations on hardware similar to what is reported in the paper.

  1. Did you use a minibatch of 8 per GPU (effective minibatch size of 32 across 4 GPUs) or a minibatch of 2 per GPU (effective minibatch size of 8 across 4 GPUs)?
  2. Did you freeze the backbone or any other sections of the network during training?
  3. In your Depth2normalLight implementation, did you iterate over random samples, or unroll them over an additional dimension in your tensors?
  4. In your Depth2normalLight implementation, did you bilinearly sample your guidance feature map at random locations, or did you simply select pixels from the discrete neighbor set?

Thanks!

BrianPugh · Jul 13 '21 15:07

Thanks for your interest.

  1. If using HRNet-48, the effective minibatch size is 8 across 4 GPUs (i.e., 2 per GPU).
  2. We don't freeze any module.
  3. We unroll them over an additional dimension in the tensors.
  4. We simply select pixels from the discrete neighbor set (see the sketch below).
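For concreteness, here is a minimal sketch of answers 3 and 4, assuming a PyTorch Depth2normalLight-style setup; the function name, the offset set, and the wrap-around border handling are my assumptions, not the authors' code:

```python
import torch

def gather_discrete_neighbors(feat, offsets):
    """feat: (B, C, H, W); offsets: K integer (dy, dx) pairs.
    Returns (B, K, C, H, W): neighbor features unrolled over a new K
    dimension, instead of looping over samples one at a time."""
    neighbors = []
    for dy, dx in offsets:
        # Discrete pixel selection: an integer shift, no bilinear interpolation.
        # torch.roll wraps around at the borders; real code would pad or clamp.
        neighbors.append(torch.roll(feat, shifts=(dy, dx), dims=(2, 3)))
    return torch.stack(neighbors, dim=1)

# usage: an 8-connected neighborhood on a sigmoid-activated 3-channel guidance map
offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
           (0, 1), (1, -1), (1, 0), (1, 1)]
guidance = torch.sigmoid(torch.randn(2, 3, 64, 64))
neigh = gather_discrete_neighbors(guidance, offsets)  # shape (2, 8, 3, 64, 64)
```

Once the neighbors sit on their own tensor dimension, the downstream normal and kernel computations can be vectorized over all K samples at once rather than iterated in Python.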

flamehaze1115 · Jul 15 '21 14:07

I've got another question: I see that you use 3 channels for your guidance feature with a sigmoid activation, which gives a maximum L2 distance of √3 ≈ 1.73 between two features. The first part of Equation 2 in your paper says you use a normalized Gaussian kernel to encode the distance into the range [0, 1]. However, since the max L2 distance is 1.73, the actual range is only [0.42, 1]. Is this correct?
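For reference, a quick numeric check of that range claim; the squared-exponential kernel form below is my assumption, and the exact lower bound depends on the kernel definition and sigma:

```python
import math

d_max = math.sqrt(3)  # max ||f_i - f_j|| for 3 sigmoid channels in [0, 1]^3
for sigma in (0.5, 1.0, 1.316):
    # assuming k(d) = exp(-d^2 / (2 * sigma^2))
    lower = math.exp(-d_max**2 / (2 * sigma**2))
    print(f"sigma={sigma}: kernel range [{lower:.3f}, 1]")
# sigma=0.5 gives [0.002, 1]; sigma≈1.316 reproduces the [0.42, 1] figure above
```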

BrianPugh · Jul 15 '21 17:07

Also, how/when do you start applying your learning rate decay, given that the backbone/depth-decoder trains for longer than the guidance-decoder?

BrianPugh · Jul 15 '21 18:07

Do you apply any regularization to the guidance output?

BrianPugh · Jul 15 '21 18:07

  1. We use 3 channels for better visualization. You can adopt more channels here, but in our experience, 3 channels are enough. The sigma of the Gaussian kernel is a hyperparameter; we simply adopt 0.5 here.
  2. After training only the depth branch for 20 epochs, we start to train the whole network with an initial lr of 0.00005 (see the sketch below). You may need to adjust the learning rate.
  3. No regularization is applied to the guidance output.
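A minimal sketch of that two-stage schedule, assuming PyTorch and a hypothetical `guidance_decoder` submodule name; the stage-1 learning rate and optimizer are not stated in the thread:

```python
import torch
import torch.nn as nn

WARMUP_EPOCHS = 20  # depth branch only for the first 20 epochs

def trainable_params(model: nn.Module, epoch: int):
    """Enable the guidance decoder only after the depth-only warm-up."""
    train_guidance = epoch >= WARMUP_EPOCHS
    for p in model.guidance_decoder.parameters():  # hypothetical attribute
        p.requires_grad_(train_guidance)
    return [p for p in model.parameters() if p.requires_grad]

# Stage 2: train the whole network with initial lr = 0.00005, e.g.
#   optimizer = torch.optim.Adam(trainable_params(model, epoch=20), lr=5e-5)
```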

flamehaze1115 · Jul 19 '21 14:07

Hmm, I'm unable to reproduce the results from your paper, and my guidance maps don't look like yours either. Have you encountered anything like this in your experiments? The estimated surface-normal edges also appear noisy instead of crisp.

[Attached images: guidance maps and estimated surface normals]

BrianPugh · Jul 23 '21 00:07

Hello. I plan to release the code in the near future.

flamehaze1115 · Jul 23 '21 01:07

Hello. Can you release the training/evaluation code, please?

sysoda · Mar 25 '22 12:03

@flamehaze1115 Any plans to release the training code?

phongnhhn92 · Feb 19 '24 06:02