dlupi-heteroscedastic-dropout
Unable to recreate the results
Hi John, thank you for providing the code for so many methods. While trying to recreate the results for the dropout_fn_of_xstar model, the script tries to load a pretrained dlupi model. If I start from an untrained model instead, the loss diverges and goes to NaN. Could you please look into it? Thank you.
Hi @ck-amrahd, what learning rate, batch size, weight decay, and PyTorch version are you using?
Hi John, I have set the batch size to 8. I am using PyTorch 1.4, a weight decay of 1e-4, and a learning rate of 0.01. I tried lower learning rates like 0.001 and 0.0001, but it doesn't converge either. Do we have to start with a trained model?
@ck-amrahd, thanks very much for your interest in our work. The batch size will need to be much larger than 8 to converge, and we train all the models from scratch, so there is no need to start with a trained model.
From the “Implementation Details” section and Supplement: “We use a weight decay of 1 × 10⁻⁴ in all experiments, ADAM, and a learning rate of 1 × 10⁻³, as described in Section 3.2 of the paper. We cropped images to a standard size of 224 × 224 before feeding them into the network. We scale the batch size m with respect to the size of the training set. For example, for the 75K model, we use a batch size of 64. For the 200K model, we use a batch size of 128.”
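For anyone else trying to reproduce this, here is a minimal PyTorch sketch of the setup quoted above (Adam, learning rate 1e-3, weight decay 1e-4, 224 × 224 crops, batch size scaled with the training-set size). It is not the repo's actual training script: the ResNet, the ImageFolder path, and the loader settings below are placeholders standing in for the dlupi model and the ImageNet-boxes dataset.

```python
import torch
from torch import optim
from torchvision import transforms, datasets, models

# Crop images to the standard 224 x 224 size before feeding them to the network.
transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.ToTensor(),
])

# Placeholder path: the actual ImageNet-boxes data layout comes from the repo.
train_dataset = datasets.ImageFolder("path/to/imagenet_boxes/train", transform)
train_loader = torch.utils.data.DataLoader(
    train_dataset,
    batch_size=128,   # 200K model; use 64 for the 75K model
    shuffle=True,
    num_workers=8,
)

# Placeholder network standing in for the dropout_fn_of_xstar model.
model = models.resnet18(num_classes=1000)

# Adam with the learning rate and weight decay from the Supplement.
optimizer = optim.Adam(
    model.parameters(),
    lr=1e-3,
    weight_decay=1e-4,
)
```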
Hi John, I set the parameters as you mentioned, except that I used a batch size of 64 instead of 128 because of memory issues. I am training on the ImageNet boxes, but even after the 19th epoch the training accuracy is still around 0.12 (please see the attached figure). Do you think it will converge at this rate? Thank you.
Hi @ck-amrahd , you may need to try training two or three times because sometimes the training is unstable with the heteroscedastic dropout. Was your training successful?
Hi John, this was the accuracy when I trained for one night on 4 GPUs. I terminated the training because the loss didn't seem to converge. Thank you.