dlupi-heteroscedastic-dropout
Unable to recreate the results
Hi John, thank you for providing the code for so many methods. While trying to recreate the results for the dropout_fn_of_xstar model, the script tries to load a pretrained dlupi model. If I start from an untrained model instead, the loss diverges and goes to NaN. Could you please look into it? Thank you.
Hi @ck-amrahd, what learning rate, batch size, weight decay, and PyTorch version are you using?
Hi John, I have set the batch size to 8. I am using PyTorch 1.4, a weight decay of 1e-4, and a learning rate of 0.01. I tried lower learning rates like 0.001 and 0.0001, but it doesn't converge either. Do we have to start with a trained model?
@ck-amrahd, thanks very much for your interest in our work. The batch size will need to be much larger than 8 to converge, and we train all the models from scratch, so there is no need to start with a trained model.
From the “Implementation Details” section and Supplement: “We use a weight decay of 1 × 10⁻⁴ in all experiments, ADAM, and a learning rate of 1 × 10⁻³, as described in Section 3.2 of the paper. We cropped images to a standard size of 224 × 224 before feeding them into the network. We scale the batch size m with respect to the size of the training set. For example, for the 75K model, we use a batch size of 64. For the 200K model, we use a batch size of 128.”
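For anyone else trying to reproduce this, here is a minimal PyTorch sketch of the setup quoted above (Adam, learning rate 1e-3, weight decay 1e-4, 224 × 224 crops, batch size scaled with the training-set size). It is not the repo's actual training script: the ResNet, the ImageFolder path, and the loader settings below are placeholders standing in for the dlupi model and the ImageNet-boxes dataset.

```python
import torch
from torch import optim
from torchvision import transforms, datasets, models

# Crop images to the standard 224 x 224 size before feeding them to the network.
transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.ToTensor(),
])

# Placeholder path: the actual ImageNet-boxes data layout comes from the repo.
train_dataset = datasets.ImageFolder("path/to/imagenet_boxes/train", transform)
train_loader = torch.utils.data.DataLoader(
    train_dataset,
    batch_size=128,   # 200K model; use 64 for the 75K model
    shuffle=True,
    num_workers=8,
)

# Placeholder network standing in for the dropout_fn_of_xstar model.
model = models.resnet18(num_classes=1000)

# Adam with the learning rate and weight decay from the Supplement.
optimizer = optim.Adam(
    model.parameters(),
    lr=1e-3,
    weight_decay=1e-4,
)
```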
Hi John, I set the parameters as you mentioned, except that I used a batch size of 64 instead of 128 because of memory issues. I am training on the ImageNet boxes, but even after the 19th epoch the training accuracy is still around 0.12 (please see the attached figure). Do you think it will converge at this rate? Thank you.
Hi @ck-amrahd , you may need to try training two or three times because sometimes the training is unstable with the heteroscedastic dropout. Was your training successful?
Hi John, this was the accuracy when I trained for one night on 4 GPUs. I terminated the training because the loss didn't seem to converge. Thank you.