Adam-experiments
How to choose wd?
Thank you for this wonderful benchmarking.
Several experiments use wd=1.2e-6. Can you please give some guidelines or a rule of thumb for choosing the weight decay hyperparameter?
@MohitLamba94
Any update?
@MohitLamba94
Any update?
Sorry, I did not look into this any further.