
larger learning rate + large weight decay performs better?

Open askerlee opened this issue 5 years ago • 0 comments

Hi all, my colleague and I tried combining a (relatively) large Ranger learning rate (say, 0.001) with a large weight decay (say, 0.1). It seems the large decay leads to better performance: on two different models we observed a 0.5-1.5% increase in ImageNet classification accuracy. Both were custom models, though, not standard ones like ResNet, so we're not sure whether anyone else has seen similar results.
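For a rough sense of scale, here is a minimal sketch of how strongly these two settings shrink a weight per step. It assumes the decay is applied in the decoupled, AdamW-style form `w <- w - lr * weight_decay * w`; that form is an assumption here and should be checked against the Ranger version in use. The helper names `decay_factor` and `steps_to_halve` are made up for illustration and are not part of Ranger.

```python
import math


def decay_factor(lr: float, weight_decay: float) -> float:
    """Per-step multiplier on a weight from decay alone.

    Assumes decoupled decay: w <- w - lr * weight_decay * w.
    """
    return 1.0 - lr * weight_decay


def steps_to_halve(lr: float, weight_decay: float) -> float:
    """Steps needed for decay alone (gradients ignored) to halve a weight."""
    return math.log(0.5) / math.log(decay_factor(lr, weight_decay))


# Values quoted in the post above.
lr, wd = 1e-3, 0.1

factor = decay_factor(lr, wd)
half_life = steps_to_halve(lr, wd)

print(f"per-step shrink factor: {factor:.6f}")
print(f"steps to halve a weight by decay alone: {half_life:.0f}")
```

Under that assumption the per-step shrink depends on the product `lr * weight_decay`, so the two values are best compared across runs as a pair and not individually.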

askerlee, Oct 28 '19 04:10