Ranger-Deep-Learning-Optimizer
Larger learning rate + larger weight decay performs better?
Hi all,

My colleague and I tried combining a (relatively) large Ranger learning rate (say, 0.001) with a large weight decay (say, 0.1), and the large decay seems to lead to better performance. We tried two different models and observed a 0.5-1.5% increase in ImageNet classification accuracy. Both models were customized, though, not standard ones like ResNet. Has anyone else found similar results?
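For context on why these two knobs interact: if Ranger applies weight decay decoupled from the gradient step and scaled by the learning rate (as AdamW does), then the effective per-step shrinkage of each weight is `lr * weight_decay`, so a larger learning rate amplifies the regularization too. A minimal plain-Python sketch of that assumption (hypothetical helper, not the repo's actual code):

```python
def decoupled_decay(weights, lr, weight_decay):
    # Decoupled (AdamW-style) weight decay: shrink each weight toward
    # zero by a factor of lr * weight_decay, independently of the
    # gradient-based part of the update.
    return [w - lr * weight_decay * w for w in weights]

# With lr=0.001 and weight_decay=0.1, each step shrinks weights by
# a factor of 1e-4 -- the product is what matters, not either value alone.
print(decoupled_decay([1.0, -2.0], lr=0.001, weight_decay=0.1))
```

So lr=0.001 with weight_decay=0.1 decays weights at the same per-step rate as lr=0.01 with weight_decay=0.01, which might explain why the large-decay setting only helps at the larger learning rate.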