Ranger-Deep-Learning-Optimizer
Is AdaBelief the best optimizer?
https://paperswithcode.com/paper/adabelief-optimizer-adapting-stepsizes-by-the
"This work considers the update step in first-order methods. Other directions include Lookahead [42] which updates “fast” and “slow” weights separately, and is a wrapper that can combine with other optimizers; variance reduction methods [43, 44, 45] which reduce the variance in gradient; and LARS [46] which uses a layer-wise learning rate scaling. AdaBelief can be combined with these methods. Other variants of Adam have been proposed (e.g. NosAdam [47], Sadam [48] and Adax [49])."
I tested AdaBelief on my task, and it performed worse than Ranger.
@hiyyg Could you post your task, your network, and the hyperparameters you used for the two optimizers?
It was an internal task, so unfortunately I cannot share it. The hyperparameters were the defaults for both optimizers.
@hiyyg Which version of AdaBelief did you use? I am not sure whether the gap is caused by eps. Quickly skimming the Ranger code, its default is eps=1e-5, which is equivalent to eps=1e-10 for AdaBelief. The most recent release (0.2) uses a default of eps=1e-16 for AdaBelief, which is equivalent to eps=1e-8 for Adam. The choice of eps is crucial for adaptive optimizers, and this difference could be the reason for the performance gap.
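A quick numerical sketch of why these eps values correspond, assuming the update rules as described in the AdaBelief paper: Adam and Ranger add eps outside the square root, while AdaBelief adds it inside, so comparable settings satisfy eps_adabelief ≈ eps_adam².

```python
import math

# When the second-moment estimate is ~0 (e.g. near-constant gradients),
# the effective denominator of the update is dominated by eps:
#   Adam / Ranger:  denom ≈ eps        (eps added outside the sqrt)
#   AdaBelief:      denom ≈ sqrt(eps)  (eps added inside the sqrt)

def adam_style_denom(v, eps):
    return math.sqrt(v) + eps   # Adam / Ranger style

def adabelief_style_denom(s, eps):
    return math.sqrt(s + eps)   # AdaBelief style (eps inside the sqrt)

# Ranger default eps=1e-5 vs AdaBelief eps=1e-10:
print(adam_style_denom(0.0, 1e-5))        # ≈ 1e-05
print(adabelief_style_denom(0.0, 1e-10))  # ≈ 1e-05

# Adam eps=1e-8 vs AdaBelief v0.2 default eps=1e-16:
print(adam_style_denom(0.0, 1e-8))        # ≈ 1e-08
print(adabelief_style_denom(0.0, 1e-16))  # ≈ 1e-08
```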
Thanks. I believe I used the version from around 28 Dec 2020. I think this information will be very useful for users who want to compare AdaBelief with Ranger.
Thanks for the info. 28 Dec 2020 corresponds roughly to v0.1, and the default there is eps=1e-16 for AdaBelief.