Adabelief-Optimizer
Implementation in pure Keras
Do you have a pure Keras implementation? Thanks
@liaoxuanzhi We don't have a pure Keras implementation yet. Do you know of any pure Keras implementations of Adam or RAdam? It would be easier to start from an existing implementation.
Sorry for the late reply. I will implement this algorithm in pure Keras and report back to you as soon as possible.
I am training the pure Keras version of the AdaBelief algorithm with a small ResNet18 (kernel size reduced by 2) on CIFAR-10. According to my monitoring, over epochs 0-36 it reaches 81.9%, which is better than Adam (best of 80.58% over epochs 0-36). The model is still training, and I will update you with the final result soon (the learning rate decreases by a factor of 0.1 at epoch 150).
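(The thread does not show how that schedule was implemented; as an assumption, a step decay like the one described, a single ×0.1 drop at epoch 150, could be written with Keras' `LearningRateScheduler` callback:)

```python
from keras.callbacks import LearningRateScheduler

# Step decay matching the schedule described above: multiply the current
# learning rate by 0.1 once, at epoch 150. The exact schedule used in the
# experiment is an assumption.
def step_decay(epoch, lr):
    return lr * 0.1 if epoch == 150 else lr

lr_callback = LearningRateScheduler(step_decay, verbose=1)
# model.fit(x_train, y_train, epochs=200, callbacks=[lr_callback])
```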
Cool, thanks a lot! Please create a new pull request when you finish the code.
Sorry for the late report. I have finished the pure Keras implementation (Keras 2.2.5, TensorFlow 1.14.0); please check this link: it implements the initial idea of your work (without any tricks like rectification or decoupled weight decay). To use it (please choose the recommended parameters):

from AdaBelief import AdaBelief
opt = AdaBelief(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0., weight_decay=0.0)
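For readers without access to the linked file, here is a minimal sketch of what such a pure-Keras AdaBelief optimizer might look like against the Keras 2.2.x `Optimizer` API. This is an assumption of the submitted code, not the code itself: it implements only the plain AdaBelief update (no rectification, no decoupled weight decay), with the class name and constructor signature matching the usage shown above.

```python
from keras import backend as K
from keras.optimizers import Optimizer


class AdaBelief(Optimizer):
    """Plain AdaBelief for Keras 2.2.x (sketch; get_config omitted)."""

    def __init__(self, lr=0.001, beta_1=0.9, beta_2=0.999,
                 epsilon=1e-8, decay=0., weight_decay=0.0, **kwargs):
        super(AdaBelief, self).__init__(**kwargs)
        with K.name_scope(self.__class__.__name__):
            self.iterations = K.variable(0, dtype='int64', name='iterations')
            self.lr = K.variable(lr, name='lr')
            self.beta_1 = K.variable(beta_1, name='beta_1')
            self.beta_2 = K.variable(beta_2, name='beta_2')
            self.decay = K.variable(decay, name='decay')
        self.epsilon = epsilon
        self.weight_decay = weight_decay
        self.initial_decay = decay

    def get_updates(self, loss, params):
        grads = self.get_gradients(loss, params)
        self.updates = [K.update_add(self.iterations, 1)]

        lr = self.lr
        if self.initial_decay > 0:
            lr = lr * (1. / (1. + self.decay *
                             K.cast(self.iterations, K.dtype(self.decay))))

        t = K.cast(self.iterations, K.floatx()) + 1
        # Bias-corrected step size, same form as Keras' built-in Adam.
        lr_t = lr * (K.sqrt(1. - K.pow(self.beta_2, t)) /
                     (1. - K.pow(self.beta_1, t)))

        ms = [K.zeros(K.int_shape(p), dtype=K.dtype(p)) for p in params]
        ss = [K.zeros(K.int_shape(p), dtype=K.dtype(p)) for p in params]
        self.weights = [self.iterations] + ms + ss

        for p, g, m, s in zip(params, grads, ms, ss):
            m_t = (self.beta_1 * m) + (1. - self.beta_1) * g
            # The AdaBelief change vs. Adam: second moment of the
            # prediction error (g - m_t) instead of the raw gradient.
            s_t = (self.beta_2 * s) + (1. - self.beta_2) * K.square(g - m_t)
            p_t = p - lr_t * m_t / (K.sqrt(s_t) + self.epsilon)
            # Simple coupled weight decay; the usual Keras way is per-layer
            # L2 regularizers instead (see the next comment).
            if self.weight_decay > 0:
                p_t = p_t - lr * self.weight_decay * p
            self.updates.append(K.update(m, m_t))
            self.updates.append(K.update(s, s_t))
            self.updates.append(K.update(p, p_t))
        return self.updates
```

The only departure from the stock Keras Adam template is the `s_t` line, which accumulates the squared deviation of the gradient from its running mean rather than the squared gradient itself.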
PS: PyTorch applies weight_decay inside the optimizer, while Keras applies L2 regularization per trainable layer. For my implementation, I built a small neural network like LeNet-5 and trained it on CIFAR-10. AdaBelief was consistently better and more stable than Adam: the average accuracy over five runs is 82.7%, versus about 82% for Adam.
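To make the weight-decay point concrete, this is roughly how the Keras-style per-layer L2 penalty looks. This is a generic illustration, not the LeNet-5 model behind the numbers above, and the 1e-4 coefficient is an arbitrary example value:

```python
from keras import regularizers
from keras.layers import Conv2D, Dense, Flatten, MaxPooling2D
from keras.models import Sequential

from AdaBelief import AdaBelief  # the module from the linked implementation

# In Keras, the L2 penalty is attached layer by layer via
# kernel_regularizer, rather than set once on the optimizer as in PyTorch.
model = Sequential([
    Conv2D(6, (5, 5), activation='relu', input_shape=(32, 32, 3),
           kernel_regularizer=regularizers.l2(1e-4)),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(10, activation='softmax',
          kernel_regularizer=regularizers.l2(1e-4)),
])
model.compile(optimizer=AdaBelief(lr=0.001),
              loss='categorical_crossentropy', metrics=['accuracy'])
```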
Cool, thanks so much! We could create a new pull request to merge your code, and perhaps push it to pip once we figure out how to add features such as rectify.