AdamP Optimizer — Unofficial TensorFlow Implementation
"Slowing Down the Weight Norm Increase in Momentum-based Optimizers"
Implemented by Junho Kim
[Paper] [Project page] [Official Pytorch]
Validation
I have checked that the code runs, but I have not confirmed whether its performance matches the official implementation.
Usage
Usage is exactly the same as the tf.keras.optimizers library!
from adamp_tf import AdamP
from sgdp_tf import SGDP
optimizer_adamp = AdamP(learning_rate=0.001, beta_1=0.9, beta_2=0.999, weight_decay=1e-2)
optimizer_sgdp = SGDP(learning_rate=0.1, weight_decay=1e-5, momentum=0.9, nesterov=True)
- Do not use with tf.nn.scale_regularization_loss. Use the weight_decay argument instead (see the sketch below).
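For example, a minimal training sketch (the model and data here are placeholders, not part of this repository): the optimizer is passed to model.compile like any other tf.keras optimizer, and weight decay comes from the weight_decay argument rather than a regularization loss.

```python
import tensorflow as tf

from adamp_tf import AdamP  # same import as above

# No kernel_regularizer / tf.nn.scale_regularization_loss here:
# weight decay is handled by the optimizer's weight_decay argument.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

model.compile(
    optimizer=AdamP(learning_rate=0.001, beta_1=0.9, beta_2=0.999, weight_decay=1e-2),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# model.fit(x_train, y_train, epochs=5)  # x_train / y_train: your own data
```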
Arguments
SGDP and AdamP share arguments with tf.keras.optimizers.SGD and tf.keras.optimizers.Adam.
There are two additional hyperparameters; we recommend using the default values.
- delta: threshold that determines whether a set of parameters is scale-invariant or not (default: 0.1)
- wd_ratio: relative weight decay applied to scale-invariant parameters, compared to that applied to scale-variant parameters (default: 0.1)
Both SGDP and AdamP support Nesterov momentum.
- nesterov: enables Nesterov momentum (default: False)
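For reference, here is a sketch of the same constructors with these hyperparameters written out at their default values. It assumes delta, wd_ratio, and nesterov are accepted as constructor keyword arguments, as in the official PyTorch implementation; check the constructor signatures in adamp_tf.py and sgdp_tf.py.

```python
from adamp_tf import AdamP
from sgdp_tf import SGDP

# Defaults written out explicitly; delta and wd_ratio are the AdamP/SGDP-specific knobs.
optimizer_adamp = AdamP(
    learning_rate=0.001,
    beta_1=0.9,
    beta_2=0.999,
    weight_decay=1e-2,
    delta=0.1,       # threshold for detecting scale-invariant parameters
    wd_ratio=0.1,    # reduced weight decay applied to scale-invariant parameters
    nesterov=False,  # set True to enable Nesterov momentum
)

optimizer_sgdp = SGDP(
    learning_rate=0.1,
    momentum=0.9,
    weight_decay=1e-5,
    delta=0.1,
    wd_ratio=0.1,
    nesterov=True,
)
```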
How to cite
@article{heo2020adamp,
title={Slowing Down the Weight Norm Increase in Momentum-based Optimizers},
author={Heo, Byeongho and Chun, Sanghyuk and Oh, Seong Joon and Han, Dongyoon and Yun, Sangdoo and Uh, Youngjung and Ha, Jung-Woo},
year={2020},
journal={arXiv preprint arXiv:2006.08217},
}