RAdam-Tensorflow
A simple TensorFlow implementation of RAdam (Rectified Adam) from the paper "On the Variance of the Adaptive Learning Rate and Beyond".
On the Variance of the Adaptive Learning Rate and Beyond
Paper | Official PyTorch code
Usage
from RAdam import RAdamOptimizer  # RAdam module provided by this repository

# `loss` is the scalar loss tensor you want to minimize
train_op = RAdamOptimizer(learning_rate=0.001, beta1=0.9, beta2=0.999, weight_decay=0.0).minimize(loss)
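To make concrete what the optimizer computes for each variable with the hyperparameters above, here is a minimal pure-Python sketch of one RAdam step on a list of floats, following Algorithm 2 of the paper. It is not this repository's code: `radam_step` and its `state` dict are hypothetical names, `eps` is an assumed stability term that the paper's formula omits, and `weight_decay` is not modeled.

```python
import math


def radam_step(params, grads, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One RAdam update (Algorithm 2 of the paper) applied in place to `params`.

    Hypothetical helper for illustration: `params` and `grads` are equal-length
    lists of floats, and `state` is a dict that must start empty ({}) and is
    reused across calls. `eps` is an assumed stability term; weight decay is
    not modeled. Returns rho_t for inspection.
    """
    if not state:
        state["t"] = 0
        state["m"] = [0.0] * len(params)  # first moment, m_0 = 0
        state["v"] = [0.0] * len(params)  # second moment, v_0 = 0
    state["t"] += 1
    t = state["t"]

    # Maximum and current length of the approximated simple moving average.
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    beta2_t = beta2 ** t
    rho_t = rho_inf - 2.0 * t * beta2_t / (1.0 - beta2_t)

    rectified = rho_t > 4.0  # variance of the adaptive learning rate is tractable
    if rectified:
        r_t = math.sqrt((rho_t - 4.0) * (rho_t - 2.0) * rho_inf
                        / ((rho_inf - 4.0) * (rho_inf - 2.0) * rho_t))

    for i, g in enumerate(grads):
        state["m"][i] = beta1 * state["m"][i] + (1.0 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1.0 - beta2) * g * g
        m_hat = state["m"][i] / (1.0 - beta1 ** t)  # bias-corrected momentum
        if rectified:
            # Adaptive learning rate l_t = sqrt((1 - beta2^t) / v_t), with eps added.
            l_t = math.sqrt(1.0 - beta2_t) / (math.sqrt(state["v"][i]) + eps)
            params[i] -= lr * r_t * m_hat * l_t
        else:
            # Early steps: plain SGD with bias-corrected momentum.
            params[i] -= lr * m_hat
    return rho_t


if __name__ == "__main__":
    params, state = [1.0, -2.0], {}
    for _ in range(10):
        # The gradient of f(x) = 0.5 * x**2 is x itself.
        radam_step(params, list(params), state, lr=0.1)
    print(params)
```

With beta2 = 0.999, rho_t stays at or below 4 for steps 1 to 4, so those updates are bias-corrected momentum only; from step 5 on, the rectification factor r_t scales the Adam-style step and approaches 1 as rho_t approaches rho_inf.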
Algorithm
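Following Algorithm 2 of the paper, with element-wise operations, m_0 = v_0 = 0, and α_t, β_1, β_2 corresponding to `learning_rate`, `beta1`, `beta2`, each step computes the following. Practical implementations usually add a small ε to the denominator of l_t; how `weight_decay` is applied is not part of the paper's Algorithm 2 and is not shown.

```latex
\begin{aligned}
g_t &= \nabla_\theta f_t(\theta_{t-1}) \\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2 \\
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad \hat{m}_t = \frac{m_t}{1-\beta_1^t} \\
\rho_\infty &= \frac{2}{1-\beta_2} - 1, \qquad \rho_t = \rho_\infty - \frac{2t\,\beta_2^t}{1-\beta_2^t} \\
\theta_t &=
\begin{cases}
\theta_{t-1} - \alpha_t\, r_t\, \hat{m}_t\, l_t, & \rho_t > 4 \\
\theta_{t-1} - \alpha_t\, \hat{m}_t, & \text{otherwise}
\end{cases} \\
\text{where } l_t &= \sqrt{\frac{1-\beta_2^t}{v_t}}, \qquad
r_t = \sqrt{\frac{(\rho_t-4)(\rho_t-2)\,\rho_\infty}{(\rho_\infty-4)(\rho_\infty-2)\,\rho_t}}
\end{aligned}
```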
Result