RAdam-Tensorflow
A simple TensorFlow implementation of "On the Variance of the Adaptive Learning Rate and Beyond"
When I use RAdam in an Estimator, I encounter a 'NaN loss during training' problem. However, Adam works fine.
Hi, Kim. I am also a developer working in the same field. I'm developing in a TF 2.0 environment, and I wonder whether this code will work in that environment as well....
In the algorithm outlined in the original paper, the threshold for whether adapted momentum is applied or not is set to ρt > 4; however, looking at the code the...
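For readers following the threshold question above, here is a minimal sketch of how ρt and the rectification term can be computed from the formulas in the paper. It is not taken from this repository's code: the function name `radam_rectification` and its `threshold` parameter are hypothetical, and the default of 4 simply mirrors the paper's condition ρt > 4.

```python
import math


def radam_rectification(step, beta2=0.999, threshold=4.0):
    """Return (rho_t, r_t) for a 1-indexed step.

    r_t is None when rho_t <= threshold, i.e. when the adaptive
    (variance-rectified) update would be skipped.
    NOTE: `threshold` is an illustrative parameter, not this repo's API.
    """
    # Maximum length of the approximated simple moving average.
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    beta2_t = beta2 ** step
    # Length of the approximated SMA at this step.
    rho_t = rho_inf - 2.0 * step * beta2_t / (1.0 - beta2_t)
    if rho_t <= threshold:
        return rho_t, None
    # Variance rectification term from the paper.
    r_t = math.sqrt(
        ((rho_t - 4.0) * (rho_t - 2.0) * rho_inf)
        / ((rho_inf - 4.0) * (rho_inf - 2.0) * rho_t)
    )
    return rho_t, r_t


if __name__ == "__main__":
    for step in (1, 4, 5, 100, 100000):
        rho_t, r_t = radam_rectification(step)
        print(step, round(rho_t, 4), r_t)
```

With the default `beta2=0.999`, ρt stays below 4 for the first four steps, so under these assumptions the rectified update first applies at step 5. Raising `threshold` only delays that switch; it does not change the value of the rectification term once it is applied.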