RAdam-Tensorflow
Difference in SMA threshold between code and paper
In the algorithm outlined in the original paper, the rectified adaptive learning rate is applied only when the length of the approximated SMA satisfies ρt > 4; otherwise the update falls back to un-adapted momentum. In the code, however, the threshold used is 5.0:
https://github.com/taki0112/RAdam-Tensorflow/blob/29328c3ddf07b62585c29fb1bc1b8ebf33a71c8b/RAdam.py#L99
Is there any reason for this?
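To make the practical effect of the two thresholds concrete, here is a small sketch. It uses the paper's definitions ρ∞ = 2/(1 − β2) − 1 and ρt = ρ∞ − 2tβ2^t/(1 − β2^t), assumes β2 = 0.999, and reports the first step at which each threshold would switch on the rectified update. The helper names are hypothetical and are not taken from RAdam.py.

```python
import math


def rho_inf(beta2: float) -> float:
    """Maximum length of the approximated SMA: 2 / (1 - beta2) - 1."""
    return 2.0 / (1.0 - beta2) - 1.0


def rho_t(beta2: float, t: int) -> float:
    """Length of the approximated SMA at step t (t >= 1), as defined in the paper."""
    b_t = beta2 ** t
    return rho_inf(beta2) - 2.0 * t * b_t / (1.0 - b_t)


def rectification(beta2: float, t: int) -> float:
    """Variance rectification term r_t; only meaningful when rho_t > 4."""
    r, r_inf = rho_t(beta2, t), rho_inf(beta2)
    return math.sqrt((r - 4.0) * (r - 2.0) * r_inf / ((r_inf - 4.0) * (r_inf - 2.0) * r))


def first_rectified_step(beta2: float, threshold: float, max_steps: int = 1000):
    """First step t at which rho_t exceeds the threshold (hypothetical helper)."""
    for t in range(1, max_steps + 1):
        if rho_t(beta2, t) > threshold:
            return t
    return None


if __name__ == "__main__":
    beta2 = 0.999  # assumed value, a common Adam default
    for t in range(1, 8):
        print(t, round(rho_t(beta2, t), 4))
    print("first step with rho_t > 4:", first_rectified_step(beta2, 4.0))
    print("first step with rho_t > 5:", first_rectified_step(beta2, 5.0))
    print("r_t at step 5:", rectification(beta2, 5))
```

With β2 = 0.999, ρt stays slightly below t for small t (ρ5 is about 4.996), so the paper's condition first applies the rectified update at step 5, while a threshold of 5.0 delays it to step 6. Since ρt never lands exactly on 4 or 5 at an integer step here, using `>` or `>=` does not change the result. At step 5 the rectification term is already small (below 0.02), so under these assumptions the difference amounts to one early step with a heavily damped adaptive update.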