RAdam-Tensorflow Difference in SMA threshold between code and paper

Open joeforan76 opened this issue 5 years ago • 0 comments

In the algorithm outlined in the original paper, the parameters are updated with adaptive momentum only when ρ_t (the length of the approximated SMA) satisfies ρ_t > 4; otherwise, un-adapted momentum is used. Looking at the code, however, the threshold used is 5.0:

https://github.com/taki0112/RAdam-Tensorflow/blob/29328c3ddf07b62585c29fb1bc1b8ebf33a71c8b/RAdam.py#L99

Is there any reason for this?
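To see what the two thresholds mean in practice, the paper's quantities can be computed in plain Python rather than TensorFlow. This is a minimal sketch that assumes the paper's definitions of ρ_∞, ρ_t and the rectification term r_t, steps counted from t = 1, and β2 = 0.999. The function names are hypothetical and are not taken from RAdam.py.

```python
import math


def rho_inf(beta2: float) -> float:
    """Maximum length of the approximated SMA: rho_inf = 2 / (1 - beta2) - 1."""
    return 2.0 / (1.0 - beta2) - 1.0


def rho_t(t: int, beta2: float) -> float:
    """Length of the approximated SMA at step t (t >= 1, as in the paper)."""
    b = beta2 ** t
    return rho_inf(beta2) - 2.0 * t * b / (1.0 - b)


def rectification(t: int, beta2: float) -> float:
    """Variance rectification term r_t; the paper only uses it when rho_t > 4."""
    r, r_inf = rho_t(t, beta2), rho_inf(beta2)
    if r <= 4.0:
        raise ValueError("r_t is only used when rho_t > 4")
    return math.sqrt(((r - 4.0) * (r - 2.0) * r_inf) /
                     ((r_inf - 4.0) * (r_inf - 2.0) * r))


def first_step_above(threshold: float, beta2: float = 0.999, max_steps: int = 10_000):
    """Return the first step t at which rho_t exceeds the threshold, or None."""
    for t in range(1, max_steps + 1):
        if rho_t(t, beta2) > threshold:
            return t
    return None


if __name__ == "__main__":
    beta2 = 0.999  # assumed value (the common Adam default)
    for t in range(1, 8):
        print(f"t={t}  rho_t={rho_t(t, beta2):.4f}")
    print("first step with rho_t > 4:", first_step_above(4.0, beta2))
    print("first step with rho_t > 5:", first_step_above(5.0, beta2))
```

Under these assumptions, ρ_t first exceeds 4 at step 5 and first exceeds 5 at step 6, so the two thresholds differ by a single step. At step 5 the rectification term is still small (under 0.02), since r_t is zero at ρ_t = 4.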

Sep 06 '19 07:09 joeforan76
