RAdam
Cannot reproduce the PPL on One Billion Words
In the language model (LM) experiments on One Billion Words, the final test PPL is around 41 with Adam and around 40 with RAdam, both worse than the numbers reported in the paper (36.9 for Adam and 35.7 for RAdam). GitHub version: 5716b3e91d0e264322c31823a4a8c0a4f230da27
Thanks for reaching out! I'll look into this next week.