RAdam: Cannot reproduce the PPL on One Billion Words

Open • XuezheMax opened this issue 4 years ago • 1 comment

For the language modeling (LM) experiments on One Billion Words, the final test perplexity (PPL) is around 41 with Adam and around 40 with RAdam, both worse than the numbers reported in the paper (36.9 for Adam and 35.7 for RAdam). GitHub version: 5716b3e91d0e264322c31823a4a8c0a4f230da27
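
To put the gap in terms of per-token loss, here is a rough sketch. It assumes PPL is computed as the exponential of the mean per-token negative log-likelihood in nats, which is the usual convention but is not confirmed for this repository. The observed values are the approximate figures above, and the names `reported`, `observed`, and `nll_gap` are only illustrative.

```python
import math

# Test perplexities from the paper and from this reproduction attempt.
# The observed values are approximate ("around 41 and 40").
reported = {"Adam": 36.9, "RAdam": 35.7}
observed = {"Adam": 41.0, "RAdam": 40.0}


def nll_gap(observed_ppl: float, reported_ppl: float) -> float:
    """Difference in mean per-token loss (nats), assuming PPL = exp(mean NLL)."""
    return math.log(observed_ppl) - math.log(reported_ppl)


# Per-optimizer gap between the reproduced and the reported loss.
gaps = {name: nll_gap(observed[name], reported[name]) for name in reported}

for name, gap in gaps.items():
    print(f"{name}: +{gap:.3f} nats per token")
```

Under that assumption, both optimizers come out roughly 0.1 nats per token worse than the paper, so the shortfall looks similar for Adam and RAdam rather than specific to one of them.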

Aug 12 '20 17:08 XuezheMax

Thanks for reaching out! I'll look into this next week.

Aug 12 '20 22:08 LiyuanLucasLiu
