attention-is-all-you-need-pytorch

Performance with default parameters looks completely off...

Open JianBingJuanDaCong opened this issue 4 years ago • 1 comment

I followed every step exactly, and the first couple of epochs look like this:

Header of m30k_deen_shr.train.log:

epoch,loss,ppl,accuracy
0, 9.15762, 9486.43847, 0.014
1, 9.14988, 9413.27772, 4.100
2, 9.13908, 9312.21408, 10.980
3, 9.12995, 9227.57595, 11.952
4, 9.12218, 9156.17200, 12.014
5, 9.11489, 9089.63892, 12.015
6, 9.10745, 9022.23094, 12.015
7, 9.09957, 8951.48423, 12.019
8, 9.09133, 8877.95168, 13.178
9, 9.08266, 8801.31062, 16.358

This is totally off compared with the plot in the README. Is there a custom parameter setting I should use to replicate the author's reported performance?

JianBingJuanDaCong · Apr 13 '20, 06:04

I believe the default parameters are batch size 2048, warmup 4000, and learning rate 2.0. In my case, I set batch size 128, warmup 128000, and learning rate 8.0. It took about 120 epochs to achieve the reported results (ppl ~10.5 and accuracy ~58.5% on the validation set). Hope this helps.

SimiaoZuo · May 06 '20, 15:05
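
For context, the "learning rate" in the comment above is presumably the lr_mul multiplier in the repo's warmup schedule rather than a literal learning rate. Below is a minimal sketch of the inverse-square-root warmup schedule from the original Transformer paper, which this repo's ScheduledOptim appears to follow; the helper name noam_lr is mine, and the peak-LR comparison assumes d_model = 512:

```python
# Transformer warmup schedule (Vaswani et al., 2017):
#   lr = lr_mul * d_model**-0.5 * min(step**-0.5, step * warmup**-1.5)
# The LR rises linearly for `warmup` steps, then decays as step**-0.5.

def noam_lr(step: int, d_model: int = 512, warmup: int = 4000, lr_mul: float = 2.0) -> float:
    """Learning rate at a given optimizer step; it peaks at step == warmup."""
    return lr_mul * d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)

# Peak LR with the defaults (warmup 4000, lr_mul 2.0): ~1.4e-3
print(noam_lr(4000))
# Peak LR with the suggested batch-128 setting (warmup 128000, lr_mul 8.0): ~9.9e-4
print(noam_lr(128000, warmup=128000, lr_mul=8.0))
```

In terms of train.py flags, the suggested setting would be roughly `-b 128 -warmup 128000 -lr_mul 8.0` (flag names assumed from the repo's train.py; check its argument parser). The idea is that a smaller batch takes many more optimizer steps per epoch, so the warmup is stretched and the multiplier raised to keep the effective peak learning rate in a comparable range.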