attention-is-all-you-need-pytorch
Performance with default parameters looks completely off...
I am following every step exactly, and the first couple of epochs look like this:
Head of m30k_deen_shr.train.log:

```
epoch,loss,ppl,accuracy
0,9.15762,9486.43847,0.014
1,9.14988,9413.27772,4.100
2,9.13908,9312.21408,10.980
3,9.12995,9227.57595,11.952
4,9.12218,9156.17200,12.014
5,9.11489,9089.63892,12.015
6,9.10745,9022.23094,12.015
7,9.09957,8951.48423,12.019
8,9.09133,8877.95168,13.178
9,9.08266,8801.31062,16.358
```
This is totally off compared with the README plot. Is there a custom parameter setting I should use to replicate the author's performance numbers?
I believe the default parameters correspond to batch size 2048, warmup 4000, and learning-rate multiplier 2.0. In my case, I set batch size 128, warmup 128000, and learning-rate multiplier 8.0. It took about 120 epochs to achieve the reported results (ppl ~ 10.5 and accuracy ~ 58.5% on the validation set). Hope this helps.
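For context, the schedule these parameters feed into is the "Noam" learning-rate schedule from the Transformer paper: the rate rises linearly for `warmup` steps, then decays as the inverse square root of the step count, all scaled by a multiplier. A minimal sketch, assuming the repo follows the paper's formula with a multiplier argument (the names `lr_mul` and `d_model` here are illustrative):

```python
def noam_lr(step, d_model=512, warmup=4000, lr_mul=2.0):
    """Noam schedule from "Attention Is All You Need":
    lr = lr_mul * d_model^-0.5 * min(step^-0.5, step * warmup^-1.5)
    Rises linearly until step == warmup, then decays as step^-0.5.
    """
    return lr_mul * d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)

# With batch size 128 instead of 2048, each epoch takes ~16x as many
# optimizer steps, so stretching warmup from 4000 to 128000 (and raising
# the multiplier) keeps the schedule's peak at a comparable point in training.
```

This is why warmup and the multiplier have to move together with batch size: shrinking the batch without rescaling them makes the schedule peak far too early in training.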