Transformer Quality Target Change
Note to follow up on the current transformer quality target (25 -> 27?).
SWG Notes:
We intend to move the quality target to 27. There is an AI (action item) to modify the reference and confirm that it reaches the target.
SWG Notes:
AI(Cray) - Check target quality on English-to-French and English-to-German. Related to: https://github.com/mlperf/policies/issues/175
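For context (not part of the original notes): the reference reports its own uncased BLEU on newstest2014, as the logs further down show, and those numbers are what the target is measured against. Purely as an illustration of what "check target quality" involves, a standalone uncased BLEU check of a detokenized model output could be run with sacreBLEU; the output file name below is hypothetical, and sacreBLEU scores are not directly comparable to the reference's own BLEU computation because of tokenization differences.
# Illustrative only -- not the reference's scorer. Assumes sacreBLEU is installed
# and translations.detok.de is a hypothetical file of detokenized En-De outputs,
# one translation per line, in the newstest2014 source order.
cat translations.detok.de | sacrebleu -t wmt14 -l en-de -lc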
SWG Notes:
(English to German) Published accuracy is 28.4; we have not been able to hit 27 at the reference batch size yet and are continuing the parameter search. We expect the reference to hit 27, but with changes to learning rate / batch size.
(English to German) Google believes 27 can be hit at a global batch size of ~64k tokens. Above this, runs have not converged, but exploration continues. Reaching 27 roughly doubles the number of epochs versus a target of 25.
(English to French) Published accuracy is 43...; Google has seen around 41, but investigation is ongoing.
Continuing the Cray AI. AI(Google): Explore English-to-French at scale (non-reference).
SWG Notes:
We feel that variance is a concern here, especially at a target of 27. We'd like to raise the accuracy target, but want more information on variance before setting it.
AI(Cray & Google & Cisco) -- Do some runs to 26 to look at variance (and provide data for 25.5 too).
I was able to get in 8 transformer reference runs and saw convergence to 26.0 on English-to-German within 5 epochs for 5/8 runs, and within 6 epochs for the remaining 3.
Here is the relevant grep from the logs:
grep "Bleu score (uncased)" mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_*new/translation/logfile | grep ": 26" mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_0_new/translation/logfile:Bleu score (uncased): 26.452380418777466 mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_1_new/translation/logfile:Bleu score (uncased): 26.39443278312683 mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_2_new/translation/logfile:Bleu score (uncased): 26.0280579328537 mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_3_new/translation/logfile:Bleu score (uncased): 26.264476776123047 mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_4_new/translation/logfile:Bleu score (uncased): 26.29130184650421 mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_5_new/translation/logfile:Bleu score (uncased): 26.16676688194275 mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_6_new/translation/logfile:Bleu score (uncased): 26.01703405380249 mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_7_new/translation/logfile:Bleu score (uncased): 26.256629824638367
mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_0_new/translation/logfile:Starting iteration 5
mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_1_new/translation/logfile:Starting iteration 6
mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_2_new/translation/logfile:Starting iteration 6
mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_3_new/translation/logfile:Starting iteration 5
mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_4_new/translation/logfile:Starting iteration 5
mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_5_new/translation/logfile:Starting iteration 5
mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_6_new/translation/logfile:Starting iteration 5
mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_7_new/translation/logfile:Starting iteration 6
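Since the target decision hinges on run-to-run variance, the spread of those eight BLEU results can be summarized directly from the same logs. The pipeline below is a minimal sketch (not part of the original notes), assuming the log line format shown above; it reuses the grep and adds an awk stage that prints the run count, mean, population standard deviation, and min/max.
# Minimal sketch: summarize run-to-run variance of the uncased BLEU results
# across the eight logs. Assumes the same log format as above.
grep "Bleu score (uncased)" mlperf_translation_fp32_run_np1_bleu26_eng_to_germ_*new/translation/logfile \
  | grep ": 26" \
  | awk 'NR == 1 { min = $NF; max = $NF }
         { x = $NF; sum += x; sumsq += x * x; n++;
           if (x < min) min = x; if (x > max) max = x }
         END { mean = sum / n;
               printf "runs=%d mean=%.3f stddev=%.3f min=%.3f max=%.3f\n",
                      n, mean, sqrt(sumsq / n - mean * mean), min, max }'
On the runs above, the scores fall between roughly 26.02 and 26.45 BLEU, which is the kind of spread the group wanted quantified before committing to a higher target.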
SWG Notes:
No change to the target accuracy for v0.6. We think that for v0.7 we can move to a quality target of 27, given more time to work on the issue.
Active, moving to backlog.