[Feature] Auto scaling factor tuning for FP8 collective communication

Open wkcn opened this issue 2 years ago • 0 comments

Description

Support for auto scaling factor tuning (#41).

Related example: https://github.com/Azure/MS-AMP-Examples/pull/21

Performance (model: GPT-345M, https://github.com/Azure/MS-AMP-Examples/blob/main/gpt3/pretrain_345m_megatron.sh):

msamp w/o auto scaling: validation loss at iteration 5000 | lm loss value: 3.531525E+00 | lm loss PPL: 3.417605E+01 | samples per second: 519.524 | TFLOPs: 155.99 |

msamp w/ auto scaling (add the argument --wgrad-auto-scaling): validation loss at iteration 5000 | lm loss value: 3.529646E+00 | lm loss PPL: 3.411188E+01 | samples per second: 516.702 | TFLOPs: 155.14 |
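
To make the two runs easier to compare, here is a small sketch that computes the deltas from the logged numbers. The values are copied from the log lines above. The dictionary keys and the `rel_change` helper are names made up for this illustration and are not part of MS-AMP or Megatron. It also checks that the logged PPL matches exp(lm loss), which is the usual definition of perplexity.

```python
import math

# Numbers copied from the two validation log lines at iteration 5000.
# Key names are illustrative only; they are not MS-AMP or Megatron identifiers.
baseline = {"loss": 3.531525, "ppl": 34.17605, "samples_per_s": 519.524, "tflops": 155.99}
auto_scaling = {"loss": 3.529646, "ppl": 34.11188, "samples_per_s": 516.702, "tflops": 155.14}


def rel_change(new: float, old: float) -> float:
    """Relative change of `new` with respect to `old`, in percent."""
    return (new - old) / old * 100.0


# Sanity check: perplexity should equal exp(loss) for both runs.
for name, run in (("w/o auto scaling", baseline), ("w/ auto scaling", auto_scaling)):
    print(f"{name}: exp(loss) = {math.exp(run['loss']):.5f}, logged PPL = {run['ppl']:.5f}")

# Differences between the two runs.
loss_delta = auto_scaling["loss"] - baseline["loss"]
throughput_change = rel_change(auto_scaling["samples_per_s"], baseline["samples_per_s"])
tflops_change = rel_change(auto_scaling["tflops"], baseline["tflops"])

print(f"lm loss delta:      {loss_delta:+.6f}")
print(f"samples/s change:   {throughput_change:+.2f}%")
print(f"TFLOPs change:      {tflops_change:+.2f}%")
```

On these numbers, auto scaling lowers the validation loss by about 0.0019 and costs roughly 0.5% in throughput. This is a single run per configuration, so the loss difference should be read with that in mind.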

Major Revision

Dec 07 '23 03:12 wkcn