MS-AMP
[Feature] Auto scaling factor tuning for FP8 collective communication
Description
Support for auto scaling factor tuning #41
Related example: https://github.com/Azure/MS-AMP-Examples/pull/21
Performance (model: GPT-345M, https://github.com/Azure/MS-AMP-Examples/blob/main/gpt3/pretrain_345m_megatron.sh):
- msamp w/o auto scaling: validation loss at iteration 5000 | lm loss value: 3.531525E+00 | lm loss PPL: 3.417605E+01 | samples per second: 519.524 | TFLOPs: 155.99 |
- msamp w/ auto scaling (add the argument `--wgrad-auto-scaling`): validation loss at iteration 5000 | lm loss value: 3.529646E+00 | lm loss PPL: 3.411188E+01 | samples per second: 516.702 | TFLOPs: 155.14 |
Major Revision
- Add a new variable `pre_scale` in ScalingMeta
- `pre_scale` support in Arithmetic.add_to_fp8
- Auto scaling factor tuning in Megatron FP8DistributedOptimizer
- Unit tests
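
For readers unfamiliar with the idea, the sketch below shows one way a `pre_scale` factor could be tuned automatically before FP8 gradients are reduced across ranks. It is an illustration only and does not reproduce the code in this PR: the class `PreScaleTuner`, the helper `saturated_ratio`, the halve/double rule, and every threshold are hypothetical assumptions. Only the name `pre_scale` comes from the revision list above.

```python
from dataclasses import dataclass, field

# Largest finite magnitude in the FP8 E4M3 format.
FP8_E4M3_MAX = 448.0


def saturated_ratio(values, scale, pre_scale):
    """Fraction of values whose scaled magnitude reaches the FP8 maximum.

    Hypothetical helper: `scale` stands for the tensor's regular scaling
    factor and `pre_scale` for the extra factor applied before reduction.
    """
    if not values:
        return 0.0
    hits = sum(1 for v in values if abs(v) * scale * pre_scale >= FP8_E4M3_MAX)
    return hits / len(values)


@dataclass
class PreScaleTuner:
    """Hypothetical tuner: back off on saturation, recover when stable."""

    pre_scale: float = 1.0
    overflow_threshold: float = 1e-5  # assumed tolerance for saturated values
    growth_interval: int = 1000       # assumed number of clean steps before growing
    min_pre_scale: float = 2.0 ** -16
    max_pre_scale: float = 1.0
    clean_steps: int = field(default=0, init=False)

    def update(self, ratio):
        """Adjust pre_scale from the saturation ratio observed this step."""
        if ratio > self.overflow_threshold:
            # Too many values hit the FP8 limit: halve and restart the count.
            self.pre_scale = max(self.pre_scale / 2.0, self.min_pre_scale)
            self.clean_steps = 0
        else:
            self.clean_steps += 1
            if self.clean_steps >= self.growth_interval:
                # Stable for long enough: double, capped at the maximum.
                self.pre_scale = min(self.pre_scale * 2.0, self.max_pre_scale)
                self.clean_steps = 0
        return self.pre_scale
```

In a training loop under these assumptions, `update` would be called once per step with the ratio measured on the weight gradients, and the returned value would be used as `pre_scale` for the next reduction.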