
[Feature] Auto scaling factor tuning for FP8 collective communication

Open wkcn opened this issue 2 years ago • 0 comments

Description

Support for auto scaling factor tuning #41

Related Example: https://github.com/Azure/MS-AMP-Examples/pull/21

Performance (model: GPT-345M, https://github.com/Azure/MS-AMP-Examples/blob/main/gpt3/pretrain_345m_megatron.sh):

  • msamp w/o auto scaling: validation loss at iteration 5000 | lm loss value: 3.531525E+00 | lm loss PPL: 3.417605E+01 | samples per second: 519.524 | TFLOPs: 155.99 |

  • msamp w/ auto scaling (add the argument --wgrad-auto-scaling): validation loss at iteration 5000 | lm loss value: 3.529646E+00 | lm loss PPL: 3.411188E+01 | samples per second: 516.702 | TFLOPs: 155.14 |
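
To make the two runs easier to compare, here is a small sketch that recomputes the relative differences from the numbers reported above. The dictionary keys and the helper `rel_change` are names made up for this illustration; only the numeric values come from the logs. It also checks that the reported PPL matches exp(lm loss), which is the usual definition of perplexity.

```python
import math

# Values copied from the two validation log lines (iteration 5000).
# The dict layout and key names are illustrative, not part of MS-AMP.
baseline = {"lm_loss": 3.531525, "ppl": 34.17605, "samples_per_s": 519.524, "tflops": 155.99}
auto_scaling = {"lm_loss": 3.529646, "ppl": 34.11188, "samples_per_s": 516.702, "tflops": 155.14}


def rel_change(new, old):
    """Relative change of `new` versus `old`, in percent."""
    return (new - old) / old * 100.0


# Sanity check: perplexity should be exp(lm loss) for both runs.
for run in (baseline, auto_scaling):
    assert math.isclose(math.exp(run["lm_loss"]), run["ppl"], rel_tol=1e-4)

# Print the relative change of each metric with auto scaling enabled.
for key in ("lm_loss", "samples_per_s", "tflops"):
    print(f"{key}: {rel_change(auto_scaling[key], baseline[key]):+.3f}%")
```

On these numbers, throughput drops by roughly 0.5% with auto scaling enabled, while the validation loss is marginally lower. This is a single run at one iteration, so the loss difference should not be read as statistically significant.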

Major Revision

  • Add a new variable pre_scale to ScalingMeta
  • Support pre_scale in Arithmetic.add_to_fp8
  • Add auto scaling factor tuning to the Megatron FP8DistributedOptimizer
  • Add unit tests
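
For readers unfamiliar with the idea, the following is a minimal, self-contained sketch of what tuning a pre-scale factor for an FP8 reduction could look like. It is not the MS-AMP implementation: `overflow_ratio`, `PreScaleTuner`, and the parameters `threshold` and `window` are hypothetical names chosen for this illustration, and the halve/double policy is an assumption. The only external fact used is that the largest finite FP8 E4M3 value is 448.

```python
# Hypothetical sketch; none of these names come from MS-AMP.
FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3


def overflow_ratio(rank_grads, pre_scale, fp8_max=FP8_E4M3_MAX):
    """Fraction of elements whose pre-scaled sum across ranks exceeds fp8_max."""
    n = len(rank_grads[0])
    overflowed = 0
    for i in range(n):
        total = sum(g[i] * pre_scale for g in rank_grads)
        if abs(total) > fp8_max:
            overflowed += 1
    return overflowed / n


class PreScaleTuner:
    """Halve pre_scale on overflow; double it back (capped at 1.0) after
    `window` consecutive steps without overflow. Assumed policy, for illustration."""

    def __init__(self, pre_scale=1.0, threshold=0.0, window=2):
        self.pre_scale = pre_scale
        self.threshold = threshold  # tolerated overflow ratio
        self.window = window        # clean steps required before growing
        self.clean_steps = 0

    def update(self, ratio):
        if ratio > self.threshold:
            # Too many elements would saturate: shrink values before the sum.
            self.pre_scale /= 2.0
            self.clean_steps = 0
        else:
            self.clean_steps += 1
            if self.clean_steps >= self.window:
                # Stable for a while: recover precision by growing the factor.
                self.pre_scale = min(1.0, self.pre_scale * 2.0)
                self.clean_steps = 0
        return self.pre_scale


# Two ranks, two gradient elements each; the first element sums to 600 > 448.
grads = [[300.0, 1.0], [300.0, 2.0]]
tuner = PreScaleTuner()
tuner.update(overflow_ratio(grads, tuner.pre_scale))  # overflow seen, factor is halved
```

The trade-off the sketch tries to show: a smaller pre-scale avoids saturation when gradients are summed across ranks, but pushes small values toward underflow, so the factor is grown back once overflow stops.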

wkcn, Dec 07 '23 03:12