MS-AMP
Does MS-AMP support FP8 all-gather?
What's the issue, what's expected?:
How to reproduce it?:
Log message or snapshot?:
Additional information:
Hi @zigzagcai, thanks for your attention to our work!
An FP8 tensor with a scaling factor is stored as a uint8 tensor plus an FP32 scalar. An FP8 all-gather is therefore the same as a uint8 all-gather of the data together with an FP32 all-gather of the scaling factors.
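To make the idea concrete, here is a minimal, pure-Python sketch that simulates the two gathers in a single process. It is not MS-AMP's API: `ScaledFP8Shard`, `simulated_all_gather`, and `dequantize` are hypothetical names, the payload is assumed to be in the E4M3 format, every shard is assumed to have the same length, and the scaling convention (stored value = real value × scale) is an assumption made for illustration. In a real job the two joins below would be collective calls over the uint8 buffer and the FP32 scales.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ScaledFP8Shard:
    """Hypothetical container: raw FP8 bit patterns plus one FP32 scale."""
    payload: bytes  # one uint8 per element (assumed E4M3 bit patterns)
    scale: float    # assumed convention: stored value = real value * scale


def decode_e4m3(byte: int) -> float:
    """Decode one E4M3 bit pattern: 1 sign, 4 exponent (bias 7), 3 mantissa bits."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0x0F
    man = byte & 0x07
    if exp == 0x0F and man == 0x07:
        return float("nan")  # the only NaN encoding in E4M3
    if exp == 0:
        return sign * (man / 8.0) * 2.0 ** -6  # subnormal
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7)


def simulated_all_gather(shards: List[ScaledFP8Shard]) -> Tuple[bytes, List[float]]:
    """In-process stand-in for the collectives: returns what every rank would hold."""
    payloads = b"".join(s.payload for s in shards)  # the uint8 all-gather
    scales = [s.scale for s in shards]              # the FP32 all-gather
    return payloads, scales


def dequantize(payloads: bytes, scales: List[float], shard_len: int) -> List[float]:
    """Recover real values, using the scale of the rank each element came from."""
    return [decode_e4m3(b) / scales[i // shard_len] for i, b in enumerate(payloads)]


# Two simulated ranks, two elements each.
shards = [
    ScaledFP8Shard(bytes([0x38, 0x40]), scale=2.0),  # decodes to 1.0, 2.0
    ScaledFP8Shard(bytes([0xC0, 0x30]), scale=0.5),  # decodes to -2.0, 0.5
]
payloads, scales = simulated_all_gather(shards)
values = dequantize(payloads, scales, shard_len=2)
print(values)  # [0.5, 1.0, -4.0, 1.0]
```

The point of the sketch is that the bytes are never interpreted during communication; only the receiver needs to know they are FP8, and it needs the matching scale for each shard to recover the values.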
Related Implementation
DeepSpeed with MS-AMP
https://github.com/Azure/MS-AMP/blob/main/msamp/deepspeed/runtime/zero/fp8_stage_1_and_2.py#L793-L801
Megatron with MS-AMP
https://github.com/Azure/MS-AMP/blob/main/msamp/megatron/optimizer/distrib_optimizer.py#L869