
Does TransformerEngine support FP8 communication such as all-gather or all-to-all?

Open zigzagcai opened this issue 8 months ago • 4 comments

In MoE model architectures, especially when the model size is quite large, we found that throughput is limited by communication (all-gather / reduce-scatter / all-to-all). All-gather and reduce-scatter are mainly used in ZeRO3 or FSDP, while all-to-all is mainly used in expert parallelism. The communication volume is quite large and eventually becomes the bottleneck.

We found that another FP8 library, torchao, supports FP8 all-gather communication, but I cannot find a similar FP8 communication API in TE. So, does TransformerEngine support FP8 communication such as all-gather/reduce-scatter or all-to-all?
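
For reference, torchao enables it roughly like this (a minimal sketch of its float8 training API with FSDP2; exact flag and module names may differ between versions):

```python
# Minimal sketch of FP8 all-gather with torchao + FSDP2.
# Illustrative only; check the torchao version you use for exact names.
import torch
from torch.distributed._composable.fsdp import fully_shard
from torchao.float8 import Float8LinearConfig, convert_to_float8_training

model = torch.nn.Sequential(
    torch.nn.Linear(4096, 4096, bias=False),
    torch.nn.Linear(4096, 4096, bias=False),
).cuda()

# Swap nn.Linear modules for float8 variants; enable_fsdp_float8_all_gather
# makes FSDP gather the sharded weights in FP8 instead of BF16/FP32.
config = Float8LinearConfig(enable_fsdp_float8_all_gather=True)
convert_to_float8_training(model, config=config)

for module in model:
    fully_shard(module)
fully_shard(model)
```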

zigzagcai avatar Mar 14 '25 10:03 zigzagcai

I think FP8 all-gather should already be supported in TE (_all_gather_fp8).
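
Conceptually, an FP8 all-gather just moves the 1-byte quantized payload plus a small amount of scale metadata, which is where the bandwidth saving comes from. A rough sketch with plain torch.distributed primitives (illustrative only; TE's internal _all_gather_fp8 has its own interface):

```python
# Conceptual FP8 all-gather: gather the 1-byte payload, keep the per-tensor
# scale alongside it. Not TE's actual _all_gather_fp8 signature.
import torch
import torch.distributed as dist

def fp8_all_gather(x_fp8: torch.Tensor, scale: torch.Tensor, group=None):
    """x_fp8: local shard in torch.float8_e4m3fn, scale: per-tensor scale."""
    world_size = dist.get_world_size(group)
    # Communicate raw bytes; FP8 halves the traffic vs. a BF16 all-gather.
    out = torch.empty(world_size * x_fp8.numel(), dtype=torch.uint8,
                      device=x_fp8.device)
    dist.all_gather_into_tensor(out, x_fp8.view(torch.uint8).reshape(-1),
                                group=group)
    gathered = out.view(torch.float8_e4m3fn).reshape(world_size, *x_fp8.shape)
    # With a single shared scale (e.g. delayed scaling), dequantize as needed.
    return gathered.to(torch.bfloat16) * scale
```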

BestJuly avatar Mar 17 '25 10:03 BestJuly

It depends on the type of communication. For FP8 with delayed scaling:

timmoon10 avatar Mar 17 '25 18:03 timmoon10

> It depends on the type of communication. For FP8 with delayed scaling:

Thank you! @timmoon10 @BestJuly

Just another question: does TE have plans to support FP8 all-to-all, like what DeepEP has done?

zigzagcai avatar Apr 01 '25 03:04 zigzagcai

> Just another question: does TE have plans to support FP8 all-to-all, like what DeepEP has done?

TE will provide the necessary APIs, and the final integration of DeepEP with FP8 will be in Megatron. BTW, we have already integrated DeepEP with BF16 in Megatron.
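
For context, the idea behind an FP8 all-to-all dispatch is to quantize the routed tokens once, exchange the 1-byte payload plus scales, and dequantize on the receiving rank. A rough sketch with plain torch.distributed, not DeepEP's or Megatron's actual API:

```python
# Conceptual FP8 token dispatch for expert parallelism. Not DeepEP's or
# Megatron's API; it only illustrates where the bandwidth saving comes from
# (1 byte/element on the wire instead of 2 for BF16).
import torch
import torch.distributed as dist

def fp8_all_to_all_dispatch(tokens: torch.Tensor, group=None):
    """tokens: [num_local_tokens, hidden] in BF16, already permuted so that
    equal-sized chunks go to each expert-parallel rank (simplifying assumption)."""
    world_size = dist.get_world_size(group)

    # Per-tensor scale so values fit into the E4M3 range (~448 max).
    scale = tokens.abs().amax().float().clamp(min=1e-4) / 448.0
    tokens_fp8 = (tokens / scale).to(torch.float8_e4m3fn)

    # Exchange the 1-byte payload; each rank sends/receives equal chunks.
    recv_fp8 = torch.empty_like(tokens_fp8)
    dist.all_to_all_single(recv_fp8.view(torch.uint8),
                           tokens_fp8.view(torch.uint8), group=group)

    # Share every sender's scale (tiny traffic) and dequantize chunk by chunk.
    scales = [torch.empty_like(scale) for _ in range(world_size)]
    dist.all_gather(scales, scale, group=group)
    chunks = recv_fp8.to(torch.bfloat16).chunk(world_size, dim=0)
    return torch.cat([(c * s).to(torch.bfloat16)
                      for c, s in zip(chunks, scales)], dim=0)
```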

yaox12 avatar Apr 08 '25 05:04 yaox12