ColossalAI
[CUDA] FP8 all-reduce using all-to-all and all-gather
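
The title's decomposition can be illustrated with a minimal single-process sketch. It is not taken from the PR or from ColossalAI's code: `quantize`, `dequantize`, and `simulated_all_reduce` are hypothetical names, ranks are simulated as plain Python lists, and an 8-bit integer grid with a per-chunk scale stands in for a real FP8 cast. The presumed point of the split is that only compressed chunks travel between ranks, while the summation itself runs in full precision.

```python
def quantize(chunk, levels=127):
    """Stand-in for an FP8 cast (assumption): per-chunk scale plus integer rounding."""
    amax = max((abs(x) for x in chunk), default=0.0)
    scale = amax / levels if amax > 0 else 1.0
    return [round(x / scale) for x in chunk], scale


def dequantize(codes, scale):
    """Map the 8-bit codes back to floats."""
    return [c * scale for c in codes]


def simulated_all_reduce(rank_inputs):
    """Sum equally sized vectors across simulated ranks in two phases."""
    world = len(rank_inputs)
    n = len(rank_inputs[0])
    assert n % world == 0, "vector length must divide evenly across ranks"
    step = n // world

    # Phase 1 (all-to-all): rank `src` sends its `dst`-th chunk, compressed,
    # to rank `dst`, so each rank ends up owning one slice from every peer.
    inbox = [[None] * world for _ in range(world)]
    for src in range(world):
        for dst in range(world):
            chunk = rank_inputs[src][dst * step:(dst + 1) * step]
            inbox[dst][src] = quantize(chunk)

    # Local reduction: decompress and accumulate in full precision.
    reduced = []
    for dst in range(world):
        acc = [0.0] * step
        for codes, scale in inbox[dst]:
            for i, value in enumerate(dequantize(codes, scale)):
                acc[i] += value
        reduced.append(acc)

    # Phase 2 (all-gather): each rank compresses its reduced slice and shares
    # it, so every rank can rebuild the full summed vector.
    gathered = [quantize(chunk) for chunk in reduced]
    full = []
    for codes, scale in gathered:
        full.extend(dequantize(codes, scale))
    return [list(full) for _ in range(world)]


if __name__ == "__main__":
    inputs = [[(r + 1) * 0.1 * i for i in range(8)] for r in range(4)]
    print(simulated_all_reduce(inputs)[0])
```

Because values are rounded twice (once before each phase), the result matches the exact sum only approximately; a real implementation would use actual FP8 dtypes and collective calls instead of the list shuffling above.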