[BUG]: Shardformer FP8 communication training accuracy degradation

Open GuangyaoZhang opened this issue 1 year ago • 0 comments

Is there an existing issue for this bug?

  • [X] I have searched the existing issues

🐛 Describe the bug

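With FP8 communication enabled in Shardformer, accuracy drops compared with the original FP16 model.

Configuration: TP + Split Gather, 4 GPUs. Accuracy (Acc):

  • Original FP16 model: 0.755
  • FP8 communication: 0.737 (0.018 lower)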
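
For context on where such a gap could come from, here is a minimal pure-Python sketch of the rounding error introduced when a tensor is cast to FP8 before communication and cast back afterwards. It is not ColossalAI code. The E4M3-style grid (3 mantissa bits, maximum 448), the per-tensor scaling, and the names `quantize_e4m3` and `fp8_round_trip` are all assumptions made for illustration; the issue does not say which FP8 format or scaling scheme is actually used.

```python
import math

# Largest finite value of the commonly used E4M3 variant (assumption for this sketch).
E4M3_MAX = 448.0


def quantize_e4m3(x: float) -> float:
    """Round x to the nearest value on a simplified E4M3 grid (hypothetical helper)."""
    if x == 0.0 or math.isnan(x):
        return x
    # Saturate out-of-range values instead of overflowing.
    x = max(-E4M3_MAX, min(E4M3_MAX, x))
    # frexp gives x = m * 2**exp with 0.5 <= |m| < 1.
    _, exp = math.frexp(x)
    # 1 implicit + 3 explicit mantissa bits -> grid spacing 2**(exp - 4).
    # Below the smallest normal (2**-6) the spacing stays fixed at 2**-9.
    step = 2.0 ** (max(exp, -5) - 4)
    # Python's round() rounds ties to even, like IEEE round-to-nearest-even.
    return round(x / step) * step


def fp8_round_trip(values):
    """Scale to the FP8 range, quantize, and scale back (per-tensor scaling)."""
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        return list(values)
    scale = E4M3_MAX / amax
    return [quantize_e4m3(v * scale) / scale for v in values]


if __name__ == "__main__":
    # Made-up values standing in for a communicated tensor.
    tensor = [0.8, -0.03, 0.0021, 0.25]
    for before, after in zip(tensor, fp8_round_trip(tensor)):
        rel_err = abs(after - before) / abs(before)
        print(f"{before:+.6f} -> {after:+.6f}  (relative error {rel_err:.4f})")
```

On this simplified grid, each normal-range value can be off by up to 2^-4 (6.25%) in relative terms after one round trip. Whether errors of this kind explain the 0.755 to 0.737 gap would need to be checked against the actual communication path.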

Environment

No response

GuangyaoZhang · Jul 18 '24 07:07