TorchSharp
`GradScaler` and mixed-precision training
https://pytorch.org/docs/stable/notes/amp_examples.html
Currently, bfloat16 works well without grad scaling. But to train with fp16, and eventually fp8 (once support for Hopper/40XX GPUs lands), one needs to scale gradients: the narrower dynamic range of these formats causes small gradient values to underflow to zero unless the loss is scaled up before the backward pass.
This would be an awesome contribution from someone who knows how to do this right.
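For reference, below is a minimal sketch of the pattern from the linked PyTorch AMP docs that a TorchSharp `GradScaler` would need to mirror. The model, optimizer, and synthetic data here are placeholders for illustration only, not part of any existing TorchSharp API.

```python
import torch

# Stand-ins so the sketch is self-contained; in practice these come from
# the user's own model and data pipeline.
model = torch.nn.Linear(128, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = torch.nn.CrossEntropyLoss()

scaler = torch.cuda.amp.GradScaler()

for _ in range(10):
    inputs = torch.randn(32, 128, device="cuda")
    targets = torch.randint(0, 10, (32,), device="cuda")

    optimizer.zero_grad()

    # Forward pass under autocast: eligible ops run in fp16.
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        outputs = model(inputs)
        loss = loss_fn(outputs, targets)

    # Scale the loss so small fp16 gradients don't underflow to zero,
    # then backprop on the scaled loss.
    scaler.scale(loss).backward()

    # step() unscales the gradients and skips the update if any are inf/NaN.
    scaler.step(optimizer)

    # update() adjusts the scale factor for the next iteration.
    scaler.update()
```

A TorchSharp port would presumably expose the same four-call surface (`scale`, `step`, `update`, plus unscaling for gradient clipping) alongside the existing autocast support.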