Model diverges using DeepSpeed fp16 mixed-precision training
My model diverges when training with DeepSpeed fp16 mixed precision. I've tried both the AdamW8bit and AdamW optimizers, but neither of them has worked for me. DeepSpeed with bf16 mixed precision does work, but training in bf16 can lead to image artifacts, especially when using DPM++ samplers, so I've chosen to use fp16 instead.
Environment: Ubuntu 20.04.6 LTS, DeepSpeed 0.14.2
Here's the loss graph. The purple line represents the fp16 training, while the blue line shows the bf16 training.
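For context, here is a minimal sketch of the config sections that differ between the two runs (the batch size, ZeRO stage, and loss-scale values are illustrative assumptions, not my exact settings):

```python
# Illustrative DeepSpeed config fragments -- a sketch, not my exact configuration.
# The fp16 run relies on DeepSpeed's dynamic loss scaler; the bf16 run needs no
# scaler because bf16 covers roughly the same exponent range as fp32.
ds_config_fp16 = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 1,
    "zero_optimization": {"stage": 2},
    "fp16": {
        "enabled": True,
        "loss_scale": 0,            # 0 = dynamic loss scaling
        "initial_scale_power": 16,  # initial scale = 2**16
        "loss_scale_window": 1000,
        "hysteresis": 2,
        "min_loss_scale": 1,
    },
}

ds_config_bf16 = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 1,
    "zero_optimization": {"stage": 2},
    "bf16": {"enabled": True},
}
```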
@BootsofLagrangian Do you have any idea what might be causing this problem?
Interesting. DeepSpeed upcasts values to higher precision for the optimizer's operations. That might be one of the reasons, but I'm not sure.
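To illustrate what I mean: fp16 has a much smaller dynamic range than bf16, while the optimizer step typically runs on fp32 master weights. Below is a rough sketch of that general mixed-precision pattern (not DeepSpeed's actual implementation):

```python
import torch

# fp16 and bf16 both use 16 bits, but their dynamic ranges differ a lot:
# fp16 overflows to inf above ~65504, while bf16 spans roughly the same
# exponent range as fp32. This is why fp16 needs loss scaling and can
# diverge when repeated overflows keep shrinking the scale, while bf16 does not.
print(torch.finfo(torch.float16).max)   # 65504.0
print(torch.finfo(torch.bfloat16).max)  # ~3.39e38
print(torch.finfo(torch.float32).max)   # ~3.40e38

# Sketch of the mixed-precision optimizer step: fp32 master weights,
# fp16 copy used for forward/backward.
master_w = torch.randn(4, dtype=torch.float32)    # fp32 master weights
w = master_w.to(torch.float16)                    # fp16 copy for forward/backward
grad_fp16 = torch.randn(4, dtype=torch.float16)   # fp16 gradients from backward
master_w -= 1e-4 * grad_fp16.float()              # optimizer step in fp32
w = master_w.to(torch.float16)                    # re-cast for the next step
```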
In addition, I will investigate this issue. Thank you for your report.
Maybe using bf16 is just fine. Image artifacts may simply mean you need more training steps, since the model is still learning something new. On the other hand, fp16 with a diverged loss may indicate the model has learned nothing.
@jihnenglin I have also seen loss divergence under some unknown conditions, but I still cannot find the reason why the model diverges.