Model diverges using deepspeed fp16 mixed-precision training

ngitnenlim opened this issue 1 year ago · 4 comments

My model diverges when using DeepSpeed with fp16 mixed-precision training. I've tried both the AdamW8bit and AdamW optimizers, but neither has worked for me. DeepSpeed with bf16 mixed-precision training does work, but training in bf16 can lead to image artifacts, especially when sampling with DPM++ samplers, so I've chosen to use fp16 instead.
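As background on why the two formats behave so differently: fp16 has a narrow dynamic range (its largest finite value is about 65504), so large gradients overflow to inf, while bf16 keeps fp32's exponent range but has far fewer mantissa bits. A minimal PyTorch sketch of this trade-off (illustrative, not taken from the training run):

```python
import torch

# fp16 overflows where bf16 merely rounds: fp16's max finite value is ~65504.
x = torch.tensor(70000.0)
print(x.to(torch.float16))   # tensor(inf, dtype=torch.float16)
print(x.to(torch.bfloat16))  # tensor(70144., dtype=torch.bfloat16) -- rounded, but finite

# bf16 rounds where fp16 resolves: bf16 keeps only 8 bits of significand precision.
y = torch.tensor(1.001)
print(y.to(torch.float16))   # tensor(1.0010, dtype=torch.float16)
print(y.to(torch.bfloat16))  # tensor(1., dtype=torch.bfloat16) -- the small offset is lost
```

The overflow behavior is a common cause of fp16 divergence, while bf16's coarser rounding is a plausible source of the subtle sampling artifacts mentioned above.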

Environment: Ubuntu 20.04.6 LTS
DeepSpeed version: 0.14.2

[Screenshot from 2024-05-17 10-02-58: loss graph] Here's the loss graph: the purple line is the fp16 run, and the blue line is the bf16 run.

ngitnenlim avatar May 17 '24 02:05 ngitnenlim

@BootsofLagrangian Do you have any idea what might be causing this problem?

ngitnenlim avatar May 17 '24 02:05 ngitnenlim

Interesting observation. DeepSpeed upcasts values to higher precision for the optimizer step (it keeps fp32 master weights when training in fp16). That might be one of the reasons, but I'm not sure.
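DeepSpeed's fp16 mode also relies on dynamic loss scaling: gradients are computed in fp16, and optimizer steps are skipped when an overflow is detected. A minimal sketch of the relevant config section, assuming a config dict passed to deepspeed.initialize; the key names follow DeepSpeed's documented fp16 options, but the values are illustrative starting points, not a verified fix:

```python
# Sketch of the fp16 section of a DeepSpeed config (values are illustrative).
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "fp16": {
        "enabled": True,
        "loss_scale": 0,            # 0 enables dynamic loss scaling
        "initial_scale_power": 16,  # start scale at 2**16; lower if overflows persist
        "loss_scale_window": 1000,  # overflow-free steps before the scale is raised
        "hysteresis": 2,            # consecutive overflows tolerated before lowering it
        "min_loss_scale": 1,
    },
}
```

If the scaler oscillates or steps are skipped frequently, training can stall or become unstable, which could show up as the kind of loss curve in the graph above.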

In any case, I will investigate this issue. Thank you for your report.

BootsofLagrangian avatar May 17 '24 11:05 BootsofLagrangian

Maybe using bf16 is just fine. Image artifacts may simply mean you need more training steps, since the model is still learning something new. fp16 with a diverged loss, on the other hand, may indicate the model has learned nothing.

tristanwqy avatar May 18 '24 12:05 tristanwqy
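One way to tell "the model learned nothing" apart from fp16 numeric overflow is to check the gradients for non-finite values during training. A minimal diagnostic sketch; count_nonfinite_grads is a hypothetical helper, not part of sd-scripts:

```python
import torch

def count_nonfinite_grads(model: torch.nn.Module) -> int:
    """Count parameters whose gradient contains inf or nan (hypothetical helper)."""
    bad = 0
    for p in model.parameters():
        if p.grad is not None and not torch.isfinite(p.grad).all():
            bad += 1
    return bad

# Usage sketch: call after loss.backward() and before optimizer.step().
# A persistently non-zero count under fp16 points to gradient overflow
# rather than a model that simply failed to learn.
```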

@jihnenglin I saw loss divergence under some unknown conditions, but I still cannot find the reason why the model diverges.

BootsofLagrangian avatar Jun 20 '24 09:06 BootsofLagrangian