waifu-diffusion
clip_grad_norm applied to scaled gradients
Gradient clipping is applied on this line:
https://github.com/harubaru/waifu-diffusion/blob/27d301c5b96834536166cc2f12e7a9bb4079fb96/trainer/diffusers_trainer.py#L931
However, if fp16 is enabled, the clipping is applied to the scaled gradients, because GradScaler has not yet unscaled them at that point.
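To illustrate the problem, here is a minimal sketch of the ordering described above (placeholder names like `model`, `optimizer`, and `scaler`, not the trainer's actual code): with GradScaler, `.grad` still holds gradients multiplied by the loss scale when clipping runs, so the norm being compared against the threshold is the scaled one.

```python
import torch

scaler = torch.cuda.amp.GradScaler()

def train_step(model, optimizer, loss, max_grad_norm=1.0):
    scaler.scale(loss).backward()
    # BUG: .grad still contains gradients multiplied by the loss scale here,
    # so clip_grad_norm_ clips against the *scaled* norm, not the true norm.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    scaler.step(optimizer)
    scaler.update()
```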
According to the PyTorch documentation (https://pytorch.org/docs/master/notes/amp_examples.html#gradient-clipping), the gradients should be unscaled before clipping.
So this appears to be a bug, and it could cause fp16 training to perform worse than it otherwise would.
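The fix would be to call `scaler.unscale_(optimizer)` before clipping, following the pattern in the AMP docs. Again a sketch with placeholder names, not a drop-in patch for diffusers_trainer.py:

```python
import torch

scaler = torch.cuda.amp.GradScaler()

def train_step(model, optimizer, loss, max_grad_norm=1.0):
    scaler.scale(loss).backward()
    # Unscale the gradients of the optimizer's params in place first...
    scaler.unscale_(optimizer)
    # ...so clipping operates on the true (unscaled) gradient norm.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    # scaler.step() detects that these grads are already unscaled
    # and will not unscale them a second time.
    scaler.step(optimizer)
    scaler.update()
```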