waifu-diffusion
clip_grad_norm applied to scaled gradients
Gradient clipping is applied on this line:
https://github.com/harubaru/waifu-diffusion/blob/27d301c5b96834536166cc2f12e7a9bb4079fb96/trainer/diffusers_trainer.py#L931
However, if fp16 is enabled, the clipping is applied to the scaled gradients, because GradScaler has not yet unscaled them at that point.
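To illustrate the problem, here is a minimal sketch of the ordering described above (placeholder names like `model`, `optimizer`, and `scaler`, not the trainer's actual code): with GradScaler, `.grad` still holds gradients multiplied by the loss scale when clipping runs, so the norm being compared against the threshold is the scaled one.

```python
import torch

scaler = torch.cuda.amp.GradScaler()

def train_step(model, optimizer, loss, max_grad_norm=1.0):
    scaler.scale(loss).backward()
    # BUG: .grad still contains gradients multiplied by the loss scale here,
    # so clip_grad_norm_ clips against the *scaled* norm, not the true norm.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    scaler.step(optimizer)
    scaler.update()
```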
According to the PyTorch documentation (https://pytorch.org/docs/master/notes/amp_examples.html#gradient-clipping), the gradients should be unscaled before clipping.
So this appears to be a bug, and it could cause fp16 training to perform worse than it otherwise would.
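The fix would be to call `scaler.unscale_(optimizer)` before clipping, following the pattern in the AMP docs. Again a sketch with placeholder names, not a drop-in patch for diffusers_trainer.py:

```python
import torch

scaler = torch.cuda.amp.GradScaler()

def train_step(model, optimizer, loss, max_grad_norm=1.0):
    scaler.scale(loss).backward()
    # Unscale the gradients of the optimizer's params in place first...
    scaler.unscale_(optimizer)
    # ...so clipping operates on the true (unscaled) gradient norm.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    # scaler.step() detects that these grads are already unscaled
    # and will not unscale them a second time.
    scaler.step(optimizer)
    scaler.update()
```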