Swin-Transformer: After several iterations, gradient norm and loss become NaN

Open neurosynapse opened this issue 3 years ago • 2 comments

Hello,

I would appreciate help with this issue. I have been trying to train the Swin Transformer model (swin_base_patch4_window7_224) on an ImageNet dataset with 100 classes (I changed the MLP head output from 1000 to 100 dimensions). However, after some iterations the gradient norm and the loss become NaN. I have tried several learning rates and gradient-clipping values, but the issue persists.

Best regards, Roberts

Oct 31 '22 18:10 neurosynapse

Same issue here, except I am using the default config and hyperparameters to train Swin-B from scratch on ImageNet-1k. The gradient norm explodes quickly after several epochs.

Apr 09 '23 06:04 byronyi

@neurosynapse @byronyi This issue is due to AMP using the torch.float16 dtype by default. Use torch.bfloat16 instead.

Nov 22 '23 19:11 rajeevgl01
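
For anyone trying the suggestion above, here is a minimal sketch of a training step that runs autocast in torch.bfloat16. It is not the repository's training loop: the tiny model, the `train_step` helper, the input shapes, and the `max_norm` value are hypothetical stand-ins chosen so the snippet runs anywhere, and where the change belongs in the actual training script is not covered in this thread. The first two lines print the largest finite value of each dtype, which shows why float16 overflows so much sooner.

```python
import torch
import torch.nn as nn

# float16 tops out at 65504, so large activations or gradients overflow to inf
# and can then turn into NaN; bfloat16 keeps roughly float32's exponent range.
print(torch.finfo(torch.float16).max)
print(torch.finfo(torch.bfloat16).max)

device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical stand-in for the Swin model with a 100-class head.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 100)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()


def train_step(images, targets, max_norm=5.0):
    """One optimizer step with the forward pass autocast to bfloat16."""
    optimizer.zero_grad(set_to_none=True)
    # CUDA autocast defaults to float16; request bfloat16 explicitly.
    with torch.autocast(device_type=device, dtype=torch.bfloat16):
        logits = model(images)
        loss = criterion(logits, targets)
    # No GradScaler here: loss scaling is a workaround for float16's narrow range.
    loss.backward()
    grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    optimizer.step()
    return loss.item(), grad_norm.item()


# Random data in place of an ImageNet batch (assumption for illustration).
images = torch.randn(8, 3, 32, 32, device=device)
targets = torch.randint(0, 100, (8,), device=device)
loss, grad_norm = train_step(images, targets)
print(loss, grad_norm)
```

bfloat16 needs hardware support on the GPU; `torch.cuda.is_bf16_supported()` can be checked first. If bfloat16 is not available, disabling AMP and training in float32 is the slower fallback.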
