The model cannot converge

Open guozhiyao opened this issue 4 years ago • 11 comments

I trained swin_tiny_patch4_window7_224 on a dataset with one million classes and 100 million images, using softmax loss and AdamW. The batch size is 600 and I train for 400,000 iterations, but the model cannot converge.

guozhiyao • Jul 19 '21

I found that the average grad-norm of my model is about 0.7, which is much smaller than in your setup. This makes the parameter updates very slow and the model cannot converge. Do you know how to fix it?

guozhiyao • Jul 19 '21
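For context, the grad_norm reported in the training log is typically the total L2 norm over all parameter gradients, measured after the backward pass. A minimal sketch of how to compute it in plain PyTorch (model is a placeholder, not code from this repo):

```python
import torch

def total_grad_norm(model: torch.nn.Module) -> float:
    """Total L2 norm of all parameter gradients; call after loss.backward()."""
    norms = [p.grad.detach().norm(2) for p in model.parameters() if p.grad is not None]
    if not norms:
        return 0.0
    return torch.norm(torch.stack(norms), 2).item()

# Usage sketch inside a training step:
#   loss.backward()
#   print(f"grad_norm: {total_grad_norm(model):.4f}")
#   optimizer.step()
```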

Emmm, I also ran into this problem! The loss does not go down as training progresses. Can we discuss it over QQ? My account is 2667004002.

Starboy-at-earth • Jul 28 '21

I trained swin_tiny_patch4_window7_224 on a dataset with one million classes and 100 million images, using softmax loss and AdamW. The batch size is 600 and I train for 400,000 iterations, but the model cannot converge.

You may first check the same code on a smaller-scale dataset to rule out potential bugs.

ancientmooner • Aug 12 '21
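A common way to follow this suggestion is to check whether the same training pipeline can overfit a tiny subset of the data: if the loss will not approach zero even there, the problem is likely a bug in the pipeline rather than the data scale. A minimal, self-contained sketch (the dataset, model, and hyperparameters below are dummy placeholders, not from this thread):

```python
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset

# Hypothetical stand-ins for the real dataset/model, just to make the sketch runnable.
full_dataset = TensorDataset(torch.randn(10000, 3 * 32 * 32), torch.randint(0, 10, (10000,)))
model = torch.nn.Linear(3 * 32 * 32, 10)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()

# Train on a few hundred samples only; a bug-free pipeline should overfit them easily.
loader = DataLoader(Subset(full_dataset, range(512)), batch_size=64, shuffle=True)

model.train()
for epoch in range(20):
    for inputs, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), labels)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}")  # should approach zero
```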

Fine... I also see this problem. The loss does not go down even with a learning rate of 1e-7, and I do not know how to solve it. I replaced ResNet with Swin-S as the new backbone in my network, but the loss still does not go down.

JackjackFan • Aug 17 '21

My model converges now. I train it with softmax loss; setting a large number of warm-up iterations and a large batch size lets it converge normally.

guozhiyao • Aug 17 '21
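For context, warm-up here means starting from a very small learning rate and increasing it over the first iterations before the main schedule takes over, which helps stabilize the early steps of transformer training. A minimal sketch of a linear warm-up followed by cosine decay in plain PyTorch (all values below are illustrative, not the settings used in this thread):

```python
import math
import torch

def make_warmup_cosine(optimizer, warmup_iters, total_iters, min_lr_ratio=0.01):
    """Linear warm-up for `warmup_iters` steps, then cosine decay to min_lr_ratio * base_lr."""
    def lr_lambda(step):
        if step < warmup_iters:
            return (step + 1) / warmup_iters  # linear ramp from ~0 up to the base LR
        progress = (step - warmup_iters) / max(1, total_iters - warmup_iters)
        cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
        return min_lr_ratio + (1.0 - min_lr_ratio) * cosine
    return torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# Usage sketch (placeholder model and hyperparameters); call scheduler.step()
# once per iteration, after optimizer.step().
model = torch.nn.Linear(10, 10)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.05)
scheduler = make_warmup_cosine(optimizer, warmup_iters=5000, total_iters=400000)
```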

What are the "warm-up iters"? I cannot find them in the config. By the way, I have the same problem; the loss does not go down, as shown below:

[2021-12-05 14:30:04 swin_base_patch4_window7_224](main.py 224): INFO Train: [0/300][80/1895] eta 0:42:20 lr 0.000000194 time 1.4120 (1.3996) loss 9.5960 (9.6270) grad_norm 4.6469 (5.3799) mem 17084MB
[2021-12-05 14:30:18 swin_base_patch4_window7_224](main.py 224): INFO Train: [0/300][90/1895] eta 0:42:04 lr 0.000000211 time 1.3961 (1.3988) loss 9.5745 (9.6272) grad_norm 5.0378 (5.4116) mem 17084MB
[2021-12-05 14:30:32 swin_base_patch4_window7_224](main.py 224): INFO Train: [0/300][100/1895] eta 0:41:48 lr 0.000000227 time 1.3841 (1.3972) loss 9.6274 (9.6305) grad_norm 5.9137 (5.6181) mem 17084MB
[2021-12-05 14:30:46 swin_base_patch4_window7_224](main.py 224): INFO Train: [0/300][110/1895] eta 0:41:41 lr 0.000000244 time 1.3985 (1.4014) loss 9.5727 (9.6291) grad_norm 5.7255 (5.6733) mem 17084MB

@guozhiyao My batch_size is 64 and my dataset has 14,000 classes.

hdmjdp • Dec 05 '21
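For anyone looking for the setting: as far as I can tell, the official config (config.py) specifies warm-up in epochs rather than iterations, via keys such as TRAIN.WARMUP_EPOCHS and TRAIN.WARMUP_LR, and the equivalent number of warm-up iterations follows from the steps per epoch. A rough sketch of that conversion (the 20-epoch default is my reading of the repo; double-check your own config):

```python
# Warm-up is configured in epochs (TRAIN.WARMUP_EPOCHS) and converted to iterations
# using the number of training steps per epoch.
steps_per_epoch = 1895   # from the log above: [0/300][80/1895]
warmup_epochs = 20       # assumed default TRAIN.WARMUP_EPOCHS; check your config
warmup_iters = warmup_epochs * steps_per_epoch
print(f"warm-up covers {warmup_iters} iterations")  # 37,900 with these numbers
```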

Thanks @guozhiyao

ancientmooner • Dec 20 '21

@hdmjdp How did you solve this issue?

ruiyan1995 • Apr 04 '22

Mine cannot go down either, on CIFAR-10 with my own framework.

BitCalSaul • Feb 10 '24

@guozhiyao Hey, I'm wondering what the point of looking at grad_norm is. I have seen people refer to this metric in issues about Swin convergence. Could you give a hint? Thanks.

BitCalSaul • Feb 10 '24