The model cannot converge

Open guozhiyao opened this issue 4 years ago • 11 comments

I trained swin_tiny_patch4_window7_224 on a dataset with one million classes and 100 million images, using softmax loss and AdamW. The batch size is 600 and I train for 400,000 iterations, but the model cannot converge.

guozhiyao • Jul 19 '21

I found that the average grad-norm of my model is about 0.7, which is much smaller than in your setup. This makes the parameter updates very slow and the model cannot converge. Do you know how to fix it?

guozhiyao • Jul 19 '21
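For context, the grad_norm reported in the training log is typically the total L2 norm over all parameter gradients, measured after the backward pass. A minimal sketch of how to compute it in plain PyTorch (model is a placeholder, not code from this repo):

```python
import torch

def total_grad_norm(model: torch.nn.Module) -> float:
    """Total L2 norm of all parameter gradients; call after loss.backward()."""
    norms = [p.grad.detach().norm(2) for p in model.parameters() if p.grad is not None]
    if not norms:
        return 0.0
    return torch.norm(torch.stack(norms), 2).item()

# Usage sketch inside a training step:
#   loss.backward()
#   print(f"grad_norm: {total_grad_norm(model):.4f}")
#   optimizer.step()
```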

Emmm, I also ran into this problem! The loss does not go down as training progresses. Can we discuss it over QQ? My account is 2667004002.

Starboy-at-earth • Jul 28 '21

I trained swin_tiny_patch4_window7_224 on a dataset with one million classes and 100 million images, using softmax loss and AdamW. The batch size is 600 and I train for 400,000 iterations, but the model cannot converge.

You may first check the same code on a smaller-scale dataset to rule out potential bugs.

ancientmooner • Aug 12 '21
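A common way to follow this suggestion is to check whether the same training pipeline can overfit a tiny subset of the data: if the loss will not approach zero even there, the problem is likely a bug in the pipeline rather than the data scale. A minimal, self-contained sketch (the dataset, model, and hyperparameters below are dummy placeholders, not from this thread):

```python
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset

# Hypothetical stand-ins for the real dataset/model, just to make the sketch runnable.
full_dataset = TensorDataset(torch.randn(10000, 3 * 32 * 32), torch.randint(0, 10, (10000,)))
model = torch.nn.Linear(3 * 32 * 32, 10)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()

# Train on a few hundred samples only; a bug-free pipeline should overfit them easily.
loader = DataLoader(Subset(full_dataset, range(512)), batch_size=64, shuffle=True)

model.train()
for epoch in range(20):
    for inputs, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), labels)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}")  # should approach zero
```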

Fine... I also see this problem. The loss does not go down even with a learning rate of 1e-7, and I do not know how to solve it. I replaced ResNet with Swin-S as the new backbone in my network, but the loss still does not go down.

JackjackFan • Aug 17 '21

My model converges now. I train it with softmax loss; setting a large number of warm-up iterations and a large batch size lets it converge normally.

guozhiyao • Aug 17 '21
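For context, warm-up here means starting from a very small learning rate and increasing it over the first iterations before the main schedule takes over, which helps stabilize the early steps of transformer training. A minimal sketch of a linear warm-up followed by cosine decay in plain PyTorch (all values below are illustrative, not the settings used in this thread):

```python
import math
import torch

def make_warmup_cosine(optimizer, warmup_iters, total_iters, min_lr_ratio=0.01):
    """Linear warm-up for `warmup_iters` steps, then cosine decay to min_lr_ratio * base_lr."""
    def lr_lambda(step):
        if step < warmup_iters:
            return (step + 1) / warmup_iters  # linear ramp from ~0 up to the base LR
        progress = (step - warmup_iters) / max(1, total_iters - warmup_iters)
        cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
        return min_lr_ratio + (1.0 - min_lr_ratio) * cosine
    return torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# Usage sketch (placeholder model and hyperparameters); call scheduler.step()
# once per iteration, after optimizer.step().
model = torch.nn.Linear(10, 10)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.05)
scheduler = make_warmup_cosine(optimizer, warmup_iters=5000, total_iters=400000)
```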

What are the "warm-up iters"? I cannot find them in the config. By the way, I have the same problem; the loss does not go down, as shown below:

[2021-12-05 14:30:04 swin_base_patch4_window7_224](main.py 224): INFO Train: [0/300][80/1895] eta 0:42:20 lr 0.000000194 time 1.4120 (1.3996) loss 9.5960 (9.6270) grad_norm 4.6469 (5.3799) mem 17084MB
[2021-12-05 14:30:18 swin_base_patch4_window7_224](main.py 224): INFO Train: [0/300][90/1895] eta 0:42:04 lr 0.000000211 time 1.3961 (1.3988) loss 9.5745 (9.6272) grad_norm 5.0378 (5.4116) mem 17084MB
[2021-12-05 14:30:32 swin_base_patch4_window7_224](main.py 224): INFO Train: [0/300][100/1895] eta 0:41:48 lr 0.000000227 time 1.3841 (1.3972) loss 9.6274 (9.6305) grad_norm 5.9137 (5.6181) mem 17084MB
[2021-12-05 14:30:46 swin_base_patch4_window7_224](main.py 224): INFO Train: [0/300][110/1895] eta 0:41:41 lr 0.000000244 time 1.3985 (1.4014) loss 9.5727 (9.6291) grad_norm 5.7255 (5.6733) mem 17084MB

@guozhiyao My batch_size is 64 and my dataset has 14,000 classes.

hdmjdp • Dec 05 '21
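For anyone looking for the setting: as far as I can tell, the official config (config.py) specifies warm-up in epochs rather than iterations, via keys such as TRAIN.WARMUP_EPOCHS and TRAIN.WARMUP_LR, and the equivalent number of warm-up iterations follows from the steps per epoch. A rough sketch of that conversion (the 20-epoch default is my reading of the repo; double-check your own config):

```python
# Warm-up is configured in epochs (TRAIN.WARMUP_EPOCHS) and converted to iterations
# using the number of training steps per epoch.
steps_per_epoch = 1895   # from the log above: [0/300][80/1895]
warmup_epochs = 20       # assumed default TRAIN.WARMUP_EPOCHS; check your config
warmup_iters = warmup_epochs * steps_per_epoch
print(f"warm-up covers {warmup_iters} iterations")  # 37,900 with these numbers
```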

Thanks @guozhiyao

ancientmooner • Dec 20 '21

@hdmjdp How did you solve this issue?

ruiyan1995 • Apr 04 '22

Mine cannot go down either, on CIFAR-10 with my own framework.

BitCalSaul • Feb 10 '24

@guozhiyao Hey, I'm wondering what the point of looking at grad_norm is. I have seen people refer to this metric in issues about Swin convergence. Could you give a hint? Thanks.

BitCalSaul • Feb 10 '24