
MobileNetV2 training does not converge

Open · cherry-licongyi opened this issue 3 years ago · 1 comment

I trained the MobileNetV2 code from this repository on the CIFAR-10 dataset, but it does not converge. I don't know the reason; any answer would be appreciated.

Here is the training log:

==> Preparing data..
==> Building model..

Epoch: 0
 [=========================== 391/391 ============================>]  Step: 2s840ms | Tot: 29s605ms | Loss: 2.304 | Acc: 9.868% (4934/50000)                                                  
 [=========================== 100/100 ============================>]  Step: 39ms | Tot: 3s794ms | Loss: 2.304 | Acc: 10.000% (1000/10000)                                                     
Saving..

Epoch: 1
 [=========================== 391/391 ============================>]  Step: 52ms | Tot: 26s718ms | Loss: 2.304 | Acc: 9.674% (4837/50000)                                                     
 [=========================== 100/100 ============================>]  Step: 42ms | Tot: 3s526ms | Loss: 2.304 | Acc: 10.000% (1000/10000)                                                     

Epoch: 2
 [=========================== 391/391 ============================>]  Step: 63ms | Tot: 27s702ms | Loss: 2.304 | Acc: 9.836% (4918/50000)                                                     
 [=========================== 100/100 ============================>]  Step: 36ms | Tot: 3s642ms | Loss: 2.303 | Acc: 10.000% (1000/10000)                                                     

Epoch: 3
 [=========================== 391/391 ============================>]  Step: 53ms | Tot: 26s19ms | Loss: 2.304 | Acc: 10.154% (5077/50000)                                                     
 [=========================== 100/100 ============================>]  Step: 36ms | Tot: 3s463ms | Loss: 2.304 | Acc: 10.000% (1000/10000)                                                     

Epoch: 4
 [=========================== 391/391 ============================>]  Step: 61ms | Tot: 26s405ms | Loss: 2.304 | Acc: 10.148% (5074/50000)                                                    
 [=========================== 100/100 ============================>]  Step: 36ms | Tot: 3s632ms | Loss: 2.304 | Acc: 10.000% (1000/10000)                                                     

Epoch: 5
 [=========================== 391/391 ============================>]  Step: 52ms | Tot: 26s824ms | Loss: 2.304 | Acc: 9.966% (4983/50000)                                                     
 [=========================== 100/100 ============================>]  Step: 34ms | Tot: 3s266ms | Loss: 2.305 | Acc: 10.000% (1000/10000)                                                     

Epoch: 6
 [=========================== 391/391 ============================>]  Step: 64ms | Tot: 25s992ms | Loss: 2.304 | Acc: 10.260% (5130/50000)                                                    
 [=========================== 100/100 ============================>]  Step: 33ms | Tot: 3s639ms | Loss: 2.304 | Acc: 10.000% (1000/10000)                                                     

Epoch: 7
 [=========================== 391/391 ============================>]  Step: 58ms | Tot: 26s53ms | Loss: 2.304 | Acc: 9.936% (4968/50000)                                                      
 [=========================== 100/100 ============================>]  Step: 34ms | Tot: 3s475ms | Loss: 2.304 | Acc: 10.000% (1000/10000)                                                     

Epoch: 8
^CTraceback (most recent call last): ..............................]  Step: 71ms | Tot: 6s349ms | Loss: 2.305 | Acc: 10.205% (1267/12416)  

cherry-licongyi avatar Dec 17 '21 07:12 cherry-licongyi

Try a smaller learning rate?
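
A loss pinned at ~2.304 is exactly the cross-entropy of a uniform random guess over CIFAR-10's 10 classes (ln 10 ≈ 2.303), so the model never leaves its initial plateau; that symptom is consistent with a learning rate that is too high. A minimal sketch of the check and of dropping the rate by an order of magnitude (the SGD hyperparameters below mirror the repo's usual defaults, but treat the exact values as an assumption, and the `nn.Linear` is only a stand-in for the real MobileNetV2):

```python
import math

import torch
from torch import nn

# A loss stuck near 2.304 on CIFAR-10 equals -log(1/10):
# the network is effectively guessing uniformly over the 10 classes.
print(round(math.log(10), 3))  # 2.303 — matches the plateaued loss in the log

model = nn.Linear(32 * 32 * 3, 10)  # stand-in for the MobileNetV2 model
# Try a learning rate an order of magnitude below the usual 0.1 default:
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=5e-4)
```

If the loss then starts moving below 2.3 within the first epoch, the learning rate was the culprit; otherwise it is worth checking the data normalization and the final classifier layer as well.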

logan-mo avatar Aug 06 '22 17:08 logan-mo