Overfitting problem
Hello,
I cloned your repo and downloaded your dataset, but I could not reproduce your results.
I trained the model on 4 GeForce GTX 1080 Ti GPUs and kept all the other arguments the same, but the model overfits.
After 100 epochs, I got prec@1 of 99% and prec@5 of 100% on the train set, but only prec@1 of 48% and prec@5 of 73% on the test set.
Here is part of the log:
DFL-CNN <==> Train <==> Epoch: [113][103/107]
Loss 0.5459 (0.5661) Loss1 0.0756 (0.0739) Loss2 0.0004 (0.0289) Loss3 4.6985 (4.6324)
Prec@1 (99.880) Prec@5 (100.000)
DFL-CNN <==> Train <==> Epoch: [113][104/107]
Loss 0.5188 (0.5656) Loss1 0.0508 (0.0737) Loss2 0.0076 (0.0287) Loss3 4.6041 (4.6321)
Prec@1 (99.881) Prec@5 (100.000)
DFL-CNN <==> Train <==> Epoch: [113][105/107]
Loss 0.5771 (0.5657) Loss1 0.0774 (0.0737) Loss2 0.0189 (0.0286) Loss3 4.8082 (4.6338)
Prec@1 (99.882) Prec@5 (100.000)
DFL-CNN <==> Train <==> Epoch: [113][106/107]
Loss 0.8110 (0.5672) Loss1 0.3277 (0.0753) Loss2 0.0100 (0.0285) Loss3 4.7324 (4.6344)
Prec@1 (99.883) Prec@5 (100.000)
DFL-CNN <==> Test <==> Epoch: [ 106] Top1:48.050% Top5:73.093%
DFL-CNN <==> Test <==> Epoch: [ 108] Top1:48.913% Top5:73.404%
DFL-CNN <==> Test <==> Epoch: [ 110] Top1:47.515% Top5:72.575%
DFL-CNN <==> Test <==> Epoch: [ 112] Top1:48.205% Top5:72.765%
Overfitting seems inevitable while training this model: there are only about 6,000 training images, while VGG16 has a very large number of parameters.
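For scale, here is a quick check using torchvision's stock VGG16 as a stand-in for the backbone (the network in this repo may differ slightly):

```python
import torchvision.models as models

# Count the parameters of a stock VGG16 (stand-in for the backbone here).
vgg16 = models.vgg16()
n_params = sum(p.numel() for p in vgg16.parameters())
print(f"VGG16 parameters: {n_params / 1e6:.1f}M")  # ~138.4M
```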
Have you encountered this overfitting problem? If so, how did you get rid of it?
Looking forward to your reply!
Thank you very much!
Hello, I have the same problem. I haven't seen the "Test Epoch" output appear yet; it is still training now. I set the learning rate to the default of 0.001. How about you? Do you have any better proposals?
DFL-CNN <==> Train Epoch: [201][1159/1494]
Loss 1.6247 (1.4218) Loss1 1.6245 (1.4152) Loss2 0.0000 (0.0000) Loss3 0.0023 (0.0649)
Top1 100.000 (100.000) Top5 100.000 (100.000)
DFL-CNN <==> Train Epoch: [201][1160/1494]
Loss 1.5835 (1.4219) Loss1 1.5780 (1.4154) Loss2 0.0000 (0.0000) Loss3 0.0543 (0.0649)
Top1 100.000 (100.000) Top5 100.000 (100.000)
@fxle The test log is saved in DFL_CNN/log/log_text.txt.
@pengshiqi Oh, thank you very much! I found it, and it seems to be improving. But the Loss2 value looks a little strange. In addition, do you think the 'filter bank' idea in this paper can improve rotation invariance at the same time?
DFL-CNN <==> Test <==> Epoch: [ 198] Top1:76.338% Top5:91.284%
DFL-CNN <==> Test <==> Epoch: [ 200] Top1:75.854% Top5:91.042%
DFL-CNN <==> Test <==> Epoch: [ 202] Top1:75.837% Top5:91.111%
@fxle
DFL-CNN <==> Train Epoch: [323][2/125]
Loss 3.3836 (3.4424) Loss1 3.1195 (3.1667) Loss2 0.0326 (0.0284) Loss3 2.3149 (2.4724)
Top1 100.000 (100.000) Top5 100.000 (100.000)
DFL-CNN <==> Train Epoch: [323][3/125]
Loss 3.4611 (3.4471) Loss1 3.2075 (3.1769) Loss2 0.0126 (0.0245) Loss3 2.4100 (2.4568)
Top1 100.000 (100.000) Top5 100.000 (100.000)
DFL-CNN <==> Test <==> Epoch: [ 318] Top1:72.679% Top5:90.991%
DFL-CNN <==> Test <==> Epoch: [ 320] Top1:72.592% Top5:91.094%
Sure, loss2 is much lower than the other losses, but I don't think loss2 is the strange one. It is loss1 and loss3 that look strange: after hundreds of epochs of training they are still very large, which seems abnormal.
I think a potential cause is that the 1x1 convolutional layer is initialized randomly rather than with the non-random scheme described in Section 3.3 of the paper, which has not been implemented in this code.
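For reference, here is a rough sketch of how a non-random init could look. This is only my reading of Section 3.3 (seed each class's 1x1 filters with high-energy, L2-normalized feature vectors taken from that class's training images), not the authors' implementation; `backbone`, `loader`, and `filters_per_class` are placeholders:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def init_filter_bank(conv1x1, backbone, loader, filters_per_class):
    """Sketch of a non-random init for the 1x1 filter bank (my reading
    of Sec. 3.3, not the authors' code). `backbone` maps images to a
    (N, C, H, W) feature map; `conv1x1.weight` must have shape
    (num_classes * filters_per_class, C, 1, 1)."""
    feats_by_class = {}
    for images, labels in loader:
        fmap = backbone(images)                        # (N, C, H, W)
        n, c, h, w = fmap.shape
        vecs = fmap.permute(0, 2, 3, 1).reshape(n, h * w, c)
        for i in range(n):
            feats_by_class.setdefault(labels[i].item(), []).append(vecs[i])
    weights = []
    for cls in sorted(feats_by_class):
        v = torch.cat(feats_by_class[cls], dim=0)      # candidate patch features
        energy = v.norm(dim=1)                         # keep the highest-energy ones
        top = v[energy.topk(filters_per_class).indices]
        weights.append(F.normalize(top, dim=1))        # L2-normalize each filter
    conv1x1.weight.copy_(torch.cat(weights).unsqueeze(-1).unsqueeze(-1))
```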
@pengshiqi Yes, you are right, it's not initialized that way. Do you know how to implement it exactly? I have some ideas I'd like to discuss with you. I suggest we connect on QQ; my QQ number is 260730636.
@pengshiqi @fxle I changed the model by adding a dropout layer. During training, loss2 decreases, but loss1 and loss3 are basically not reduced. Did you see the same situation?
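The change was roughly the following (a sketch only; treating `model.classifier` as the final fully-connected head is an assumption, adapt it to the actual DFL_CNN module names):

```python
import torch.nn as nn

def add_dropout(model, p=0.5):
    # Insert dropout in front of the final classifier head.
    # (`model.classifier` is an assumed attribute name; adapt as needed.)
    model.classifier = nn.Sequential(nn.Dropout(p=p), model.classifier)
    return model
```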
@techzhou No, I didn't use a dropout layer. Maybe you can try regularization or the Section 3.3 layer initialization to make it perform better.
@techzhou Hi, how is the accuracy after you implemented the dropout layer? Did it increase a little bit? I think dropout may help. Thanks!
@pengshiqi Hi, how did you solve the overfitting problem? @XIELeo @fxle @techzhou @Ien001
@wsqat The default hyper-parameters are imperfect. You can adjust the learning rate and the loss weights to obtain better results.
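To be concrete, by "loss weights" I mean the coefficients on the three losses, something like this (a sketch; the names and values are illustrative, not the repo's exact code):

```python
def combined_loss(loss1, loss2, loss3, w=(1.0, 0.1, 0.1)):
    # Weighted sum of the three heads' losses; these weights, together
    # with the optimizer's learning rate, are the knobs to tune.
    # (Example values only, not recommendations.)
    return w[0] * loss1 + w[1] * loss2 + w[2] * loss3
```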
@pengshiqi Hi, I trained by adjusting the learning rate but could only hit 52% accuracy. Can you share the weights of the model with which you got 72% accuracy? Or can you mention any hyper-parameters or code changes that could help me get better accuracy?