Overfitting problem
Hello,
I cloned your repo and downloaded your dataset, but I could not reproduce your results.
I trained the model on 4 GeForce GTX 1080 Ti GPUs and kept all the other arguments the same, but the model overfits.
After 100 epochs, I got prec@1 of 99% and prec@5 of 100% on the train set, but only prec@1 of 48% and prec@5 of 73% on the test set.
Here is part of the log:
DFL-CNN <==> Train <==> Epoch: [113][103/107]
Loss 0.5459 (0.5661) Loss1 0.0756 (0.0739) Loss2 0.0004 (0.0289) Loss3 4.6985 (4.6324)
Prec@1 (99.880) Prec@5 (100.000)
DFL-CNN <==> Train <==> Epoch: [113][104/107]
Loss 0.5188 (0.5656) Loss1 0.0508 (0.0737) Loss2 0.0076 (0.0287) Loss3 4.6041 (4.6321)
Prec@1 (99.881) Prec@5 (100.000)
DFL-CNN <==> Train <==> Epoch: [113][105/107]
Loss 0.5771 (0.5657) Loss1 0.0774 (0.0737) Loss2 0.0189 (0.0286) Loss3 4.8082 (4.6338)
Prec@1 (99.882) Prec@5 (100.000)
DFL-CNN <==> Train <==> Epoch: [113][106/107]
Loss 0.8110 (0.5672) Loss1 0.3277 (0.0753) Loss2 0.0100 (0.0285) Loss3 4.7324 (4.6344)
Prec@1 (99.883) Prec@5 (100.000)
DFL-CNN <==> Test <==> Epoch: [ 106] Top1:48.050% Top5:73.093%
DFL-CNN <==> Test <==> Epoch: [ 108] Top1:48.913% Top5:73.404%
DFL-CNN <==> Test <==> Epoch: [ 110] Top1:47.515% Top5:72.575%
DFL-CNN <==> Test <==> Epoch: [ 112] Top1:48.205% Top5:72.765%
Overfitting seems inevitable while training this model: there are only about 6,000 training images, while VGG16 has a very large number of parameters.
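For scale, here is a quick check using torchvision's stock VGG16 as a stand-in for the backbone (the network in this repo may differ slightly):

```python
import torchvision.models as models

# Count the parameters of a stock VGG16 (stand-in for the backbone here).
vgg16 = models.vgg16()
n_params = sum(p.numel() for p in vgg16.parameters())
print(f"VGG16 parameters: {n_params / 1e6:.1f}M")  # ~138.4M
```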
Have you encountered this overfitting problem? If so, how did you get rid of it?
Looking forward to your reply!
Thank you very much!
Hello, I have the same problem. I haven't seen the "Test Epoch" output appear yet; it is still training now. I set the learning rate to the default of 0.001. How about you? Do you have any better proposals?
DFL-CNN <==> Train Epoch: [201][1159/1494]
Loss 1.6247 (1.4218) Loss1 1.6245 (1.4152) Loss2 0.0000 (0.0000) Loss3 0.0023 (0.0649)
Top1 100.000 (100.000) Top5 100.000 (100.000)
DFL-CNN <==> Train Epoch: [201][1160/1494]
Loss 1.5835 (1.4219) Loss1 1.5780 (1.4154) Loss2 0.0000 (0.0000) Loss3 0.0543 (0.0649)
Top1 100.000 (100.000) Top5 100.000 (100.000)
@fxle The test log is saved in DFL_CNN/log/log_text.txt.
@pengshiqi Oh, thank you very much! I found it, and it seems to be improving. But the Loss2 value looks a little strange. In addition, do you think the 'filter bank' idea in this paper can improve rotation invariance at the same time?
DFL-CNN <==> Test <==> Epoch: [ 198] Top1:76.338% Top5:91.284%
DFL-CNN <==> Test <==> Epoch: [ 200] Top1:75.854% Top5:91.042%
DFL-CNN <==> Test <==> Epoch: [ 202] Top1:75.837% Top5:91.111%
@fxle
DFL-CNN <==> Train Epoch: [323][2/125]
Loss 3.3836 (3.4424) Loss1 3.1195 (3.1667) Loss2 0.0326 (0.0284) Loss3 2.3149 (2.4724)
Top1 100.000 (100.000) Top5 100.000 (100.000)
DFL-CNN <==> Train Epoch: [323][3/125]
Loss 3.4611 (3.4471) Loss1 3.2075 (3.1769) Loss2 0.0126 (0.0245) Loss3 2.4100 (2.4568)
Top1 100.000 (100.000) Top5 100.000 (100.000)
DFL-CNN <==> Test <==> Epoch: [ 318] Top1:72.679% Top5:90.991%
DFL-CNN <==> Test <==> Epoch: [ 320] Top1:72.592% Top5:91.094%
Sure, loss2 is much lower than the other losses, but I don't think loss2 is the strange one. It is loss1 and loss3 that look strange: after hundreds of epochs of training they are still very large, which seems abnormal.
I think a potential cause is that the 1x1 convolutional layer is initialized randomly rather than with the non-random scheme described in Section 3.3 of the paper, which has not been implemented in this code.
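For reference, here is a rough sketch of how a non-random init could look. This is only my reading of Section 3.3 (seed each class's 1x1 filters with high-energy, L2-normalized feature vectors taken from that class's training images), not the authors' implementation; `backbone`, `loader`, and `filters_per_class` are placeholders:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def init_filter_bank(conv1x1, backbone, loader, filters_per_class):
    """Sketch of a non-random init for the 1x1 filter bank (my reading
    of Sec. 3.3, not the authors' code). `backbone` maps images to a
    (N, C, H, W) feature map; `conv1x1.weight` must have shape
    (num_classes * filters_per_class, C, 1, 1)."""
    feats_by_class = {}
    for images, labels in loader:
        fmap = backbone(images)                        # (N, C, H, W)
        n, c, h, w = fmap.shape
        vecs = fmap.permute(0, 2, 3, 1).reshape(n, h * w, c)
        for i in range(n):
            feats_by_class.setdefault(labels[i].item(), []).append(vecs[i])
    weights = []
    for cls in sorted(feats_by_class):
        v = torch.cat(feats_by_class[cls], dim=0)      # candidate patch features
        energy = v.norm(dim=1)                         # keep the highest-energy ones
        top = v[energy.topk(filters_per_class).indices]
        weights.append(F.normalize(top, dim=1))        # L2-normalize each filter
    conv1x1.weight.copy_(torch.cat(weights).unsqueeze(-1).unsqueeze(-1))
```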
@pengshiqi Yes, you are right, it's not initialized that way. Do you know how to implement it exactly? I have some ideas I'd like to discuss with you. I suggest we connect on QQ; my QQ number is 260730636.
@pengshiqi @fxle I changed the model by adding a dropout layer. During training, loss2 decreases, but loss1 and loss3 are basically not reduced. Did you see the same situation?
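The change was roughly the following (a sketch only; treating `model.classifier` as the final fully-connected head is an assumption, adapt it to the actual DFL_CNN module names):

```python
import torch.nn as nn

def add_dropout(model, p=0.5):
    # Insert dropout in front of the final classifier head.
    # (`model.classifier` is an assumed attribute name; adapt as needed.)
    model.classifier = nn.Sequential(nn.Dropout(p=p), model.classifier)
    return model
```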
@techzhou No, I didn't use a dropout layer. Maybe you can try regularization or the Section 3.3 layer initialization to make it perform better.
@techzhou Hi, how is the accuracy after you implemented the dropout layer? Did it increase a little bit? I think dropout may help. Thanks!
@pengshiqi Hi, how did you solve the overfitting problem? @XIELeo @fxle @techzhou @Ien001
@wsqat The default hyper-parameters are imperfect. You can adjust the learning rate and the loss weights to obtain better results.
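To be concrete, by "loss weights" I mean the coefficients on the three losses, something like this (a sketch; the names and values are illustrative, not the repo's exact code):

```python
def combined_loss(loss1, loss2, loss3, w=(1.0, 0.1, 0.1)):
    # Weighted sum of the three heads' losses; these weights, together
    # with the optimizer's learning rate, are the knobs to tune.
    # (Example values only, not recommendations.)
    return w[0] * loss1 + w[1] * loss2 + w[2] * loss3
```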
@pengshiqi Hi, I trained by adjusting the learning rate but could only hit 52% accuracy. Can you share the weights of the model with which you got 72% accuracy? Or can you mention any hyper-parameters or code changes that could help me get better accuracy?