
Overfitting problem

Open pengshiqi opened this issue 7 years ago • 12 comments

Hello,

I cloned your repo and downloaded your dataset, but I could not reproduce your results.

I trained the model on 4 GeForce GTX 1080 Ti GPUs and kept all other arguments the same, but the model overfits.

After 100 epochs, I get Prec@1 of 99% and Prec@5 of 100% on the training set, but only Prec@1 of 48% and Prec@5 of 73% on the test set.

Here is part of the log:

DFL-CNN <==> Train <==> Epoch: [113][103/107]
Loss 0.5459 (0.5661)	Loss1 0.0756 (0.0739)	Loss2 0.0004 (0.0289)	Loss3 4.6985 (4.6324)
Prec@1 (99.880)	Prec@5 (100.000)
DFL-CNN <==> Train <==> Epoch: [113][104/107]
Loss 0.5188 (0.5656)	Loss1 0.0508 (0.0737)	Loss2 0.0076 (0.0287)	Loss3 4.6041 (4.6321)
Prec@1 (99.881)	Prec@5 (100.000)
DFL-CNN <==> Train <==> Epoch: [113][105/107]
Loss 0.5771 (0.5657)	Loss1 0.0774 (0.0737)	Loss2 0.0189 (0.0286)	Loss3 4.8082 (4.6338)
Prec@1 (99.882)	Prec@5 (100.000)
DFL-CNN <==> Train <==> Epoch: [113][106/107]
Loss 0.8110 (0.5672)	Loss1 0.3277 (0.0753)	Loss2 0.0100 (0.0285)	Loss3 4.7324 (4.6344)
Prec@1 (99.883)	Prec@5 (100.000)
DFL-CNN <==> Test <==> Epoch: [ 106] Top1:48.050% Top5:73.093%
DFL-CNN <==> Test <==> Epoch: [ 108] Top1:48.913% Top5:73.404% 
DFL-CNN <==> Test <==> Epoch: [ 110] Top1:47.515% Top5:72.575%
DFL-CNN <==> Test <==> Epoch: [ 112] Top1:48.205% Top5:72.765%

Overfitting seems inevitable here: there are only about 6,000 images for training, while VGG16 has far too many parameters for a dataset of that size.
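A standard first remedy for this parameter/data imbalance is L2 weight decay on the optimizer. A minimal PyTorch sketch, with a tiny stand-in model and illustrative hyper-parameter values (not taken from the DFL-CNN repo):

```python
import torch
import torch.nn as nn

# Hedged sketch: with only ~6,000 training images and VGG16's ~138M
# parameters, L2 weight decay is a common way to curb overfitting.
# The stand-in model and the values below are illustrative only.
model = nn.Linear(512, 200)  # placeholder for the VGG16-based classifier
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.001,           # the default learning rate mentioned in this thread
    momentum=0.9,
    weight_decay=5e-4,  # L2 penalty; shrinks weights toward zero each step
)
```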

Have you ever met the overfitting problem? And how did you get rid of it?

Looking forward to your reply!

Thank you very much!

pengshiqi avatar Nov 19 '18 05:11 pengshiqi

Hello, I have the same problem. I haven't seen the "Test Epoch" results get worse yet; it is still training now. I set `learning-rate` to the default of 0.001. How about you? Do you have any better proposals?

DFL-CNN <==> Train Epoch: [201][1159/1494]
Loss 1.6247 (1.4218)	Loss1 1.6245 (1.4152)	Loss2 0.0000 (0.0000)	Loss3 0.0023 (0.0649)
Top1 100.000 (100.000)	Top5 100.000 (100.000)
DFL-CNN <==> Train Epoch: [201][1160/1494]
Loss 1.5835 (1.4219)	Loss1 1.5780 (1.4154)	Loss2 0.0000 (0.0000)	Loss3 0.0543 (0.0649)
Top1 100.000 (100.000)	Top5 100.000 (100.000)

fxle avatar Nov 20 '18 08:11 fxle

@fxle Test log is saved in DFL_CNN/log/log_text.txt .

pengshiqi avatar Nov 20 '18 09:11 pengshiqi

@pengshiqi Oh, thank you very much! I found it. It seems to be improving, but the Loss2 value looks a little strange. Also, do you think the 'filter bank' idea in this paper could improve rotation invariance at the same time?

DFL-CNN <==> Test <==> Epoch: [ 198] Top1:76.338% Top5:91.284%
DFL-CNN <==> Test <==> Epoch: [ 200] Top1:75.854% Top5:91.042%
DFL-CNN <==> Test <==> Epoch: [ 202] Top1:75.837% Top5:91.111%

fxle avatar Nov 20 '18 09:11 fxle

@fxle

DFL-CNN <==> Train Epoch: [323][2/125]
Loss 3.3836 (3.4424)	Loss1 3.1195 (3.1667)	Loss2 0.0326 (0.0284)	Loss3 2.3149 (2.4724)
Top1 100.000 (100.000)	Top5 100.000 (100.000)
DFL-CNN <==> Train Epoch: [323][3/125]
Loss 3.4611 (3.4471)	Loss1 3.2075 (3.1769)	Loss2 0.0126 (0.0245)	Loss3 2.4100 (2.4568)
Top1 100.000 (100.000)	Top5 100.000 (100.000)
DFL-CNN <==> Test <==> Epoch: [ 318] Top1:72.679% Top5:90.991%
DFL-CNN <==> Test <==> Epoch: [ 320] Top1:72.592% Top5:91.094%

Sure, loss2 is much lower than the other losses, but I don't find loss2 strange. It is loss1 and loss3 that look strange: after hundreds of epochs of training they are still very large, which seems abnormal.

I think a potential cause of this problem is that the 1x1 convolutional layers are initialized randomly instead of with the non-random initialization described in Section 3.3, which has not been implemented in this code.
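The idea of a non-random init could look roughly like the following PyTorch sketch: seed each 1x1 filter in the filter bank with a feature vector taken from a high-response location of the backbone feature map. All shapes and the selection rule here are illustrative assumptions, not the paper's exact procedure:

```python
import torch
import torch.nn as nn

# Hedged sketch of a Section-3.3-style init (illustrative, not the paper's
# exact method): pick the spatial locations with the strongest activation
# energy and copy their feature vectors into the 1x1 filter bank weights.
in_channels, num_filters = 512, 10
filter_bank = nn.Conv2d(in_channels, num_filters, kernel_size=1, bias=False)

with torch.no_grad():
    feat = torch.randn(in_channels, 28, 28)       # stand-in for a conv feature map
    energy = feat.pow(2).sum(dim=0).flatten()     # response energy per location
    top = energy.topk(num_filters).indices        # strongest spatial locations
    vecs = feat.flatten(1)[:, top].t()            # (num_filters, in_channels)
    vecs = vecs / vecs.norm(dim=1, keepdim=True)  # L2-normalize each seed vector
    filter_bank.weight.copy_(vecs.view(num_filters, in_channels, 1, 1))
```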

pengshiqi avatar Nov 21 '18 05:11 pengshiqi

@pengshiqi Yes, you are right, it's not initialized. Do you know how to implement it? I have some ideas to discuss with you. I suggest we connect on QQ; my QQ number is 260730636.

fxle avatar Nov 21 '18 06:11 fxle

@pengshiqi @fxle I changed the model by adding a dropout layer. During training, loss2 decreases, but loss1 and loss3 basically do not. Did you see the same situation?
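Adding dropout of the kind mentioned here is typically done in the fully connected head. A minimal sketch with illustrative layer sizes (not the repo's actual head):

```python
import torch
import torch.nn as nn

# Hedged sketch: insert Dropout into a VGG-style classification head to
# regularize the fully connected layers. Sizes are illustrative only.
head = nn.Sequential(
    nn.Linear(512 * 7 * 7, 4096),
    nn.ReLU(inplace=True),
    nn.Dropout(p=0.5),       # added dropout
    nn.Linear(4096, 4096),
    nn.ReLU(inplace=True),
    nn.Dropout(p=0.5),       # added dropout
    nn.Linear(4096, 200),    # 200 classes as a placeholder
)
out = head(torch.randn(2, 512 * 7 * 7))
```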

techzhou avatar Nov 25 '18 05:11 techzhou

@techzhou No, I didn't use a dropout layer. Maybe you can try regularization or the Section 3.3 layer initialization to make it perform better.

fxle avatar Nov 25 '18 11:11 fxle

@techzhou Hi, how is the accuracy after you implemented the dropout layer? Did it increase a little? I think dropout may help. Thanks!

Ien001 avatar Dec 10 '18 07:12 Ien001

@pengshiqi hi, how do you solve the overfitting problem?

chaerlo avatar Apr 15 '19 03:04 chaerlo

@pengshiqi Hi, how did you solve the overfitting problem? @XIELeo @fxle @techzhou @Ien001

wsqat avatar Sep 04 '19 01:09 wsqat

> @pengshiqi hi, how do you solve the overfitting problem? @XIELeo @fxle @techzhou @Ien001

@wsqat The default hyper-parameters are imperfect. You can adjust the learning rate and the loss weights to obtain better results.
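Tuning the loss weights as suggested could be sketched like this; the 1.0/0.1/0.1 weights and the 200-class setup are placeholders to illustrate the knob, not recommended values:

```python
import torch
import torch.nn as nn

# Hedged sketch: weight the three DFL-CNN loss terms before summing them.
# All values here are illustrative assumptions.
criterion = nn.CrossEntropyLoss()
w1, w2, w3 = 1.0, 0.1, 0.1   # per-branch loss weights to tune

logits1 = torch.randn(4, 200, requires_grad=True)  # branch 1 outputs
logits2 = torch.randn(4, 200, requires_grad=True)  # branch 2 outputs
logits3 = torch.randn(4, 200, requires_grad=True)  # branch 3 outputs
target = torch.randint(0, 200, (4,))

loss = (w1 * criterion(logits1, target)
        + w2 * criterion(logits2, target)
        + w3 * criterion(logits3, target))
loss.backward()
```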

pengshiqi avatar Sep 04 '19 02:09 pengshiqi

@pengshiqi Hi, I trained by adjusting the learning rate but could only reach 52% accuracy. Can you share the weights of the model with which you got 72% accuracy? Or can you mention any hyper-parameters or code changes that could help me get better accuracy?

aparnaambarapu avatar Sep 21 '20 11:09 aparnaambarapu