
VOC and COCO results reproduction problem

Open shwoo93 opened this issue 6 years ago • 39 comments

Hello, first of all, thanks for releasing your code.

Since I wanted to check whether the reported accuracy is reproducible, I trained exactly the code on the git repo. However, even though I tried several times, your reported accuracies on VOC2007 (80.5%) and COCO (29.9%) were not attainable: I got 79.9% and 28.8%, respectively.

For a fair comparison, I also trained SSD using the same training scheme as RFBNet and obtained 78.8%.

Any comments will be appreciated.

Thanks.

shwoo93 avatar Mar 23 '18 03:03 shwoo93

@shwoo93 Could you share your lr and batch size? In my experiments, a larger batch size gives higher accuracy. The VOC2007 result (80.5%) was obtained with lr=0.004, batch_size=32; I also tried several times, and the performance is around 80.4%. For COCO, 29.9% was attained with lr=0.002, batch_size=32, and a newer result is 30.3% with batch_size=64. Here is the weight. For the original SSD, I have not reproduced its result with my RFB training scheme.

GOATmessi7 avatar Mar 23 '18 05:03 GOATmessi7

I trained both using lr=0.004, batch_size = 32.

shwoo93 avatar Mar 23 '18 06:03 shwoo93

@shwoo93 What is your maximum number of epochs? The above results were obtained around epoch 220.

GOATmessi7 avatar Mar 23 '18 06:03 GOATmessi7

@shwoo93 Which version of SSD's code did you use to achieve 78.8%? Thanks

Ranchentx avatar Jun 27 '18 02:06 Ranchentx

@shwoo93 I also reproduced the result of the original SSD with the RFB training scheme and got 79.04%, so I think it is fair to compare that with the RFB result.

vaesl avatar Jul 03 '18 11:07 vaesl

With lr=0.004, batch_size=32, I also get 79.9% mAP on VOC2007 around epoch 240.

pikerbright avatar Aug 02 '18 03:08 pikerbright

There was an error in the original RFB_Net_vgg, as #23 mentioned. I have now fixed that issue and got a new, higher performance on VOC (80.7% mAP) in just one trial. The COCO models are still training.

GOATmessi7 avatar Aug 05 '18 09:08 GOATmessi7

@ruinmessi I updated the code and got about 80.2% mAP on VOC at epoch 260.

pikerbright avatar Aug 09 '18 02:08 pikerbright

@pikerbright Did you test the released weight above? I got that result (80.72%) at epoch 250 and tested it on PyTorch 0.3.1, but when I switched to another machine with PyTorch 0.4.0, the result was 80.56%. I can't figure out what happened...

GOATmessi7 avatar Aug 09 '18 04:08 GOATmessi7

@ruinmessi With PyTorch 0.4.1, I got the same result of 80.56% for the released weight. Did you train the model on PyTorch 0.3.1 or 0.4.0? I saw your updates for supporting PyTorch 0.4. Could those updates cause this difference?

pikerbright avatar Aug 09 '18 05:08 pikerbright

@pikerbright I trained the model on PyTorch 0.3.1 with an old version of this repository. The updated code only fixes incompatibilities with PyTorch 0.4; it is not a big change, actually.

GOATmessi7 avatar Aug 09 '18 06:08 GOATmessi7

@pikerbright Can you provide the settings that you used for training, e.g. lr, number of epochs, etc.?

isalirezag avatar Aug 14 '18 03:08 isalirezag

@isalirezag I used the default settings in the training code: lr=0.004, batch_size=32.

pikerbright avatar Aug 16 '18 04:08 pikerbright

@zhch-sun Here is the released warm-up code. The default settings achieve about 80.5% at around epoch 220-250.

GOATmessi7 avatar Aug 16 '18 06:08 GOATmessi7
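The warm-up plus step-decay scheme discussed here can be sketched roughly as follows. This is a minimal sketch only: the warm-up length, decay boundaries, and gamma below are illustrative assumptions, not the repo's exact values.

```python
def adjust_learning_rate(base_lr, epoch, iteration, epoch_size,
                         warmup_epochs=5, gamma=0.1, steps=(150, 200, 250)):
    """Linear warm-up for the first few epochs, then step decay.

    NOTE: a sketch of the scheme discussed in this thread; the exact
    warm-up length, decay steps, and gamma in the repo may differ.
    """
    if epoch < warmup_epochs:
        # Ramp the lr linearly from ~0 up to base_lr over the warm-up period.
        total_warmup_iters = warmup_epochs * epoch_size
        current_iter = epoch * epoch_size + iteration
        return base_lr * (current_iter + 1) / total_warmup_iters
    # After warm-up, multiply by gamma once per decay boundary passed.
    decays = sum(1 for s in steps if epoch >= s)
    return base_lr * gamma ** decays
```

In the training loop, the returned value would be written into each optimizer param group every iteration.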

Sorry, I didn't notice the adjust_learning_rate function~ @ruinmessi

zhch-sun avatar Aug 16 '18 06:08 zhch-sun

@ruinmessi Question 1: https://github.com/ruinmessi/RFBNet/blob/1b4d33a050e4dc11d38b959333a43fb2919b49f7/layers/functions/prior_box.py#L13 I notice that there are multiple versions of the priorbox layer. Is the layer in the repo the one you used to train all the models? Question 2: Were the models trained with PyTorch 0.3.1 rather than 0.4.0? Question 3: Did you use COCO-pretrained weights for VOC07 detection? Thank you very much for this wonderful repo.

zhch-sun avatar Aug 16 '18 08:08 zhch-sun

@zhch-sun Hi, for Q1: the multiple versions of the priorbox layer come from the original SSD, since this code is mainly based on the ssd.pytorch repository; I just use the default version. Q2: the released weight was trained with 0.3.1, but I have retrained once with 0.4.0 and got 80.3%, so I think the training code has no problem reproducing the results. Q3: no COCO-pretrained weight was used.

GOATmessi7 avatar Aug 16 '18 09:08 GOATmessi7

@ruinmessi Thank you for the reply. I'm rerunning it on PyTorch 0.3.1 and 0.4.0; I hope I can reproduce the result.

zhch-sun avatar Aug 16 '18 10:08 zhch-sun

@ruinmessi I got 80.2% with 0.3.1 at epoch 240 (with the corrected RFB_Net_vgg). I trained with 0.4.0 three times and got 79.9%, 79.8%, and 80.1% at roughly epoch 250. I guess you use Anaconda to manage Python packages; can you export your conda environment to a .yml file for reproduction?

zhch-sun avatar Aug 17 '18 06:08 zhch-sun

@zhch-sun Hi, have you found the gap between the two versions?

lxtGH avatar Sep 29 '18 05:09 lxtGH

@ruinmessi Hi, I finally got 79.9% at epoch 260. How can I reproduce your performance of about 80.5%?

lxtGH avatar Sep 30 '18 15:09 lxtGH

@lxtGH Did you use the latest code in this repository with the corrected RFB_Net_vgg? I also suggest saving the model every 5 epochs after epoch 200 and testing each of those checkpoints.

GOATmessi7 avatar Oct 01 '18 00:10 GOATmessi7
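The save-every-5-epochs suggestion can be sketched like this. Only the scheduling predicate is shown; the actual save call is left as a comment, and all names are illustrative rather than taken from the repo.

```python
def should_save(epoch, start=200, interval=5):
    """True on epochs where a late checkpoint should be written, so that
    several models around the mAP peak can all be evaluated afterwards."""
    return epoch >= start and epoch % interval == 0

# In the training loop (illustrative):
#     if should_save(epoch):
#         torch.save(net.state_dict(), f"weights/RFB_vgg_epoch_{epoch}.pth")
```

Evaluating each of these late checkpoints is what surfaces the run-to-run variation in peak mAP reported throughout this thread.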

@ruinmessi Yes, I use the latest code and trained the model with pytorch-0.4.1, 2 GPUs, batch_size 64, and otherwise default settings. I tested about 10 different models after epoch 220, and the best score is 79.9%.

lxtGH avatar Oct 01 '18 02:10 lxtGH

@lxtGH I guess the reason is that the default settings are tuned for batch_size=32, so you may need to raise the lr to 0.008 or 0.007 for batch_size=64. If that still doesn't work, I will check the gap between the two versions.

GOATmessi7 avatar Oct 01 '18 03:10 GOATmessi7
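The lr-vs-batch-size advice here is the usual linear scaling rule, which can be sketched as follows (an assumption about the intended rule, not code from the repo):

```python
def scaled_lr(batch_size, base_lr=0.004, base_batch=32):
    """Linear scaling rule: keep lr / batch_size constant when the batch
    size changes. With base_lr=0.004 at batch 32, batch 64 gives 0.008."""
    return base_lr * batch_size / base_batch
```

This matches the suggestion of roughly doubling the lr when doubling the batch size from 32 to 64.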

@lxtGH @ruinmessi I still cannot reproduce the 80.5 mAP. Several reproduction experiments led to approximately 80.1 mAP with default settings.

zhch-sun avatar Oct 01 '18 18:10 zhch-sun

@ruinmessi @zhch-sun Me too; I also got 80.1 mAP.

lxtGH avatar Oct 03 '18 04:10 lxtGH

@lxtGH @zhch-sun Well, I will check this gap when I have enough GPU resources.

GOATmessi7 avatar Oct 03 '18 05:10 GOATmessi7

@ruinmessi I only got 79.7 mAP on VOC07 using PyTorch 0.4.0, batch size 32, lr 4e-3.

jiangzhengkai avatar Oct 03 '18 06:10 jiangzhengkai

> @ruinmessi I only got 79.7 mAP on VOC07 using PyTorch 0.4.0, batch size 32, lr 4e-3.

What are your final regression loss and classification loss?

chenchch94 avatar Oct 19 '18 11:10 chenchch94

> @ruinmessi I only got 79.7 mAP on VOC07 using PyTorch 0.4.0, batch size 32, lr 4e-3.
>
> What are your final regression loss and classification loss?

I have only tested once so far, with batch_size=64, ngpu=2, and the learning rate unchanged. The results for epochs 200 to 295 (saved every 5 epochs) are:

Mean AP = 0.7974
Mean AP = 0.7968
Mean AP = 0.7971
Mean AP = 0.7966
Mean AP = 0.7970
Mean AP = 0.7961
Mean AP = 0.7966
Mean AP = 0.7967
Mean AP = 0.7965
Mean AP = 0.7961
Mean AP = 0.7963
Mean AP = 0.7968
Mean AP = 0.7958
Mean AP = 0.7968
Mean AP = 0.7964
Mean AP = 0.7967
Mean AP = 0.7965
Mean AP = 0.7964
Mean AP = 0.7963
Mean AP = 0.7965

The loss values at the final stage of training are:

Epoch:300 || epochiter: 108/258|| Totel iter 77250 || L: 0.6354 C: 1.4200||Batch time: 0.8735 sec. ||LR: 0.00000400
Epoch:300 || epochiter: 118/258|| Totel iter 77260 || L: 0.6126 C: 1.5056||Batch time: 0.8881 sec. ||LR: 0.00000400
Epoch:300 || epochiter: 128/258|| Totel iter 77270 || L: 0.5095 C: 1.2193||Batch time: 0.8780 sec. ||LR: 0.00000400
Epoch:300 || epochiter: 138/258|| Totel iter 77280 || L: 0.4797 C: 1.2762||Batch time: 0.8709 sec. ||LR: 0.00000400
Epoch:300 || epochiter: 148/258|| Totel iter 77290 || L: 0.4841 C: 1.2326||Batch time: 0.8841 sec. ||LR: 0.00000400
Epoch:300 || epochiter: 158/258|| Totel iter 77300 || L: 0.5112 C: 1.3652||Batch time: 0.8654 sec. ||LR: 0.00000400
Epoch:300 || epochiter: 168/258|| Totel iter 77310 || L: 0.5362 C: 1.3759||Batch time: 0.8661 sec. ||LR: 0.00000400
Epoch:300 || epochiter: 178/258|| Totel iter 77320 || L: 0.5678 C: 1.4102||Batch time: 0.8806 sec. ||LR: 0.00000400
Epoch:300 || epochiter: 188/258|| Totel iter 77330 || L: 0.5967 C: 1.4693||Batch time: 1.7749 sec. ||LR: 0.00000400
Epoch:300 || epochiter: 198/258|| Totel iter 77340 || L: 0.5119 C: 1.3382||Batch time: 2.2249 sec. ||LR: 0.00000400
Epoch:300 || epochiter: 208/258|| Totel iter 77350 || L: 0.6191 C: 1.5897||Batch time: 0.8651 sec. ||LR: 0.00000400
Epoch:300 || epochiter: 218/258|| Totel iter 77360 || L: 0.4290 C: 1.2205||Batch time: 2.2073 sec. ||LR: 0.00000400
Epoch:300 || epochiter: 228/258|| Totel iter 77370 || L: 0.5191 C: 1.3052||Batch time: 0.8719 sec. ||LR: 0.00000400
Epoch:300 || epochiter: 238/258|| Totel iter 77380 || L: 0.5910 C: 1.5734||Batch time: 1.3068 sec. ||LR: 0.00000400
Epoch:300 || epochiter: 248/258|| Totel iter 77390 || L: 0.5188 C: 1.2901||Batch time: 0.8779 sec. ||LR: 0.00000400

imyhxy avatar Oct 22 '18 05:10 imyhxy