mxnet-ssd

Evaluation results on several models are lower than reported ones.

Open ghost opened this issue 7 years ago • 26 comments

I simply downloaded the released ssd models pretrained on voc07+12. The models are evaluated with the latest mxnet ssd examples. The evaluation results are:

VGG16_reduced 300x300: 75.7 (evaluated), 77.8 (reported)
VGG16_reduced 512x512: 78.6 (evaluated), 79.9 (reported)
Inception-v3 512x512: 77.1 (evaluated), 78.9 (reported)
Resnet-50 512x512: 77.8 (evaluated), 78.9 (reported)

Evaluation was performed with the symbols legacy_vgg16_ssd_300, legacy_vgg16_ssd_512, inceptionv3 and resnet50. The evaluated mAP is 1 to 2 points lower than reported.

I also tried reproducing the ssd_vgg_reduced_300 model and got 73.1~73.7 mAP after several experiments.

Then I tried mxnet-ssd on several other datasets. The accuracy seems to be 3 to 5 points lower than what I get with caffe-ssd (the original version from weiliu89). Are there any details I missed?

ghost avatar Oct 31 '17 03:10 ghost

@zhreshold

ghost avatar Nov 02 '17 01:11 ghost

@xioryu Do you have time to write the results out to files and use the official Matlab code to verify them?
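A minimal sketch of what writing the results to files could look like, assuming the usual PASCAL VOC devkit layout of one text file per class with lines of the form `image_id score xmin ymin xmax ymax`. The function name `write_voc_results`, the input tuple layout and the `comp4_det_test` prefix are illustrative assumptions, not part of mxnet-ssd.

```python
from collections import defaultdict
from pathlib import Path


def write_voc_results(detections, out_dir, prefix="comp4_det_test"):
    """Write detections to one text file per class.

    `detections` is an iterable of
    (image_id, class_name, score, xmin, ymin, xmax, ymax) tuples.
    Coordinates are written as given; the VOC devkit expects 1-based
    pixel coordinates, so convert beforehand if yours are 0-based.
    Returns a dict mapping class name to the written file path.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Group formatted lines by class so each class gets its own file.
    per_class = defaultdict(list)
    for image_id, cls, score, xmin, ymin, xmax, ymax in detections:
        per_class[cls].append(
            f"{image_id} {score:.6f} {xmin:.1f} {ymin:.1f} {xmax:.1f} {ymax:.1f}"
        )

    paths = {}
    for cls, lines in per_class.items():
        path = out_dir / f"{prefix}_{cls}.txt"
        path.write_text("\n".join(lines) + "\n")
        paths[cls] = path
    return paths
```

The resulting files can then be fed to the devkit's evaluation script, which removes any difference in the metric implementation as a possible cause of the gap.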

zhreshold avatar Nov 02 '17 01:11 zhreshold

Adding to this list, the new MobileNet 512x512 gives 67.5 mAP.

davsol avatar Nov 04 '17 16:11 davsol

@davsol Not really. The mAP of MobileNet 512x512 is 72.5 on VOC07 test. MobileNet 608x608 gives 74.57. I evaluated this last week.

jyuan1118 avatar Nov 07 '17 06:11 jyuan1118

You are right. I discovered a bug in my code, and after I fixed it the results were the same as reported.

davsol avatar Nov 07 '17 07:11 davsol

@zhreshold, I have almost the same results as @xioryu:

Inception-v3 512x512: 77.0 (evaluated), 78.9 (reported)
Resnet-50 512x512: 77.8 (evaluated), 78.9 (reported)

I reproduced the ssd_vgg_reduced_300 model and got 73.4 mAP.

Could you @davsol tell me your mAP of MobileNet 512x512?

cc-cn avatar Nov 07 '17 10:11 cc-cn

@davsol Could you tell me what kind of bug it was?

ghost avatar Nov 07 '17 11:11 ghost

I automated the computation of the anchor scales, and my calculations were wrong. This resulted in a model that was trained on certain scales and evaluated on others. Working on a new generic function now :)
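For illustration, a generic scale function can follow the linear rule from the SSD paper, s_k = s_min + (s_max - s_min) * (k - 1) / (m - 1) for k = 1..m. This is a sketch under that assumption, not the function written by the poster; the name `ssd_scales` and the defaults 0.2 and 0.9 (the paper's values) are choices made here, and mxnet-ssd's own symbol configs may use different sizes.

```python
def ssd_scales(num_layers, s_min=0.2, s_max=0.9):
    """Return one anchor scale per feature layer, linearly spaced.

    Scales are fractions of the input size, as in the SSD paper.
    """
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    if num_layers == 1:
        return [s_min]
    step = (s_max - s_min) / (num_layers - 1)
    return [s_min + step * k for k in range(num_layers)]


# Compute the scales once and pass the same list to both the training
# and the evaluation symbol, so the two can never disagree.
scales = ssd_scales(6)
```

Deriving the scales in one place and sharing the result is the simplest guard against the train/evaluate mismatch described above.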

davsol avatar Nov 07 '17 12:11 davsol

Hi @xioryu, I also cannot reproduce the reported accuracy.

I doubt that the reported results are reliable, since I noticed that the reported accuracy was posted before the evaluation metric was fixed (issue #97). Do you think this is the reason why we cannot reproduce the results?

Note:
The VGG results were posted before Apr 4, 2017.
The evaluation metric was fixed on Jun 29, 2017.
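As background on how much the metric definition alone can move the number, here is a sketch of the two common PASCAL VOC average-precision computations: the 11-point interpolation used for VOC07 and the area under the interpolated precision/recall curve used from VOC2010 on. It is not a reconstruction of what issue #97 changed; the function names are made up for this example, and the inputs are assumed to be recall and precision arrays sorted by decreasing detection score.

```python
import numpy as np


def ap_11_point(recall, precision):
    """VOC07-style AP: mean of max precision at recall >= 0, 0.1, ..., 1."""
    recall = np.asarray(recall, dtype=float)
    precision = np.asarray(precision, dtype=float)
    ap = 0.0
    for t in np.linspace(0.0, 1.0, 11):
        mask = recall >= t
        ap += (precision[mask].max() if mask.any() else 0.0) / 11.0
    return float(ap)


def ap_area(recall, precision):
    """VOC2010+-style AP: area under the interpolated PR curve."""
    mrec = np.concatenate(([0.0], np.asarray(recall, dtype=float), [1.0]))
    mpre = np.concatenate(([0.0], np.asarray(precision, dtype=float), [0.0]))
    # Make precision monotonically non-increasing from right to left.
    for i in range(mpre.size - 1, 0, -1):
        mpre[i - 1] = max(mpre[i - 1], mpre[i])
    # Sum rectangle areas where recall changes.
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))
```

On the same precision/recall curve the two functions generally return different values, so reported and reproduced numbers are only comparable when both use the same definition.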

cypw avatar Nov 09 '17 11:11 cypw

I am going to retrain some of the models to be consistent with recent updates. Don't worry.

zhreshold avatar Nov 09 '17 19:11 zhreshold

Guys, I've uploaded a new resnet50, see if it works for you. Next one is inceptionv3. BTW, the updated vgg model is still not good enough. I will revisit it later.

zhreshold avatar Nov 13 '17 20:11 zhreshold

The new resnet50 works. I got 79.01 mAP (similar to the reported value).

ghost avatar Nov 14 '17 02:11 ghost

@zhreshold How did you get this? Could you please share your configuration?

BunnyShan avatar Nov 16 '17 02:11 BunnyShan

@BunnyShan It's basically the standard arguments.

python train.py --gpus 0,1,2,3 --network resnet50 --lr 0.004 --freeze '' --data-shape 512

zhreshold avatar Nov 16 '17 08:11 zhreshold

@zhreshold Thanks, I will try it!

BunnyShan avatar Nov 16 '17 14:11 BunnyShan

@zhreshold I tried your config on resnet50 with a 512 input shape and got 79.5% mAP! But I can't reproduce the 77.1% for resnet50 with a 321 input shape reported in the paper. I also tried the predict_module from DSSD and still could not reproduce the result. Any advice?

BunnyShan avatar Nov 26 '17 10:11 BunnyShan

@BunnyShan I have some ideas for tuning the configs for models with smaller input sizes, but I need to verify them first.

zhreshold avatar Nov 29 '17 04:11 zhreshold

@zhreshold I tried resnet101, which was used in the original paper, and got very poor results. Note that the default resnet101 layer config is different from the one in the original paper, but neither reproduces the result. I thought maybe the high-level features are not suitable for detection, and I'm digging into the caffe implementation for some answers. I'm looking forward to your results on this issue :)

BunnyShan avatar Dec 10 '17 10:12 BunnyShan

@BunnyShan Fixed https://github.com/zhreshold/mxnet-ssd/commit/e7cc662f3d0ce29e10670f60a8cfcce854036780

zhreshold avatar Dec 16 '17 07:12 zhreshold

@zhreshold Thanks a lot! Any progress on small input size?

BunnyShan avatar Dec 20 '17 06:12 BunnyShan

@xioryu @zhreshold Can you kindly share the initial values used for training? I want to train the MobileNet 300 on VOC07 but the mAP is not near these numbers. Right now I am using the default values for the VGG-reduced.

yahyanik avatar Jan 19 '18 18:01 yahyanik

@zhreshold Are you going to retrain the VGG16-based SSD? We could not reproduce the precision using the current training config.

siyiding1216 avatar Jan 31 '18 18:01 siyiding1216

@zhreshold I have checked the caffe ssd code. The low mAP may be caused by the data augmentation. The original caffe version applies the following steps in order:

(1) Image distortion
(2) Image expansion
(3) Sampling a random crop window
(4) Data transform: adding noise, resizing

The second step seems to be missing in this mxnet version.
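To make step (2) concrete, here is a minimal NumPy sketch of an expansion ("zoom-out") augmentation: the image is pasted at a random position on a larger canvas filled with a constant color, and the boxes are shifted accordingly. It illustrates the idea only and is not the caffe or mxnet-ssd implementation; the function name `expand`, the maximum ratio of 4.0 and the fill color are assumptions chosen for this example, and boxes are assumed to be `[xmin, ymin, xmax, ymax]` in pixels.

```python
import numpy as np


def expand(image, boxes, max_ratio=4.0, fill=(123, 117, 104), rng=None):
    """Place `image` (H x W x 3) on a larger canvas and shift `boxes`.

    Returns the expanded image and the boxes in canvas coordinates.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w, c = image.shape

    # Pick how much larger the canvas is, then where the image goes.
    ratio = rng.uniform(1.0, max_ratio)
    new_h, new_w = int(h * ratio), int(w * ratio)
    top = int(rng.integers(0, new_h - h + 1))
    left = int(rng.integers(0, new_w - w + 1))

    # Fill the canvas with a constant color, then paste the image.
    canvas = np.empty((new_h, new_w, c), dtype=image.dtype)
    canvas[...] = np.asarray(fill, dtype=image.dtype)
    canvas[top:top + h, left:left + w] = image

    # Boxes keep their size and move with the pasted image.
    shifted = np.asarray(boxes, dtype=np.float32).reshape(-1, 4).copy()
    shifted[:, [0, 2]] += left
    shifted[:, [1, 3]] += top
    return canvas, shifted
```

Because the objects become smaller relative to the final input after resizing, this step mainly adds small-object training examples, which is one plausible reason its absence could lower mAP.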

nopattern avatar May 10 '18 07:05 nopattern

@nopattern mAP is no longer an issue in https://github.com/dmlc/gluon-cv. I admit that data augmentation is quite tricky and prone to change; that's why I no longer develop with the C++ iterator and have moved to Python in gluon.

zhreshold avatar May 10 '18 18:05 zhreshold

Got it.

nopattern avatar May 11 '18 07:05 nopattern

@zhreshold I have tried the gluoncv version of this ssd, but its training speed is almost half that of this version, apparently because of the multiprocessing issue in gluoncv. Can you give any advice?

nopattern avatar May 20 '18 13:05 nopattern