mxnet-ssd
Evaluation results on several models are lower than reported ones.
I simply downloaded the released ssd models pretrained on voc07+12. The models are evaluated with the latest mxnet ssd examples. The evaluation results are:
VGG16_reduced 300x300: 75.7 (evaluated) vs. 77.8 (reported)
VGG16_reduced 512x512: 78.6 (evaluated) vs. 79.9 (reported)
Inception-v3 512x512: 77.1 (evaluated) vs. 78.9 (reported)
Resnet-50 512x512: 77.8 (evaluated) vs. 78.9 (reported)
Evaluation is performed with the symbols legacy_vgg16_ssd_300, legacy_vgg16_ssd_512, inceptionv3 and resnet50. There is a 1~2 mAP drop for each model.
I also tried reproducing the ssd_vgg_reduced_300 model and got 73.1~73.7 mAP after several experiments.
Then I tried mxnet-ssd on several other datasets. It seems the accuracy is 3~5 percent lower than what I reproduced with caffe-ssd (the original version from weiliu88). Are there any details I missed?
@zhreshold
@xioryu Do you have time to write the results out to files and use the official MATLAB code to verify them?
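Something along these lines should work for dumping detections in the PASCAL VOC result-file format that the official devkit consumes; this is only a sketch, and the `all_detections` structure and class list are placeholders, not code from this repo:

```python
# Sketch: dump detections to VOC-style result files for the official devkit.
# `all_detections[cls][img_id]` is assumed to be an (N, 5) array of
# [score, xmin, ymin, xmax, ymax] in absolute pixel coordinates.
import os

def write_voc_results(all_detections, class_names, out_dir, comp_id='comp4'):
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    for cls in class_names:
        path = os.path.join(out_dir, '{}_det_test_{}.txt'.format(comp_id, cls))
        with open(path, 'w') as f:
            for img_id, dets in all_detections[cls].items():
                for score, x1, y1, x2, y2 in dets:
                    # The VOC devkit expects 1-based pixel coordinates.
                    f.write('{} {:.6f} {:.1f} {:.1f} {:.1f} {:.1f}\n'.format(
                        img_id, score, x1 + 1, y1 + 1, x2 + 1, y2 + 1))
```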
Adding to this list, the new MobileNet 512x512 gives 67.5 mAP.
@davsol Not really. The mAP of MobileNet 512x512 is 72.5 on VOC07 test. MobileNet 608x608 gives 74.57. I evaluated this last week.
You are right. I discovered a bug in my code, and when I fixed it, the results matched the reported ones.
@zhreshold, I have almost the same results as @xioryu:
Inception-v3 512x512: 77.0 (evaluated) vs. 78.9 (reported)
Resnet-50 512x512: 77.8 (evaluated) vs. 78.9 (reported)
I also reproduced the ssd_vgg_reduced_300 model and got 73.4 mAP.
@davsol Could you tell me your mAP for MobileNet 512x512?
@davsol Could you tell me what kind of bug it was?
I automated the scales of the anchors, and my calculations were wrong. This resulted in a model that learned on certain scales and was evaluated on others. Working on a new generic function now :)
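For anyone hitting the same thing, roughly what I mean (illustrative names, not this repo's actual code): the scales follow the linear rule from the SSD paper, and whatever rule you use, training and evaluation must call the identical function.

```python
# Sketch of the linear anchor-scale rule from the SSD paper:
# s_k = s_min + (s_max - s_min) * (k - 1) / (m - 1),  k = 1..m
def anchor_scales(num_layers, s_min=0.2, s_max=0.9):
    step = (s_max - s_min) / (num_layers - 1)
    return [s_min + step * k for k in range(num_layers)]

# Mismatched scales between train and eval silently tank mAP, as described above.
print(anchor_scales(6))  # [0.2, 0.34, 0.48, 0.62, 0.76, 0.9]
```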
Hi @xioryu, I also cannot reproduce the reported accuracy.
I doubt whether the reported results are reliable, since I noticed the reported accuracy was posted before the evaluation metric was fixed (issue #97). Do you think this is the reason why we cannot reproduce the results?
Note: the VGG results were posted before Apr 4, 2017; the evaluation metric was fixed on Jun 29, 2017.
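For context on why a metric change alone could move mAP by roughly this much: one common difference is the 11-point interpolated VOC07 AP versus the all-point (area-under-curve) AP. Whether that is exactly what issue #97 changed is my assumption; the sketch below is illustrative and is not the repo's evaluation code.

```python
# Sketch: the two common VOC AP definitions. `rec` and `prec` are numpy arrays
# of recall/precision values for detections sorted by decreasing score.
import numpy as np

def ap_11point(rec, prec):
    # VOC2007-style: average the max precision at recall thresholds 0, 0.1, ..., 1.0.
    ap = 0.0
    for t in np.arange(0.0, 1.1, 0.1):
        p = prec[rec >= t].max() if np.any(rec >= t) else 0.0
        ap += p / 11.0
    return ap

def ap_allpoint(rec, prec):
    # Area under the (monotonically decreasing) precision-recall curve.
    mrec = np.concatenate(([0.0], rec, [1.0]))
    mpre = np.concatenate(([0.0], prec, [0.0]))
    for i in range(mpre.size - 1, 0, -1):
        mpre[i - 1] = max(mpre[i - 1], mpre[i])
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))
```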
I am going to retrain some of the models to be consistent with recent updates. Don't worry.
Guys, I've uploaded a new resnet50, see if it works for you. Next one is inceptionv3. BTW, the updated vgg model is still not good enough; I will revisit it later.
The new resnet50 works. I got 79.01 mAP (similar to the reported value).
@zhreshold How did you get this? Could you please share your configuration?
@BunnyShan It's basically the standard arguments:
python train.py --gpus 0,1,2,3 --network resnet50 --lr 0.004 --freeze '' --data-shape 512
@zhreshold Thanks, I will try it!
@zhreshold I tried your config for resnet50 with 512 input shape and got 79.5% mAP! But I can't reproduce the 77.1% for resnet50 with 321 input shape reported in the paper. I also tried the predict_module from DSSD, but still could not reproduce the result. Any advice?
@BunnyShan I have some ideas for tuning the configs for models with smaller input sizes; I will need to verify them.
@zhreshold I tried resnet101, which was used in the original paper, and got a very poor result. Note that the default resnet101 layer config here is different from the original paper, but neither setup can reproduce the result. I thought maybe the high-level features are not suitable for detection, so I'm diving into the caffe implementation for answers. I'm looking forward to your results on this issue :)
@BunnyShan Fixed https://github.com/zhreshold/mxnet-ssd/commit/e7cc662f3d0ce29e10670f60a8cfcce854036780
@zhreshold Thanks a lot! Any progress on small input size?
@xioryu @zhreshold Can you kindly share the initial values used for training? I want to train MobileNet 300 on VOC07, but my mAP is nowhere near these numbers. Right now I am using the default values from the VGG-reduced config.
@zhreshold Are you going to retrain the VGG16-based SSD? We could not reproduce the reported precision using the current training config.
@zhreshold I have checked the caffe-ssd code. The low mAP may be caused by data augmentation. The original caffe version applies the following order: (1) image distortion; (2) image expansion; (3) sampling a random crop window; (4) data transforms (adding noise, resizing). The second step seems to be missing in this mxnet version; a rough sketch of it is below.
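A rough sketch of the Caffe-SSD "expand" (zoom-out) step, just to illustrate what I mean; this is not code from either repo, and the mean-pixel values are the usual Caffe BGR means, used here only as an assumption:

```python
# Sketch of the Caffe-SSD "expand" augmentation: paste the image onto a larger
# canvas filled with the mean pixel value, then offset the ground-truth boxes.
import numpy as np

def expand_image(img, boxes, max_ratio=4.0, mean_pixel=(104, 117, 123)):
    h, w, c = img.shape
    ratio = np.random.uniform(1.0, max_ratio)
    new_h, new_w = int(h * ratio), int(w * ratio)
    top = np.random.randint(0, new_h - h + 1)
    left = np.random.randint(0, new_w - w + 1)
    canvas = np.full((new_h, new_w, c), mean_pixel, dtype=img.dtype)
    canvas[top:top + h, left:left + w] = img
    # boxes are [xmin, ymin, xmax, ymax] in pixels; shift them into the canvas.
    boxes = boxes.copy()
    boxes[:, (0, 2)] += left
    boxes[:, (1, 3)] += top
    return canvas, boxes
```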
@nopattern mAP is no longer an issue in https://github.com/dmlc/gluon-cv. I admit that data augmentation is quite tricky and prone to change; that's why I am no longer developing with the C++ iterator and have transitioned to Python in Gluon.
Got it.
@zhreshold I have tried the gluon-cv version of this SSD, but the training speed is almost half of this version due to the gluon-cv multiprocessing issue. Can you give any advice?