
The accuracy of training ResNet50 on the ImageNet2012 dataset is lower than 75%


Based on the research of Kaiming He (https://github.com/KaimingHe/deep-residual-networks), the accuracy of training ResNet50 on the ImageNet2012 dataset should be around 75%. The AlexNet model should be around 58%, as described in the paper (https://arxiv.org/pdf/1708.03888.pdf). We can only reach around 71% accuracy for ResNet50 and 54% for AlexNet when we train the models with this benchmark. The models and hyper-parameters are set the same as in the paper "Deep Residual Learning for Image Recognition" (http://arxiv.org/abs/1512.03385), and the hardware is the same as in the paper too.
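For reference, this is a rough sketch of the recipe from the ResNet paper as we set it up. The epoch-30/60 decay points below are the common approximation; the paper itself divides the learning rate by 10 whenever the error plateaus.

```python
import tensorflow as tf

# Hyper-parameters from "Deep Residual Learning for Image Recognition":
# SGD with momentum 0.9, mini-batch size 256, weight decay 1e-4, and a
# learning rate starting at 0.1, divided by 10 when the error plateaus.
batch_size = 256
steps_per_epoch = 1281167 // batch_size  # ImageNet-1k training images

global_step = tf.train.get_or_create_global_step()
lr = tf.train.piecewise_constant(
    global_step,
    boundaries=[30 * steps_per_epoch, 60 * steps_per_epoch],
    values=[0.1, 0.01, 0.001])
optimizer = tf.train.MomentumOptimizer(lr, momentum=0.9)
weight_decay = 1e-4  # applied as an L2 penalty on the weights
```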

Can TensorFlow reach the same accuracy with the same settings? Is there any way to reproduce the baseline accuracy? Looking forward to your reply!

Sampson1107 avatar May 22 '18 09:05 Sampson1107

Based on the more recent research https://arxiv.org/abs/1706.02677, a ResNet50 should get 76.4% top-1, which can be reached by TensorFlow using the code here.

ppwwyyxx avatar May 22 '18 16:05 ppwwyyxx

/CC @bignamehyp, who verified convergence but probably cannot respond for a few weeks.

The README has a command to run ResNet50 on 8 GPUs to get ~76% accuracy. Try running that command.
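It is approximately the following invocation; check the README for the authoritative flags, as this is from memory:

```
python tf_cnn_benchmarks.py --data_format=NCHW --batch_size=256 \
  --model=resnet50 --optimizer=momentum --variable_update=replicated \
  --nodistortions --gradient_repacking=8 --num_gpus=8 --num_epochs=90 \
  --weight_decay=1e-4 --data_dir=${DATA_DIR} --train_dir=${CKPT_DIR} \
  --use_fp16
```

Dropping --use_fp16 should give a plain FP32 run.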

reedwm avatar May 22 '18 16:05 reedwm

@reedwm @ppwwyyxx Thanks a lot! I have run the ResNet50 model and got ~75.9% accuracy, but I want to reproduce the accuracy from the original research (Kaiming He, https://github.com/KaimingHe/deep-residual-networks), which didn't use warm-up or FP16.

What about AlexNet? Is there baseline code for the AlexNet model (http://arxiv.org/abs/1512.03385)?

Looking forward to your reply!

Sampson1107 avatar May 23 '18 02:05 Sampson1107

The warm-up is only used for mixed-precision FP16, for loss scaling. Only ResNet50 is well tested in the benchmark code; this code is not intended to be a reference implementation.
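To make that concrete, here is a minimal sketch of static loss scaling for FP16 training. It is illustrative only, not the tf_cnn_benchmarks code, and the stand-in loss is hypothetical:

```python
import tensorflow as tf

# Static loss scaling: scale the loss up before computing gradients so
# small FP16 gradients don't flush to zero, then scale the gradients
# back down before applying them.
w = tf.get_variable('w', shape=[], initializer=tf.ones_initializer())
loss = tf.square(w - 3.0)  # stand-in loss for illustration

loss_scale = 128.0
opt = tf.train.MomentumOptimizer(learning_rate=0.1, momentum=0.9)
grads_and_vars = opt.compute_gradients(loss * loss_scale)
unscaled = [(g / loss_scale, v) for g, v in grads_and_vars]
train_op = opt.apply_gradients(unscaled)
```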

If you read the torch implementation that is linked from your link, you will find they got 24% error, i.e. 76% top-1, which also differs from the paper, and they made a guess as to why: http://torch.ch/blog/2016/02/04/resnets.html. For ResNet101 they had a significant difference as well. They even thank Kaiming for helping them resolve ambiguities in the paper.

If you are looking for models that are meant to be faithful copies of the papers, I suggest starting here:

https://github.com/tensorflow/models/tree/master/research/slim Slim may not have great throughput, but the models should get accuracy closer to the papers, although it is very possible they even outperform the papers. They also have published checkpoints. Those models were created by researchers at Google, but time has passed and the models are not tested on a regular basis, so anything is possible.

We also have the official models repo with models that are supported and tested ~daily: https://github.com/tensorflow/models/tree/master/official

tf_cnn_benchmarks ResNet50v1 has trained well with FP32 as well. I do not have the command line, but someone else likely does. It is possible that it will not match the paper, as that was not a goal; it can be really difficult to match a paper exactly. Not an excuse, just a reality.

I hope this helps.

tfboyd avatar May 23 '18 02:05 tfboyd

It seems ResNet50 is well supported and tested. How about AlexNet? I'm trying very hard to train AlexNet to get close to the published 58%+ top-1, but can only get 53%+. I also cannot find a run config in either benchmarks or slim. Can anyone share a command that reaches a reasonably good result?

SeaOfOcean avatar May 23 '18 07:05 SeaOfOcean

I don't think we have tested running Alexnet to convergence, unfortunately. @tfboyd, correct me if I'm wrong.

reedwm avatar May 23 '18 20:05 reedwm

@SeaOfOcean

I lost part of my last comment and should have added this. The other models in tf_cnn_benchmarks are representative of the models (likely with some issues here and there), but they are unlikely to train to the paper's accuracy. My other links are to models that have been proven to train, but they may not train as fast. I have wanted to purge the models we do not test, but they are useful for our perf testing even if not perfect. For official results we do not publish benchmarks for models without accuracy. Before you check tensorflow.org and see it: I did publish Inception3 and VGG16 numbers, but that was over a year ago, before I grasped the nuance, and it will not be done going forward. That data has still been very useful for people, but I have raised my standard, and no misdirection was intended.

I shared a bit more than you asked for, but I thought the information might help. I also typed this quickly, so I apologize if it is a bit sloppy.

tfboyd avatar May 23 '18 20:05 tfboyd

@tfboyd @reedwm Thanks for your explanation and clarification. In order to train the benchmark model to an accuracy similar to the paper's, I have fixed the model, the data preprocessing, and some hyperparameters (e.g. learning rate schedule, weight decay). In your experience, what else should I consider when using the benchmark framework?
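For example, one detail I had to pin down is how weight decay is actually applied. A sketch of the common TF 1.x pattern (the variable-name filter and the stand-in weight below are hypothetical and implementation-specific):

```python
import tensorflow as tf

# Weight decay as an explicit L2 term. Whether biases and batch-norm
# parameters are included differs between implementations and can move
# the final accuracy; the name filter below is just one convention.
w = tf.get_variable('conv1/kernel', shape=[7, 7, 3, 64])  # stand-in weight

weight_decay = 1e-4
l2_loss = weight_decay * tf.add_n([
    tf.nn.l2_loss(v) for v in tf.trainable_variables()
    if 'batch_norm' not in v.name and 'bias' not in v.name])
# total_loss = cross_entropy_loss + l2_loss
```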

SeaOfOcean avatar May 24 '18 11:05 SeaOfOcean

@SeaOfOcean This TensorFlow script can train AlexNet to 58% top-1 validation accuracy.

ppwwyyxx avatar May 31 '18 00:05 ppwwyyxx

@ppwwyyxx We do not tell you enough, but you really are great and appreciated by all of us. Just a quick, random thank you. Side note: if there is more we can do to support you, please let me know, e.g. latest benchmark numbers, current best approaches, or whatever. I cannot promise anything, but if I have info that would help you, I would like to share it.

tfboyd avatar May 31 '18 03:05 tfboyd