
What accuracy should we expect when training Alexnet from scratch on ImageNet?

Open · yoderj opened this issue 2 years ago · 8 comments

📚 Documentation

The README https://github.com/pytorch/examples/blob/main/imagenet/README.md is very helpful when getting started with training AlexNet.

We are able to train AlexNet to approximately 56% top-1 and 79% top-5 accuracy on the validation set, but this is still well below Krizhevsky's published results of roughly 83% to 85% top-5 accuracy on the same dataset.
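For reference, top-1 and top-5 accuracy are just top-k accuracy for k=1 and k=5: a prediction counts as correct if the true label appears among the model's k highest-scoring classes. A minimal pure-Python sketch (assuming `scores` is a list of per-class score lists, one per sample):

```python
def topk_accuracy(scores, targets, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, targets):
        # Indices of the k largest scores in this sample's row.
        topk = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in topk
    return hits / len(scores)
```

This is only to pin down the metric being discussed; in practice the examples repo computes the same quantity with tensor ops.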

We are training with the default recommendations for a single GPU in the README for AlexNet:

python main.py -a alexnet --lr 0.01 --gpu 0 /data/datasets/imagenet/

What out-of-the-box accuracy should we expect when training AlexNet on ImageNet with the default PyTorch implementation?

What sort of hyperparameter changes do you recommend to duplicate Alex Krizhevsky's accuracies?

yoderj avatar Apr 11 '22 20:04 yoderj

Just quoting from this blog article:

> The model uses a stochastic gradient descent optimization function with batch size, momentum, and weight decay set to 128, 0.9, and 0.0005 respectively. All the layers use an equal learning rate of 0.001.

mostafaelhoushi avatar May 09 '22 15:05 mostafaelhoushi

Maybe try those hyperparameters, and if they lead to the expected accuracy, perhaps create a pull request to update the README file accordingly?
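For anyone trying this, a hypothetical invocation of the example's `main.py` with the hyperparameters quoted above might look like the following. The flag names are taken from the imagenet example's argparse options; the learning rate is the README's default of 0.01 (the blog quote suggests 0.001 as an alternative to try):

```shell
# Sketch only: batch size 128, momentum 0.9, weight decay 5e-4
# as quoted from the blog article; dataset path is a placeholder.
python main.py -a alexnet --lr 0.01 --batch-size 128 --momentum 0.9 \
    --weight-decay 0.0005 --gpu 0 /data/datasets/imagenet/
```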

mostafaelhoushi avatar May 09 '22 15:05 mostafaelhoushi

So far our tests aren't in a place where we can guarantee any particular model accuracy. A case could be made that maybe we should, but so far we don't have any such plans.

msaroufim avatar Jul 10 '22 20:07 msaroufim

> So far our tests aren't in a place where we can guarantee any particular model accuracy. A case could be made that maybe we should, but so far we don't have any such plans.

I came across TorchDrift https://torchdrift.org/ (it is listed under the PyTorch ecosystem).

It sounds like a tool that could help ensure our models meet accuracy specs.

mostafaelhoushi avatar Jul 25 '22 12:07 mostafaelhoushi

Hello, not sure if I should open a new issue for this, but are the pretrained models trained with default hyperparameters? And do all the pretrained models match the accuracies from the original papers? It seems unlikely that the default setting can achieve the best result for every model.

wangtiance avatar Jan 11 '23 06:01 wangtiance

> Hello, not sure if I should open a new issue for this, but are the pretrained models trained with default hyperparameters? And do all the pretrained models match the accuracies from the original papers? It seems unlikely that the default setting can achieve the best result for every model.

In the past when I trained the models from scratch, I recall being able to reproduce the accuracy for almost all models.

MobileNet might have its own hyperparameters, but the remaining models should be the same.

mostafaelhoushi avatar Jan 11 '23 16:01 mostafaelhoushi

> Hello, not sure if I should open a new issue for this, but are the pretrained models trained with default hyperparameters? And do all the pretrained models match the accuracies from the original papers? It seems unlikely that the default setting can achieve the best result for every model.
>
> In the past when I trained the models from scratch, I recall being able to reproduce the accuracy for almost all models.
>
> MobileNet might have its own hyperparameters, but the remaining models should be the same.

Thanks for the response! It's a good thing that one setting can work well for different models.

wangtiance avatar Jan 12 '23 02:01 wangtiance

If you check most vision CNN papers, you will find they train with the same hyperparameters: SGD optimizer, 90 epochs, and an initial learning rate of 0.1 that is decreased by a factor of 10 every 30 epochs.
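That step-decay schedule is simple to express. A minimal sketch (assuming this matches the schedule described above; whether it mirrors the repo's exact `adjust_learning_rate` code is an assumption):

```python
def step_decay_lr(initial_lr, epoch, drop=0.1, epochs_per_drop=30):
    """Learning rate for the classic 90-epoch ImageNet schedule:
    multiply by `drop` every `epochs_per_drop` epochs."""
    return initial_lr * (drop ** (epoch // epochs_per_drop))

# With initial_lr=0.1: epochs 0-29 train at 0.1, 30-59 at 0.01, 60-89 at 0.001.
```

The same effect can be had with a built-in scheduler such as `torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)`.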

mostafaelhoushi avatar Jan 12 '23 03:01 mostafaelhoushi