What accuracy should we expect when training Alexnet from scratch on ImageNet?
📚 Documentation
The README https://github.com/pytorch/examples/blob/main/imagenet/README.md is very helpful when getting started with training AlexNet.
We are able to train AlexNet to approximately 56% top-1 and 79% top-5 accuracy on the validation set. This is still well below Krizhevsky's published results of roughly 83–85% top-5 accuracy on the same validation set.
We are training with the default recommendations for a single GPU in the README for AlexNet:
python main.py -a alexnet --lr 0.01 --gpu 0 /data/datasets/imagenet/
What out-of-the-box accuracy should we expect when training AlexNet on ImageNet with the default PyTorch implementation?
What sort of hyperparameter changes do you recommend to duplicate Alex Krizhevsky's accuracies?
Just quoting from this blog article:
The model uses a stochastic gradient descent optimization function with batch size, momentum, and weight decay set to 128, 0.9, and 0.0005 respectively. All the layers use an equal learning rate of 0.001.
Maybe try those hyperparameters, and if they lead to the expected accuracy, perhaps create a pull request to update the README file accordingly?
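For reference, here is a minimal pure-Python sketch of the update rule those quoted hyperparameters describe (SGD with momentum, weight decay applied as L2 on the gradient, as PyTorch's SGD does). The hyperparameter values come from the blog quote above and are not verified against the paper; the function name is illustrative, not from the repo.

```python
# Hyperparameters as quoted: lr 0.001, momentum 0.9, weight decay 0.0005.
LR, MOMENTUM, WEIGHT_DECAY = 0.001, 0.9, 0.0005

def sgd_step(weights, grads, velocity):
    """One SGD-with-momentum step over a flat list of scalar parameters."""
    for i, (w, g) in enumerate(zip(weights, grads)):
        g = g + WEIGHT_DECAY * w               # weight decay folded into the gradient
        velocity[i] = MOMENTUM * velocity[i] + g
        weights[i] = w - LR * velocity[i]
    return weights, velocity

# Toy usage: a single scalar "parameter" with gradient 0.5.
w, v = [1.0], [0.0]
w, v = sgd_step(w, [0.5], v)
```

In practice you would just pass these values to `torch.optim.SGD` (via `--lr`, `--momentum`, `--weight-decay`, and `-b`/`--batch-size` in the example's `main.py`); the sketch only makes the arithmetic explicit.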
So far our tests aren't in a place where we can guarantee a given level of model performance. A case could be made that maybe we should, but we don't currently have any such plans.
I came across TorchDrift https://torchdrift.org/
(It is found under PyTorch ecosystem)
It sounds like a tool that could help us enforce accuracy specifications for our models.
Hello, not sure if I should open a new issue for this, but are the pretrained models trained with default hyperparameters? And do all the pretrained models match the accuracies from the original papers? It seems unlikely that the default settings can achieve the best result for every model.
In the past when I trained the models from scratch, I recall being able to reproduce the accuracy for almost all models.
MobileNet might have its own hyperparameters, but the remaining models should be the same.
Thanks for the response! It's a good thing that one setting can work well for different models.
If you check most vision CNN papers you will find they train with the same hyperparameters: SGD optimizer, 90 epochs, initial learning rate 0.1 that is divided by 10 every 30 epochs.