imagenet-multiGPU.torch

low accuracy on alexnetowtbn

Open mrastegari opened this issue 9 years ago • 12 comments

I trained the AlexNet model with batch normalization (alexnetowtbn) on 4 GPUs with batchSize 256. After 50 epochs my top-1 accuracy is 45%. I couldn't find any published results for AlexNet trained with batch normalization. Is this number OK? It seems much lower than the 57% reported for Caffe.

mrastegari, Feb 22 '16 01:02
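For readers comparing the numbers in this thread: top-1 accuracy is the percentage of validation images whose highest-scoring class matches the true label. The following is a minimal Python sketch of that metric, not the repository's Lua code; the function name `top_k_accuracy` and the toy scores and labels are hypothetical.

```python
def top_k_accuracy(scores, labels, k=1):
    """Return the percentage of samples whose true label is among the k highest scores."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("need at least one sample")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score, truncated to the best k.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return 100.0 * hits / len(labels)


# Hypothetical toy data: 4 samples, 3 classes (not real ImageNet outputs).
scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
    [0.6, 0.3, 0.1],
]
labels = [1, 0, 1, 2]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
print(top1, top2)
```

With `k=1` only the single best guess counts, which is why top-1 figures such as 45% or 57% are always lower than the corresponding top-5 figures.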

I trained one some time ago; the model is here: https://gist.github.com/szagoruyko/dd032c529048492630fc. It achieves 56.7% top-1.

szagoruyko, Feb 22 '16 01:02

This model is different from the one in this repository (alexnetowtbn does not have nn.Concat, and the number of filters in the convolutional layers differs from your model). Do you think we should expect this gap?

mrastegari, Feb 22 '16 01:02

@mrastegari no, that shouldn't be the issue. My bet would be the recent bugs in DPT (DataParallelTable); you might want to update everything and try again. BTW, you can increase the learning rate and halve the number of epochs.

szagoruyko, Feb 22 '16 16:02
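The suggestion to raise the learning rate and halve the number of epochs can be sketched as a transformation of an epoch-based schedule. This is only an illustration in Python, not the repository's Lua training code. The regime values, the helper names `params_for_epoch` and `compress_regime`, and the 2x learning-rate factor are all assumptions; the comment above does not say by how much to raise the rate.

```python
# Hypothetical regime: (first_epoch, last_epoch, learning_rate, weight_decay).
# These numbers are placeholders for illustration, not confirmed repository settings.
BASE_REGIME = [
    (1, 18, 1e-2, 5e-4),
    (19, 29, 5e-3, 5e-4),
    (30, 43, 1e-3, 0.0),
    (44, 52, 5e-4, 0.0),
]


def params_for_epoch(regime, epoch):
    """Look up (learning_rate, weight_decay) for a 1-based epoch number."""
    for first, last, lr, wd in regime:
        if first <= epoch <= last:
            return lr, wd
    raise ValueError("epoch %d is outside the regime" % epoch)


def compress_regime(regime, lr_scale=2.0):
    """Halve the length of every stage (rounding up) and scale its learning rate.

    lr_scale=2.0 is an assumed factor chosen for this sketch.
    """
    compressed = []
    next_first = 1
    for first, last, lr, wd in regime:
        length = last - first + 1
        new_length = max(1, (length + 1) // 2)
        compressed.append((next_first, next_first + new_length - 1, lr * lr_scale, wd))
        next_first += new_length
    return compressed


fast_regime = compress_regime(BASE_REGIME)
for stage in fast_regime:
    print(stage)
```

Under these assumptions the 52-epoch placeholder schedule shrinks to 27 epochs, and each stage keeps its weight decay while running at twice the learning rate. Whether a given factor trains stably would have to be checked empirically.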

I updated all the libraries (cunn, nn, cudnn, cutorch), but I still cannot get the top-1 accuracy above 45%.

mrastegari, Feb 22 '16 20:02

Yes, I'm also seeing a similar issue: alexnetowtbn is giving me low accuracy. I'm now training with -netType alexnet to see whether at least alexnet gives good performance...

cxy7452, Feb 23 '16 00:02

I remember that around two months ago I could get a top-1 (val) accuracy of around 52%. So maybe something changed in one of the library updates.

mrastegari, Feb 23 '16 01:02

Hmm, I tried it again and now alexnetowtbn converges fine: it got to the 38th epoch and the top-1 validation accuracy is at 53.93%.

cxy7452, Feb 24 '16 18:02

Have you followed the learning rate regime exactly as it is in the code? I noticed some instability in training. For example, if I stop after one epoch and then use the retrain option, it gives better accuracy than just letting the code continue to the next epoch. Have you reinstalled any of the libraries?

mrastegari, Feb 24 '16 19:02

Hmm, I've updated torch, nn, cutorch, and cudnn. But my version of imagenet-multiGPU was from a few months ago. I've just cloned the new version and started training alexnetowtbn to see if I can reproduce the results.

cxy7452, Feb 24 '16 19:02

alexnetowtbn trained and converged fine, btw.

cxy7452, Mar 12 '16 15:03

Thanks for the effort!

mrastegari, Mar 12 '16 19:03

Hey guys, do you have any updated results? I trained AlexNet (without batch normalization), and I get a top-1 accuracy of 54.93% on the val set.

Viresh-R, Jul 17 '16 19:07