pytorch-cifar
Why don't we have validation data during CIFAR training?
Please correct me if I'm wrong, but shouldn't we have a validation dataset while we train? Looking at the CIFAR examples, I noticed there is no validation dataset, just train and test. Should we skip validation? Why? Can you please explain this to me?
Thanks
One should always use a validation set when tuning hyperparameters, and also to detect overfitting. You can hold out a small portion of the training set, say 10k of the 50k images, and use it as validation. Once you are done tuning, you can retrain on the whole set. To be extra rigorous, cross-validation with the validation split reshuffled for every factor you tune is recommended. Concerns about repeated test-set reuse on CIFAR have been raised here: https://arxiv.org/abs/1806.00451.
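A minimal sketch of the hold-out idea above, written in plain Python so the splitting logic is explicit (function name `split_indices` and the 20% default are my own choices, not from this repo): shuffle the 50k training indices once and carve off a validation chunk. The two index lists can then be passed to `torch.utils.data.SubsetRandomSampler` to build separate train/val `DataLoader`s over the same CIFAR-10 training set.

```python
import random

def split_indices(n_total, val_fraction=0.2, seed=0):
    """Shuffle dataset indices and split them into (train, val) lists.

    Feed each list to torch.utils.data.SubsetRandomSampler to get
    disjoint train/val DataLoaders over one underlying dataset.
    """
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)  # fixed seed -> reproducible split
    n_val = int(n_total * val_fraction)
    return indices[n_val:], indices[:n_val]

# CIFAR-10's training set has 50,000 images; hold out 10k for validation:
train_idx, val_idx = split_indices(50_000, val_fraction=0.2)
print(len(train_idx), len(val_idx))  # 40000 10000
```

After tuning, retrain on all 50k indices, as suggested above.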
Also, for model selection (like picking the best checkpoint out of multiple runs), cross-validation is common, but there are better alternatives, such as Bayesian methods.
I agree with @amrit110 . If you run the training as is, it will inadvertently be tuning itself directly to the test set and achieve overly high accuracies (I was getting over 95% on ResNet 34).
Here is a sample dataloader that includes the validation split: https://github.com/phelps-matthew/FeatherMap/blob/train/feathermap/data_loader.py.
Cross-validation is good here, but can be rather expensive. I am now at least getting consistent accuracies using a validation split of 10%.
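For contrast with the fixed 10% split, here is a sketch of k-fold index generation in plain Python (function name `kfold_indices` is hypothetical, not from any library): each of the k folds serves once as validation, which is why the cost is roughly k full training runs.

```python
import random

def kfold_indices(n_total, k=5, seed=0):
    """Yield (train, val) index lists for k-fold cross-validation."""
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    fold = n_total // k  # size of each validation fold
    for i in range(k):
        val = indices[i * fold:(i + 1) * fold]
        train = indices[:i * fold] + indices[(i + 1) * fold:]
        yield train, val

# 5-fold CV on CIFAR-10's 50k training images: each fold holds out 10k.
folds = list(kfold_indices(50_000, k=5))
print(len(folds), len(folds[0][0]), len(folds[0][1]))  # 5 40000 10000
```

Each (train, val) pair requires its own training run, so with k=5 this costs about five times as much as a single 10% hold-out, which is the expense trade-off mentioned above.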