pytorch-cifar
Why don't we have validation data during CIFAR training?
Please correct me if I'm wrong, but shouldn't we have a validation dataset while we train? Looking at the CIFAR examples, I noticed there is no validation dataset, just train and test. Should we skip validation? Why? Can you please explain this to me?
Thanks
One should always use a validation set when tuning hyperparameters, and also to detect overfitting. You can hold out a small portion of the training set, say 10k of the 50k images, and use it as validation. Once you are done tuning, you can retrain on the whole set. To be extra rigorous, cross-validation with the validation split reshuffled for every factor you tune is recommended. Concerns about repeated test-set reuse on CIFAR have been raised here: https://arxiv.org/abs/1806.00451.
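A minimal sketch of the hold-out idea above, written in plain Python so the splitting logic is explicit (function name `split_indices` and the 20% default are my own choices, not from this repo): shuffle the 50k training indices once and carve off a validation chunk. The two index lists can then be passed to `torch.utils.data.SubsetRandomSampler` to build separate train/val `DataLoader`s over the same CIFAR-10 training set.

```python
import random

def split_indices(n_total, val_fraction=0.2, seed=0):
    """Shuffle dataset indices and split them into (train, val) lists.

    Feed each list to torch.utils.data.SubsetRandomSampler to get
    disjoint train/val DataLoaders over one underlying dataset.
    """
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)  # fixed seed -> reproducible split
    n_val = int(n_total * val_fraction)
    return indices[n_val:], indices[:n_val]

# CIFAR-10's training set has 50,000 images; hold out 10k for validation:
train_idx, val_idx = split_indices(50_000, val_fraction=0.2)
print(len(train_idx), len(val_idx))  # 40000 10000
```

After tuning, retrain on all 50k indices, as suggested above.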
Also, for model selection (like picking the best checkpoint out of multiple runs), cross-validation is common, but there are better alternatives, such as Bayesian methods.
I agree with @amrit110 . If you run the training as is, it will inadvertently be tuning itself directly to the test set and achieve overly high accuracies (I was getting over 95% on ResNet 34).
Here is a sample dataloader that includes the validation split: https://github.com/phelps-matthew/FeatherMap/blob/train/feathermap/data_loader.py.
Cross-validation is good here, but can be rather expensive. I am now at least getting consistent accuracies using a validation split of 10%.
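For contrast with the fixed 10% split, here is a sketch of k-fold index generation in plain Python (function name `kfold_indices` is hypothetical, not from any library): each of the k folds serves once as validation, which is why the cost is roughly k full training runs.

```python
import random

def kfold_indices(n_total, k=5, seed=0):
    """Yield (train, val) index lists for k-fold cross-validation."""
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    fold = n_total // k  # size of each validation fold
    for i in range(k):
        val = indices[i * fold:(i + 1) * fold]
        train = indices[:i * fold] + indices[(i + 1) * fold:]
        yield train, val

# 5-fold CV on CIFAR-10's 50k training images: each fold holds out 10k.
folds = list(kfold_indices(50_000, k=5))
print(len(folds), len(folds[0][0]), len(folds[0][1]))  # 5 40000 10000
```

Each (train, val) pair requires its own training run, so with k=5 this costs about five times as much as a single 10% hold-out, which is the expense trade-off mentioned above.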