pytorch-cifar
The test set is being used as validation set
The network is evaluated on the test set at every epoch, and whenever the accuracy improves, the network is saved (a form of early stopping). This is what a validation set should be used for; since CIFAR-10 does not ship with a validation split, a subset of the training data can be held out for this purpose. The goal of the test set is to measure how well a network performs on unseen data; here, however, the test set is used to optimize the network.
The test set must be used only once, at the end of training. This training procedure is erroneous, and the reported results are therefore unfortunately all invalid.
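A minimal sketch of the suggested fix: hold out part of the CIFAR-10 training set for validation, select the best checkpoint on that split, and touch the test set exactly once at the end. The 45k/5k split, the torchvision ResNet-18 stand-in model, and the hyperparameters are illustrative assumptions, not the repo's actual code:

```python
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader, random_split

device = 'cuda' if torch.cuda.is_available() else 'cpu'
transform = transforms.ToTensor()

full_train = torchvision.datasets.CIFAR10('./data', train=True, download=True, transform=transform)
test_set = torchvision.datasets.CIFAR10('./data', train=False, download=True, transform=transform)

# Hold out 5,000 of the 50,000 training images as a validation set.
train_set, val_set = random_split(full_train, [45000, 5000],
                                  generator=torch.Generator().manual_seed(42))

train_loader = DataLoader(train_set, batch_size=128, shuffle=True)
val_loader = DataLoader(val_set, batch_size=100, shuffle=False)
test_loader = DataLoader(test_set, batch_size=100, shuffle=False)

net = torchvision.models.resnet18(num_classes=10).to(device)  # stand-in for the repo's model
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)

def evaluate(loader):
    net.eval()  # batchnorm uses running stats, dropout disabled
    correct = total = 0
    with torch.no_grad():
        for inputs, targets in loader:
            inputs, targets = inputs.to(device), targets.to(device)
            correct += net(inputs).argmax(1).eq(targets).sum().item()
            total += targets.size(0)
    return correct / total

best_val_acc = 0.0
for epoch in range(200):
    net.train()
    for inputs, targets in train_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        criterion(net(inputs), targets).backward()
        optimizer.step()
    # Checkpoint selection is driven by validation accuracy only.
    val_acc = evaluate(val_loader)
    if val_acc > best_val_acc:
        best_val_acc = val_acc
        torch.save(net.state_dict(), 'best_ckpt.pth')

# The test set is used exactly once, after all training and model selection.
net.load_state_dict(torch.load('best_ckpt.pth'))
print('final test accuracy:', evaluate(test_loader))
```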
Agreed. Even more concerning, many papers now report their performance as the best result observed on the test set.
But the net is set to eval mode before being tested and to train mode before training, using net.train() and net.eval().
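For reference, a minimal sketch of the mode switching this comment refers to. It controls layer behavior (batchnorm statistics, dropout), not which dataset is used for model selection:

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(10, 10), nn.BatchNorm1d(10), nn.Dropout(0.5))

net.train()  # dropout active; batchnorm updates its running statistics
out = net(torch.randn(4, 10))

net.eval()  # dropout off; batchnorm uses its stored running statistics
with torch.no_grad():  # also skip autograd bookkeeping during evaluation
    out = net(torch.randn(4, 10))
```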
It is not about batchnorm statistics... It is just that evaluating on the test set to select the best model (e.g., best checkpoint and hyperparameters) goes against the basic practice/assumption of machine learning and is not realistic. In the real world, there is no way to obtain the expected test samples before the model is deployed.
🤔 Right. ✌️
Yes, it is a big issue. The test set is being used as the validation set, which means models trained in this framework are indirectly tuned to patterns in the test set as well as the training set. Overall, it causes overfitting to the test set.