pytorch-cifar
The test set is being used as validation set
The network is evaluated on the test set at every epoch, and whenever the accuracy improves, the network is saved (a form of early stopping). This is what a validation set should be used for; since CIFAR-10 does not ship with a validation split, a subset of the training data can be held out for this purpose. The goal of the test set is to measure how well a network performs on unseen data; here, however, the test set is used to optimize the network.
The test set must be used only once, at the end of training. This training procedure is erroneous, and the reported results are therefore unfortunately all invalid.
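A minimal sketch of the suggested fix: hold out part of the CIFAR-10 training set for validation, select the best checkpoint on that split, and touch the test set exactly once at the end. The 45k/5k split, the torchvision ResNet-18 stand-in model, and the hyperparameters are illustrative assumptions, not the repo's actual code:

```python
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader, random_split

device = 'cuda' if torch.cuda.is_available() else 'cpu'
transform = transforms.ToTensor()

full_train = torchvision.datasets.CIFAR10('./data', train=True, download=True, transform=transform)
test_set = torchvision.datasets.CIFAR10('./data', train=False, download=True, transform=transform)

# Hold out 5,000 of the 50,000 training images as a validation set.
train_set, val_set = random_split(full_train, [45000, 5000],
                                  generator=torch.Generator().manual_seed(42))

train_loader = DataLoader(train_set, batch_size=128, shuffle=True)
val_loader = DataLoader(val_set, batch_size=100, shuffle=False)
test_loader = DataLoader(test_set, batch_size=100, shuffle=False)

net = torchvision.models.resnet18(num_classes=10).to(device)  # stand-in for the repo's model
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)

def evaluate(loader):
    net.eval()  # batchnorm uses running stats, dropout disabled
    correct = total = 0
    with torch.no_grad():
        for inputs, targets in loader:
            inputs, targets = inputs.to(device), targets.to(device)
            correct += net(inputs).argmax(1).eq(targets).sum().item()
            total += targets.size(0)
    return correct / total

best_val_acc = 0.0
for epoch in range(200):
    net.train()
    for inputs, targets in train_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        criterion(net(inputs), targets).backward()
        optimizer.step()
    # Checkpoint selection is driven by validation accuracy only.
    val_acc = evaluate(val_loader)
    if val_acc > best_val_acc:
        best_val_acc = val_acc
        torch.save(net.state_dict(), 'best_ckpt.pth')

# The test set is used exactly once, after all training and model selection.
net.load_state_dict(torch.load('best_ckpt.pth'))
print('final test accuracy:', evaluate(test_loader))
```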
Agreed. Even more concerning, many papers now report their performance as the best result observed on the test set.
But the net is set to eval mode before being tested and to train mode before training, using net.train() and net.eval().
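For reference, a minimal sketch of the mode switching this comment refers to. It controls layer behavior (batchnorm statistics, dropout), not which dataset is used for model selection:

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(10, 10), nn.BatchNorm1d(10), nn.Dropout(0.5))

net.train()  # dropout active; batchnorm updates its running statistics
out = net(torch.randn(4, 10))

net.eval()  # dropout off; batchnorm uses its stored running statistics
with torch.no_grad():  # also skip autograd bookkeeping during evaluation
    out = net(torch.randn(4, 10))
```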
It is not about batchnorm statistics... It is just that evaluating on the test set to select the best model (e.g., best checkpoint and hyperparameters) goes against the basic practice/assumption of machine learning and is not realistic. In the real world, there is no way to obtain the expected test samples before the model is deployed.
🤔 Right. ✌️
Yes, it is a big issue. The test set is being used as the validation set, which means models trained in this framework are indirectly tuned to patterns in the test set as well as the training set. Overall, it causes overfitting to the test set.