
Evaluation set is a subset of the training set

Opened by MaximusMutschler • 6 comments

Hello Frank, I just came across the following pattern, which is used in all dataset classes:

def _make_train_eval_dataset(self):
    """Creates the CIFAR-10 train eval dataset.

    Returns:
      A tf.data.Dataset instance with batches of training eval data.
    """
    return self._train_dataset.take(
        self._train_eval_size // self._batch_size)

The problem is that the take method does not remove the taken elements from the Dataset they are taken from. As a result, the evaluation set and the training set are not distinct. This should not be the case, or at least it is not the standard way.

Here is a short dummy example showing that the data is really not removed from the train dataset:

import tensorflow as tf
import numpy as np

x = np.array([1, 2, 3, 4, 5])

dataset1 = tf.data.Dataset.from_tensor_slices(x)
dataset2 = dataset1.take(3)

it1 = dataset1.make_one_shot_iterator()
it2 = dataset2.make_one_shot_iterator()
it1next = it1.get_next()
it2next = it2.get_next()

sess = tf.Session()
# dataset1 still yields all five elements, even though dataset2 took three of them.
for i in range(5):
    print(sess.run([it1next]))
for i in range(3):
    print(sess.run([it2next]))

Result:

[1]
[2]
[3]
[4]
[5]

[1]
[2]
[3]
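
For comparison, a common way to get truly disjoint subsets with tf.data is to pair take with skip on the same source. A minimal sketch in the same TF 1.x style, purely for illustration (this is not how DeepOBS builds its sets):

import tensorflow as tf
import numpy as np

x = np.array([1, 2, 3, 4, 5])
full = tf.data.Dataset.from_tensor_slices(x)

# take(3) and skip(3) view disjoint parts of the same source.
eval_set = full.take(3)
train_set = full.skip(3)

eval_next = eval_set.make_one_shot_iterator().get_next()
train_next = train_set.make_one_shot_iterator().get_next()

sess = tf.Session()
print([sess.run(eval_next) for _ in range(3)])   # [1, 2, 3]
print([sess.run(train_next) for _ in range(2)])  # [4, 5]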


To Do

We will do the following steps for version 1.2.0:

  • [x] Include a validation set for PyTorch (needs to be merged from Aaron's branch)
  • [x] Include a validation set for TensorFlow (almost ready)
  • [x] Add a graphic with the split/setup for all four data sets to the docs.

MaximusMutschler, Aug 30 '19

I believe this is supposed to happen. By the way, you should use Aaron's branch, where he fixed a lot of bugs and did quite a bit of refactoring, for both the PyTorch and TensorFlow versions: https://github.com/abahde/DeepOBS

At least over there, the optimizer is evaluated every epoch on three distinct sets: a) the train_eval set, which is a subset of the training data; b) the test set; c) the validation set, which is used for hyperparameter tuning.
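
To make the relationship between the sets concrete, here is a small self-contained tf.data sketch (toy data and sizes made up for illustration, not the actual DeepOBS code):

import numpy as np
import tensorflow as tf

# Toy stand-in for a real dataset; the sizes are arbitrary.
all_data = np.arange(100)
train_data, valid_data, test_data = all_data[:80], all_data[80:90], all_data[90:]

train = tf.data.Dataset.from_tensor_slices(train_data)
# The train eval set is deliberately a subset of the training data ...
train_eval = train.take(10)
# ... while validation and test come from held-out data.
valid = tf.data.Dataset.from_tensor_slices(valid_data)
test = tf.data.Dataset.from_tensor_slices(test_data)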

ludwigbald, Sep 02 '19

Thanks a lot for the hint about the branch!

I was confused that there is no _make_eval_dataset method. But yes, the way you describe it makes sense. And the developers are already aware that a proper validation set is still missing, see DeepOBS/deepobs/tensorflow/runners/runner.py:

    elif phase == 'VALID':
        # TODO !!!!!! IS CURRENTLY USING TEST_INIT_OP. CHANGE THIS TO VALID_INIT_OP ONCE THEY ARE IMPLEMENTED !!!!!!
        # sess.run(tproblem.valid_init_op)
        sess.run(tproblem.test_init_op)
        msg = "VALID: WARNING: THIS IS CURRENTLY ALSO THE EVALUATION ON THE TEST DATA SET!!: "

MaximusMutschler, Sep 02 '19

Sorry, I didn't see this issue until today, hence my late response.

But Ludwig already gave the right answer. The "train eval" set is by design a subset of the "train" set. It is there to see how the network performs, in inference mode, on the data it was trained on. It is not meant as a validation set but as an evaluation set, i.e. a more accurate measure of the performance on seen data than the mini-batch losses alone.

In the original version of DeepOBS we used the "test" set as a means to do hyperparameter optimization. This means that the final reported test score is not an accurate measure of what would happen on truly unseen data, but we also never claimed that it was.

But we see that this deviates from what most people do, which is why our new version (the one developed by Aaron) will have an additional "validation" set used for hyperparameter tuning. It will take a few more weeks until we can publish this new version, but it will have full support for PyTorch and new baselines.

In the meantime, using Aaron's branch, as Ludwig mentioned, is a good idea. It is the current stable beta and already supports almost all features in PyTorch.

Thanks for your feedback, we hope that the new version is a bit less confusing.

fsschneider, Sep 05 '19

We will do the following steps for version 1.2.0:

  • [x] Include a validation set for PyTorch (needs to be merged from Aaron's branch)
  • [x] Include a validation set for TensorFlow (almost ready)
  • [x] Add a graphic with the split/setup for all four data sets to the docs.

Added to the first post.

fsschneider, Sep 17 '19

Just wanted to give an update:

We now have a validation set for PyTorch and TensorFlow in the development branch. I also included a graphic with the setup of those four data sets in the docs for TensorFlow and PyTorch.

Hopefully, this will make things clearer, thanks for raising this issue.

I marked the issue as done and will close it once we publish version 1.2.0.

fsschneider, Sep 25 '19

Thanks a lot! Looking forward to using your suite!

MaximusMutschler, Sep 30 '19