
Method to set random state for all components

Open · ottonemo opened this issue 7 years ago • 11 comments

We need a method (possibly on the wrapper class) to initialize the random state for all components that are concerned with sampling. These include the following (a sketch of how each is currently seeded follows the list):

  • the model (e.g. weight init, dropout)
  • DataLoader (batch shuffling)
  • GridSearchCV split
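
For illustration, a minimal sketch of how each of these is seeded individually today; LogisticRegression only stands in for a skorch net to keep the snippet self-contained:

```python
import torch
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, KFold

# The model (weight init, dropout) and the DataLoader's batch shuffling
# both draw from torch's global RNG, so seeding it covers those two items.
torch.manual_seed(0)

# The GridSearchCV split is pinned by handing it a splitter with a fixed
# random_state (LogisticRegression is only a placeholder estimator here).
cv = KFold(n_splits=5, shuffle=True, random_state=0)
gs = GridSearchCV(LogisticRegression(), {"C": [0.1, 1.0]}, cv=cv)
```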

ottonemo avatar Aug 07 '17 09:08 ottonemo

We don't want a single method that sets the random state for everything; rather, we want to make it possible to set the random state everywhere it is needed.

ottonemo avatar Oct 02 '17 11:10 ottonemo

For all PyTorch-related things, setting the seed via torch.manual_seed suffices; I think we are settled there. Are all sklearn random states exposed to the outside?

ottonemo avatar Dec 08 '17 10:12 ottonemo

How can I set the seed for skorch deterministically? With the same neural network model I obtain two different results on two different computers, even though I handled all the random seeds. So my conclusion is that maybe the problem lies in skorch.

amefabris avatar May 02 '19 15:05 amefabris

In the abstract, it's quite hard to say what the reason could be, so if you have a minimal code example that reproduces the behavior, that would be great. In general, you need to think of the following sources of randomness:

  • torch.manual_seed
  • torch.cuda.manual_seed
  • numpy.random.seed

If you use anything from sklearn or pandas, don't forget to fix their random_state. If you use the internal CV split, try without it (by passing train_split=None). A minimal sketch covering these sources is shown below.
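
A minimal sketch of pinning these down by hand before constructing and fitting the net (plain library calls, nothing skorch-specific):

```python
import random

import numpy as np
import torch

SEED = 0
random.seed(SEED)        # Python's built-in RNG
np.random.seed(SEED)     # numpy's global RNG (used by sklearn/pandas when no random_state is given)
torch.manual_seed(SEED)  # torch CPU RNG: weight init, dropout, batch shuffling
if torch.cuda.is_available():
    torch.cuda.manual_seed(SEED)  # torch CUDA RNG for the current device
# ...then build and fit the NeuralNet as usual, and pass random_state=...
# explicitly to any sklearn objects such as GridSearchCV splitters.
```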

BenjaminBossan avatar May 02 '19 17:05 BenjaminBossan

It may be time to add a random_state keyword to NeuralNet

thomasjpfan avatar May 03 '19 21:05 thomasjpfan

What would you use it for? I only see CVSplit at the moment.

BenjaminBossan avatar May 04 '19 09:05 BenjaminBossan

To set the random seed for torch.manual_seed, torch.cuda.manual_seed, numpy.random.seed, and CVSplit, all at once?

thomasjpfan avatar May 04 '19 14:05 thomasjpfan

No, I don't think that it should be skorch's job to set seeds for torch and numpy. I could see a helper function that does it, but otherwise I would leave that to the user. Also, what would skorch do if a numpy.random.RandomState is passed?

What needs some rework is the fact that the random_state cannot be easily passed to CVSplit. Maybe we could change CVSplit to be uninitialized, so that users can have NeuralNet(..., train_split__cv=7, train_split__stratified=False, train_split__random_state=42).
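
For contrast, a sketch of how a fixed split seed has to be wired up today, with CVSplit constructed up front; MyModule is just a placeholder, and a fractional cv is used so that the random_state actually feeds a shuffled splitter:

```python
import torch.nn as nn
from skorch import NeuralNetClassifier
from skorch.dataset import CVSplit

class MyModule(nn.Module):
    """Placeholder module to keep the example self-contained."""
    def __init__(self):
        super().__init__()
        self.dense = nn.Linear(20, 2)

    def forward(self, X):
        return nn.functional.softmax(self.dense(X), dim=-1)

# Today: the random_state has to go through an already-initialized CVSplit.
net = NeuralNetClassifier(
    MyModule,
    train_split=CVSplit(cv=0.2, stratified=False, random_state=42),
)

# The proposal above would instead allow routing it via set_params syntax:
# net = NeuralNetClassifier(
#     MyModule,
#     train_split__cv=0.2,
#     train_split__stratified=False,
#     train_split__random_state=42,
# )
```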

BenjaminBossan avatar May 04 '19 17:05 BenjaminBossan

No, I don't think that it should be skorch's job to set seeds for torch and numpy. I could see a helper function that does it, but otherwise I would leave that to the user.

Scikit-learn classifiers that have a random state allow a random_state keyword in __init__. For example, MLPClassifier has a random_state, which is used to initialize the hidden layers.
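
For instance, with sklearn's actual API (the parameter values are arbitrary):

```python
from sklearn.neural_network import MLPClassifier

# sklearn convention: random_state is a plain __init__ parameter; here it
# controls the initial weights and, for the stochastic solvers, batch shuffling.
clf = MLPClassifier(hidden_layer_sizes=(10,), random_state=0)
```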

Also, what would skorch do if a numpy.random.RandomState is passed?

This is the blocker.

Maybe we could change CVSplit to be uninitialized

This would fix the issue for CVSplit.

For reference, there was a previous discussion about this topic here: https://github.com/skorch-dev/skorch/issues/280

thomasjpfan avatar May 05 '19 23:05 thomasjpfan

To resolve this issue, is the goal to have a function like skorch.utils.set_random_state(pytorch_random_seed, numpy_random_state, ...) and call this before doing anything?

thomasjpfan avatar May 06 '19 14:05 thomasjpfan

Scikit-learn classifiers that have a random state allow a random_state keyword in __init__

We could do this. At the moment, I only see CVSplit as a potential target for this, though, and as long as CVSplit is passed in initialized form, having the random_state init parameter would not help. On the other hand, if CVSplit is uninitialized and it's possible to pass train_split__random_state, having random_state on NeuralNet would be useless.

To resolve this issue, is the goal to have a function like skorch.utils.set_random_state(pytorch_random_seed, numpy_random_state, ...) and call this before doing anything?

I guess it wouldn't hurt to have such a function. Ideally, for me at least, I would like to call it like this: set_random_seed(0), instead of set_random_seed(0, 0, 0).
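
A sketch of what such a helper might look like if it defaulted everything to a single seed while still allowing per-library overrides; set_random_seed here is hypothetical, not an existing skorch.utils function:

```python
import numpy as np
import torch

def set_random_seed(seed, pytorch_seed=None, numpy_seed=None):
    """Hypothetical helper: one seed for everything by default,
    with optional per-library overrides."""
    torch_seed = seed if pytorch_seed is None else pytorch_seed
    torch.manual_seed(torch_seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed(torch_seed)
    np.random.seed(seed if numpy_seed is None else numpy_seed)

set_random_seed(0)  # the single-argument call style preferred above
```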

BenjaminBossan avatar May 06 '19 21:05 BenjaminBossan