mne-python icon indicating copy to clipboard operation
mne-python copied to clipboard

API: RandomState deprecation, default_rng, and SPEC7

Open larsoner opened this issue 4 years ago • 9 comments

np.random.RandomState is not a good choice for random number generation going forward:

RandomState is effectively frozen and will only receive updates that are required by changes in the the internals of Numpy. More substantial changes, including algorithmic improvements, are reserved for Generator.

I don't understand all the details, but the TL;DR I've gathered from monitoring some SciPy discussions is that it's not as good as the new default_rng / Generator-based way of making random numbers and it should be replaced in people's code.

In the long term I think we want to switch random_state=0 to use default_rng(random_state) instead of RandomState(random_state) internally. I propose:

  1. Have a mne.set_rng_backend(backend) that can be 'RandomState' or 'Generator', maybe with a use_rng_backend context manager. The context manager is maybe overkill, but I do think it might help people transition code from old to new, and it's only a few lines for us.
  2. If this has not been called by the user yet, if they do random_state=<int> they will see a DeprecationWarning stating that the default is 'RandomState' in 0.23 but will change to 'Generator' in 0.24 (or maybe this is a good case for making it 0.25 to give people more time to change over), and we will call set_rng_backend('RandomState').
  3. In 0.24 (or 0.25) we no longer emit this warning and just use 'Generator'.

I think this is a workable solution for modernizing our random_state params.

larsoner avatar Apr 01 '21 16:04 larsoner

The new Generator class docstring says:

No Compatibility Guarantee

Generator does not provide a version compatibility guarantee. In particular, as better algorithms evolve the bit stream may change.

I think it's worth asking what we use RNGs for.

  • reproducibility of examples / analyses Here, sticking with the frozen RandomState approach is actually better than switching to the new Generator / default_rng approach, right?
  • in our tests This should be unaffected (usually for generating small fake data, the actual data values shouldn't matter)
  • permutations, bootstrapping, random parcellation Here it's not clear to me whether the old/new RNG approaches will make much difference to us, but my instinct is that the results will technically change (same script before/after will have slightly different results) but in practice it shouldn't actually matter.
  • numerics.random_permutation This is a utility designed to emulate MATLAB's randperm for sanity-check comparisions. This will probably break?
  • mxne, ICA Here I don't think I understand well enough how the randomness is used to say what the situation is.

drammock avatar Apr 01 '21 16:04 drammock

I think for us we want everything to always be reproducible for a given int random_state (and a given mne.set_rng_backend). I hadn't looked at Generator / default_rng, it is indeed a problem that it does not guarantee stability.

I wonder if we should just switch EDIT: via deprecation as above to PCG64, which is guaranteed to be stable:

Compatibility Guarantee

PCG64 makes a guarantee that a fixed seed and will always produce the same random integer stream.

But is better than RandomState:

By default, Generator uses bits provided by PCG64 which has better statistical properties than the legacy MT19937 used in RandomState.

larsoner avatar Apr 01 '21 18:04 larsoner

agree on the plan to switch (with deprecation cycle) to explicitly use PCG64 (which happens be be the current default for np.random.default_rng, but that might change).

drammock avatar Apr 01 '21 18:04 drammock

scipy will deprecate RandomState ?

scikit-learn has not switched yet so I don't feel the urgence

agramfort avatar Apr 01 '21 18:04 agramfort

scipy will deprecate RandomState ?

There is a long ongoing discussion in the SciPy PR DOC: remove references to RandomState (#13778). The short version so far to me is that RandomState will continue to work at least for some time but should not (and will not) be the preferred way to do things in examples and parameters.

larsoner avatar Apr 01 '21 20:04 larsoner

Hi everyone! I'm the cause of everyone's problems here. As you work through these issues, I do recommend reading NEP 19 which goes into more detail about why we had the old backwards-compatibility policy and why we moved away from it. The concerns you have brought up here were definitely in our thoughts. I'm happy to answer any questions that the NEP doesn't answer.

I can't speak definitively as to when/how/if scipy really deprecates RandomState per se, but I have my guess as to where we're going on the aforementioned PR. Some time after we drop support for numpy 1.16, I expect that we will transition the code to use Generator instead of RandomState. I expect that we'll still accept RandomState instances, but we will wrap its core uniform PRNG in Generator and call its methods instead. So if you call scipy.stats.norm.rvs(random_state=RandomState(1234)), you will get the results from Generator.standard_normal()'s improved algorithm rather than the older, slower RandomState.standard_normal(). The core Mersenne Twister PRNG underneath the passed RandomState will still be used, however. Again, we haven't made any decisions, yet, but I think that's where we are going.

rkern avatar Apr 01 '21 23:04 rkern

Thanks for the info @rkern . After some reading and discussion I think our plan is to shelve this for now until we bump our minimum numpy requirement to 1.17+ and then switch everything over to using Generator syntax internally (probably still using MT19937 for backward compat), which will likely happen in the next year.

larsoner avatar Apr 02 '21 17:04 larsoner

Sounds like a good plan.

rkern avatar Apr 02 '21 18:04 rkern