API: RandomState deprecation, default_rng, and SPEC7
np.random.RandomState is not a good choice for random number generation going forward:
RandomState is effectively frozen and will only receive updates that are required by changes in the internals of NumPy. More substantial changes, including algorithmic improvements, are reserved for Generator.
I don't understand all the details, but the TL;DR I've gathered from monitoring some SciPy discussions is that it's not as good as the new default_rng / Generator-based way of making random numbers and it should be replaced in people's code.
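For concreteness, a minimal comparison of the two construction styles (plain NumPy >= 1.17, nothing MNE-specific):

```python
import numpy as np

rng_old = np.random.RandomState(0)    # legacy, frozen interface
rng_new = np.random.default_rng(0)    # returns a Generator (the recommended path going forward)

rng_old.randn(3)              # legacy method names
rng_new.standard_normal(3)    # Generator uses different method names and improved algorithms
```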
In the long term I think we want to switch `random_state=0` to use `default_rng(random_state)` instead of `RandomState(random_state)` internally. I propose:
- Have a `mne.set_rng_backend(backend)` that can be `'RandomState'` or `'Generator'`, maybe with a `use_rng_backend` context manager. The context manager is maybe overkill, but I do think it might help people transition code from old to new, and it's only a few lines for us.
- If this has not been called by the user yet and they do `random_state=<int>`, they will see a DeprecationWarning stating that the default is `'RandomState'` in 0.23 but will change to `'Generator'` in 0.24 (or maybe this is a good case for making it 0.25 to give people more time to change over), and we will call `set_rng_backend('RandomState')`.
- In 0.24 (or 0.25) we no longer emit this warning and just use `'Generator'`.
I think this is a workable solution for modernizing our random_state params.
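A minimal sketch of what that switch could look like; the module-level `_rng_backend` variable and the `_check_random_state` helper name are illustrative assumptions, not existing MNE-Python code:

```python
from contextlib import contextmanager
import warnings

import numpy as np

_rng_backend = None  # unset until the user (or the deprecation path) chooses a backend


def set_rng_backend(backend):
    """Choose how integer random_state seeds are turned into RNGs."""
    global _rng_backend
    if backend not in ('RandomState', 'Generator'):
        raise ValueError(f"backend must be 'RandomState' or 'Generator', got {backend!r}")
    _rng_backend = backend


@contextmanager
def use_rng_backend(backend):
    """Temporarily switch backends, e.g. while porting old scripts."""
    global _rng_backend
    old = _rng_backend
    set_rng_backend(backend)
    try:
        yield
    finally:
        _rng_backend = old


def _check_random_state(random_state):
    """Turn an int seed into an RNG according to the selected backend."""
    global _rng_backend
    if _rng_backend is None:
        warnings.warn("The default RNG backend is 'RandomState' in 0.23 but will "
                      "change to 'Generator' in 0.24; call mne.set_rng_backend() "
                      "to silence this warning.", DeprecationWarning)
        _rng_backend = 'RandomState'
    if _rng_backend == 'Generator':
        return np.random.default_rng(random_state)
    return np.random.RandomState(random_state)
```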
The new Generator class docstring says:
> No Compatibility Guarantee
>
> Generator does not provide a version compatibility guarantee. In particular, as better algorithms evolve the bit stream may change.
I think it's worth asking what we use RNGs for.
- **reproducibility of examples / analyses** Here, sticking with the frozen `RandomState` approach is actually better than switching to the new `Generator`/`default_rng` approach, right?
- **in our tests** This should be unaffected (usually for generating small fake data, the actual data values shouldn't matter).
- **permutations, bootstrapping, random parcellation** Here it's not clear to me whether the old/new RNG approaches will make much difference to us, but my instinct is that the results will technically change (the same script before/after will have slightly different results) but in practice it shouldn't actually matter.
- **`numerics.random_permutation`** This is a utility designed to emulate MATLAB's `randperm` for sanity-check comparisons. This will probably break? (See the quick illustration after this list.)
- **mxne, ICA** Here I don't think I understand well enough how the randomness is used to say what the situation is.
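To make the reproducibility point concrete: the same integer seed gives different results under the two APIs, so anything that must match previously published numbers would change if it switched backends (a quick illustration with plain NumPy, not MNE code):

```python
import numpy as np

seed = 42
print(np.random.RandomState(seed).permutation(5))   # frozen: identical across NumPy versions
print(np.random.default_rng(seed).permutation(5))   # different values; stream may change over time
```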
I think for us we want everything to always be reproducible for a given int `random_state` (and a given `mne.set_rng_backend`). I hadn't looked at `Generator` / `default_rng`; it is indeed a problem that it does not guarantee stability.
I wonder if we should just switch (EDIT: via deprecation as above) to PCG64, which is guaranteed to be stable:
> Compatibility Guarantee
>
> PCG64 makes a guarantee that a fixed seed will always produce the same random integer stream.

But it is better than RandomState:

> By default, Generator uses bits provided by PCG64 which has better statistical properties than the legacy MT19937 used in RandomState.
Agree on the plan to switch (with deprecation cycle) to explicitly use PCG64 (which happens to be the current default for np.random.default_rng, but that might change).
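Spelling that out: pinning the bit generator is ordinary NumPy; the part that might change is only whether `default_rng` keeps PCG64 as its default.

```python
import numpy as np
from numpy.random import Generator, PCG64

seed = 0
rng_pinned = Generator(PCG64(seed))        # bit generator chosen explicitly
rng_default = np.random.default_rng(seed)  # currently also PCG64, but that default is not guaranteed

rng_pinned.standard_normal(3)
rng_default.standard_normal(3)
```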
scipy will deprecate RandomState?
scikit-learn has not switched yet, so I don't feel the urgency.
> scipy will deprecate RandomState?
There is a long ongoing discussion in the SciPy PR DOC: remove references to RandomState (#13778). The short version so far to me is that RandomState will continue to work at least for some time but should not (and will not) be the preferred way to do things in examples and parameters.
Hi everyone! I'm the cause of everyone's problems here. As you work through these issues, I do recommend reading NEP 19 which goes into more detail about why we had the old backwards-compatibility policy and why we moved away from it. The concerns you have brought up here were definitely in our thoughts. I'm happy to answer any questions that the NEP doesn't answer.
I can't speak definitively as to when/how/if scipy really deprecates RandomState per se, but I have my guess as to where we're going on the aforementioned PR. Some time after we drop support for numpy 1.16, I expect that we will transition the code to use Generator instead of RandomState. I expect that we'll still accept RandomState instances, but we will wrap its core uniform PRNG in Generator and call its methods instead. So if you call scipy.stats.norm.rvs(random_state=RandomState(1234)), you will get the results from Generator.standard_normal()'s improved algorithm rather than the older, slower RandomState.standard_normal(). The core Mersenne Twister PRNG underneath the passed RandomState will still be used, however. Again, we haven't made any decisions, yet, but I think that's where we are going.
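A rough illustration of that idea; note that `_bit_generator` is a private RandomState attribute, so this only shows the mechanics, not how SciPy will necessarily implement it:

```python
from numpy.random import Generator, RandomState

rs = RandomState(1234)                # legacy interface on a Mersenne Twister core
gen = Generator(rs._bit_generator)    # same MT19937 core, but Generator's method implementations

print(rs.standard_normal(3))          # legacy standard-normal algorithm
print(gen.standard_normal(3))         # improved (ziggurat) algorithm; both objects share one stream
```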
Thanks for the info @rkern. After some reading and discussion I think our plan is to shelve this for now until we bump our minimum numpy requirement to 1.17+ and then switch everything over to using Generator syntax internally (probably still using MT19937 for backward compat), which will likely happen in the next year.
Sounds like a good plan.