Use Generators with a global prng within pymoo.
Addresses #469
This is a minimal attempt to fix the issue of pymoo changing the global seed.
- [ ] ensure all tests pass
@jacktang I made a first pass at this by creating a replacement global PRNG in the package ~but haven't been able to understand why the same seed tests fail. Perhaps still some randomness not taking the PRNG or I am overlooking something wrt managing global variables.~
Test failed due to implications of from x import y behavior that I have now changed. Nonetheless solution here doesn't seem fully robust and some tests still fail I think primarily due to hash checking tests?
@CompRhys I suggest to design pymoo own RandGenerator class and keep the state rather than using variable. The simplest implementation may looks like:
import numpy as np
class RandGenerator:
def __init__(self, seed=None):
self.rng = np.random.default_rng(seed)
def ints(self, low, high):
return self.rng.integers(low=low, high=high)
I am still looking into the random in multi-threading part.
Seems like a much bigger refactor to be instantiating new PRNGs everywhere and then potentially runs the risk of users using the same seed in everything and ending up be correlated PRNGs?
If it would just be to create a Generator class to wrap the numpy one that also seems non-ideal particularly given the example here where you've changed the API and then it would shift the onus of documentation onto pymoo?
Seems like a much bigger refactor to be instantiating new PRNGs everywhere and then potentially runs the risk of users using the same seed in everything and ending up be correlated PRNGs?
If it would just be to create a Generator class to wrap the numpy one that also seems non-ideal particularly given the example here where you've changed the API and then it would shift the onus of documentation onto pymoo?
Nice catch! @CompRhys. We can add seed function to the RandGenerator to set the global seed, the ugly implementation like
import numpy as np
_PYMOO_SEED = None
class RandGenerator:
def __init__(self, seed=None):
seed = seed or _PYMOO_SEED
self.rng = np.random.default_rng(seed)
def integers(self, low, high):
return self.rng.integers(low=low, high=high)
def normal(self, mu, sigma):
return self.rng.normal(loc=mu, scale=sigma)
@classmethod
def seed(cls, seed):
global _PYMOO_SEED
_PYMOO_SEED = seed
np.random.seed(100)
RandGenerator.seed(42)
rng = RandGenerator()
print(rng. integers(1, 10))
print('np rand=', np.random.randint(10))
print(rng.normal(0, 1))
And if you prefer to numpy style api, you can instance the local rand generator when it is imported.
I think this also suffers from correlated PRNGs as you're global seed is fixed meaning that every PRNG you spawn will start at the same point in the sequence. You would need to use the spawn of the Bit_Generator to ensure randomness which necessitates having some global PRNG.
With multi-threading we will definately need to work out a spawn strategy. I honestly have no idea what would happen with this PRs implementation currently I assume all the processes would copy the global PRNG and use correlated sequences but threads might be okay? I am not sure if there is any multi-processing state management currently as this PR is as close as reasonable to just using the np global state without doing so.
Hello @CompRhys , for multi-thread local random generator in pymoo, I didn't spend enough time on looking deep into it this week. And here is the document of multi threading numpy random generator : https://numpy.org/doc/stable/reference/random/multithreading.html, hope it helps. Happy hacking :)
@blankjul Any thoughts on this PR as stands? The only tests that currently fail are the hash assert tests that check there haven't been any changes in output. Due to the randomness of the GA and the changes to how the state is handled these hashes will need to be updated before merging. I will update those tests if you're happy with the rest of the PR.
Thanks for putting so much effort into this! And sorry for the delay but I was out of the office during the summer.
In the first release of pymoo, I implemented my own pseudo-random generator in fact and also had the option to change them (e.g. use the one from numpy instead). After another issue a few months ago, I realized that using globally the numpy random method is not the best design, however, at this point used throughout the whole framework.
I had a look at the comments through this PR, and agree if we reengineer this then we might want to think about multi-threading as well. In my opinion, the options are:
- Global Instance (like in this PR currently). I don't think this is thread-safe.
- Singleton Pattern (very similar but more flexible e.g. return a PRG for different thread ids)
- Binding a PRG to an algorithm (one downside is then it needs to be passed to all operators and other methods). But also multiple algorithms can run with different PRGs.
- Creating a
ContextorEnvironmentobject to generalize run-specific variables (I have seen this in other frameworks but not sure if this is overkill here)
What is your opinion on these solutions?
So my belief is that the way this is implemented here doesn't introduce any new issues that aren't present in the current numpy global state. A strict singleton would be better but I am not sure that there is a natural/clean way to define singletons in python? The only problem with this global is you can't use from x import y pattern as then the rng is recreated.
I also think binding the PRG to the algorithm would work but you would need to ensure that users didn't decide to just seed everything with 0 which could be fairly common.
I am not really sure of what the Context implementation might look like and so can't really say anthing.
- Global Instance (like in this PR currently). I don't think this is thread-safe.
- Singleton Pattern (very similar but more flexible e.g. return a PRG for different thread ids)
- Binding a PRG to an algorithm (one downside is then it needs to be passed to all operators and other methods). But also multiple algorithms can run with different PRGs.
- Creating a
ContextorEnvironmentobject to generalize run-specific variables (I have seen this in other frameworks but not sure if this is overkill here)
Hello @blankjul , What's the difference between 2nd solution and 4th solution? In 4th solution, did you mean that each algorithm owned one Context instance, and others object access random generator via the Context? and the singleton solution maintained only one Context globally?
@CompRhys I will try this weekend to work on your PR to add a Singleton interface. Should generally not be that difficult.
Instead of just getting a PRG the context could store more variables, e.g. Machine Accuracy, eps, ... I am currently not sure if this makes sense to bind or not. Basically, this could also be a global Config.
@CompRhys Can you give me permission to push to your PR? (I think it is not letting me do that because the fork persists in your repo)
If that does not work please let me know and I will create a separate pull request.
My solution would be to create an object
import numpy as np
class RandomState(object):
_instance = None
def __new__(cls, seed=None, new=False):
if cls._instance is None:
cls._instance = np.random.RandomState(seed)
return cls._instance
which can then be used throughout the framework by, e.g.
from pymoo.random import RandomState
RandomState().choice([a, b])
Hello @blankjul, please consider thread-safe singleton:
class Singleton(object):
_lock = threading.Lock()
_instance = None
@classmethod
def instance(cls):
if not cls._instance:
with cls._lock:
if not cls._instance:
cls._instance = cls()
return cls._instance
It's set to allow maintainer edits and so I'm not sure why it doesn't work. I am not sure it makes sense to give you access to my fork so probably better to fork off this and make a new PR. I quickly did a find replace with Jack's singleton pattern but it errors taking in an argument so perhaps also needs a __new__.
I think there would need to be further logic to ensure that the threading lock and the thread singletons were still deterministically set according to the seed.