
Replace remaining calls to legacy `np.random.seed()`

Open chimaerase opened this issue 2 years ago • 1 comments

Thanks for all of your work on TPOT! My colleagues and I use it very often for a variety of synthetic biology research projects.

I'd like to ask that TPOT 2 accept and use only `np.random.Generator` objects as an alternative to int RNG seeds, and avoid using the global / legacy `np.random` generator, even internally, if possible. A quick search through the TPOT2 code at the time of writing (11/17/23) shows this is largely already the case, except for 3 remaining instances of the string "random.seed", which refer to the legacy `np.random.seed()`.
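To illustrate the request, here is a minimal sketch of how a library function could accept either an int seed or an existing `np.random.Generator` without touching the global state. The names `make_rng` and `sample_mutation_index` are hypothetical and are not part of TPOT; the sketch relies only on `np.random.default_rng()`, which returns a passed-in `Generator` unchanged.

```python
import numpy as np


def make_rng(seed_or_rng=None):
    """Hypothetical helper: normalize an int seed, None, or a Generator.

    np.random.default_rng() returns an existing Generator unaltered,
    so callers can share one Generator across several components.
    """
    return np.random.default_rng(seed_or_rng)


def sample_mutation_index(n_items, rng=None):
    """Hypothetical library function that never calls np.random.seed()."""
    rng = make_rng(rng)
    # Draw from the local Generator instead of the global np.random state.
    return int(rng.integers(0, n_items))


shared = np.random.default_rng(42)
same_object = make_rng(shared) is shared

from_int_a = sample_mutation_index(100, rng=7)
from_int_b = sample_mutation_index(100, rng=7)
```

With this pattern, the same int seed reproduces the same draw, and a caller-supplied `Generator` is consumed directly rather than copied or re-seeded.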

Our code sometimes executes multiple TPOTRegressors in parallel, and TPOT 1's dependence on the global `np.random` generator has caused problems with repeatability, for example when unpredictable OS-level thread scheduling changes the sequence of calls to the shared `np.random.randint()` or similar functions. There are workarounds, e.g. using subprocesses instead of threads, but IMO TPOT should be maximally flexible and ideally not require workarounds.
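The threading concern above can be sketched roughly as follows. This is not TPOT code: `worker` and `run_parallel` are hypothetical stand-ins for whatever each thread would run. Each thread gets its own `Generator`, derived from a single root seed via `np.random.SeedSequence.spawn()`, so the interleaving of threads cannot change which numbers any one worker draws.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def worker(rng, n_draws=5):
    """Hypothetical task: draw only from the Generator it was handed."""
    return rng.integers(0, 1000, size=n_draws).tolist()


def run_parallel(root_seed, n_workers=4):
    """Run workers in threads, each with an independent child Generator."""
    # spawn() derives statistically independent child seed sequences.
    child_seeds = np.random.SeedSequence(root_seed).spawn(n_workers)
    rngs = [np.random.default_rng(s) for s in child_seeds]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() returns results in submission order, not completion order.
        return list(pool.map(worker, rngs))


first = run_parallel(1234)
second = run_parallel(1234)
print(first == second)
```

Because no worker shares generator state with another, repeated runs with the same root seed should produce the same per-worker results regardless of how the OS schedules the threads.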

chimaerase avatar Nov 17 '23 18:11 chimaerase

The next version addresses this with PR #156. `np.random.seed` has been removed and everything should now rely only on `np.random.Generator`.

perib avatar Sep 30 '24 15:09 perib