CipesAtHome icon indicating copy to clipboard operation
CipesAtHome copied to clipboard

RNG concerns

Open notwa opened this issue 3 years ago • 2 comments

RecipesAtHome appears to be using C's standard library for random number generation, which is sketchy for a couple of reasons:

  • the underlying implementation could be arbitrarily bad. it's almost certainly an LCG, which isn't inherently bad, but some internal constants work better than others. there's quite a few of them out there. Microsoft (msvcrt.dll) still seems to use (x=x*214013+2531011)>>16 in particular.
  • it probably uses a relatively small amount of internal state. the period of the lower bits of an LCG can be improved by mixing an internal, larger state into a smaller output integer type.
  • it's not thread-safe, unless OMP is doing something fancy here.

additionally, there's a bit of bias in the way the RNG is being used. using a non-power-of-two modulus with a power-of-two-based RNG creates bias, unless the result is suitably re-rolled. there's also an instance of rand() % 100 < 50 where a simple rand() & 1 would suffice. (I suppose you were playing with the ratio at some point)

~~I think what this means is…~~ edit: I'm looking at the code again with a clearer mind, and the impact should be minimal since handleSelectAndRandom typically falls through which does not invoke the RNG, assuming default settings.

anyway, I've thrown together an ad-hoc patch utilizing the PCG C library (on github) (Apache 2.0/MIT dual-licensed) that should resolve these issues. this library is still LCG-based, but: the constants should be sound, it implements output mixing, it supports up to 128-bits of internal state (what I've chosen), and it performs re-rolls for bounded queries. there are plenty of other libraries out there; this was simply the first to catch my attention. I didn't make this a pull request since my patch is kind of a hack.

notwa avatar Sep 30 '20 07:09 notwa

As far as I'm concerned, randomise is a legacy setting that we should eventually stop supporting. The built-in randomization has a separate state for each thread, but I'm curious if you think the period of the lowest bit of the output is likely to ever coincide exactly with starting a new branch. That is the only time that I think periodicity will matter, but it will be catastrophic if it occurs (effectively wasting a thread until the program is restarted).

With that said, if there is no locking or other performance difference for a pull request with a different PRNG, I would see little problem with merging it in if it avoids that issue.

Amphitryon0 avatar Sep 30 '20 22:09 Amphitryon0

In my current dev branch, I successfully got the C++ random number generators working (even though this is a C program, I have a very lightweight wrapper for it).

I tried quite a multitude of RNG implementations, both library standards and third party, and std::linear_congruential_engine seemed to be one of the very few that was faster then rand() for RecipesAtHome (and it can be made thread safe pretty trivially)

If it doesn't bloat the final binary by much, I'll try getting that in a pull requestable state.

techsy730 avatar Nov 08 '21 04:11 techsy730