random
random copied to clipboard
uniform-int (and likely other distributions) are not as uniform as possible
uniform-int currently functions by stretching the output range of the RNG into the desired range. However, this is not strictly correct as it depends on the output size of the RNG. For instance, with a small-scale example, if the RNG only outputs 16 bits and we're generating a number from 0 to 5:
>>> collections.Counter(i * 6 // 0xffff for i in range(0xffff))
Counter({0: 10923, 2: 10923, 4: 10923, 1: 10922, 3: 10922, 5: 10922})
We can run a simulation to see the effect of this as well:
const random = require("random")
const hist = new Map();
for (let i = 0; i < 20000000; i++) {
const n = random.int(0, 5);
hist.set(n, (hist.get(n) || 0) + 1);
}
and we get:
Map(6) {
4 => 3334526,
5 => 3333539,
3 => 3334684,
1 => 3331604,
0 => 3333306,
2 => 3332341
}
In order to get a truly uniform distribution, the generator needs to instead:
- operate by generating bits with a uniform distribution (it's likely that Math.random will at least be uniform over 64 bits, so it's possible to generate 64 bits at a time).
- generate enough bits to hold the max value
- if the generated number is > max value, discard and try again
https://github.com/python/cpython/blob/8a0d9a6bb77a72cd8b9ece01b7c1163fff28029a/Lib/random.py#L235-L243 has an example of how it works.
Thanks @bigfarts.
I think there are a few considerations here.
- I really like the simplicity of the design where all distributions are built up from a pluggable PRNG function — which provides for a lot of extensibility. You could, for instance, pass a custom PRNG function which is based on random bytes to get the effect you're looking for.
- I think using
Math.random()
as the default PRNG is "good enough". Even in the example you provide, I'm not sure this histogram is necessarily statistically significant? - Maybe we could provide a built-in PRNG that is built on top of random bytes to better satisfy the intended distributions?
- Maybe we could provide a note in the readme about the "randomness" of the default PRNG? I think a lot of folks will be fine with using
Math.random()
as the base, but it would definitely be good to call out so people won't be burnt by this as a subtle surprise.
Thoughts?
-
I think because of the way
uniform-int
is implemented, it's not possible to write a PRNG function to ensure it's not biased.Consider that if you wanted a number between [0,7): that's 7 possible numbers. In order to have a uniform output distribution, under the current implementation, the PRNG must uniformly distribute its output into a multiple of 7 buckets, otherwise via the pigeonhole principle you will have some value that's biased (if the PRNG outputs values in 8 buckets, then buckets 1-7 will be mapped into the range [0,7) and bucket 8 will be mapped into one of the values in the range that has already been mapped, biasing the output distribution).
However, I do think that pluggable PRNG functions are fine if the primitive is changed from generating a float of unknown entropy to uniformly distributed bytes, with
uniform-int
implementing rejection sampling: that way, any generator can be assured that a PRNG will always output 8 bits of entropy. -
This blog post talks a bit about the biasing effect -- while it uses modulo rather than division, the problem is similar, just where the bias ends up falling (i.e. within a bunch of numbers in the range rather than small numbers at the start).
I think it's worthwhile to make it such that if the name of the function is
uniform-int
it should truly be uniform. -
(see bullet 1)
-
I think
Math.random()
is fine (well, as fine as it gets in JavaScript) -- it's mostly an implementation detail ofuniform-int
that causes this.
If you'd like a draft PR for the suggested change in bullet 1, let me know!
For what it's worth, I put together an example of how various primitives can be stacked on top of each other to provide a truly uniform implementation of uniform-int
, with randRange being the function here (using Python's naming scheme): https://gist.github.com/bigfarts/fc71ec9d42ca20116ac0bbfbd721940d
This example also uses getRandomValues
as a primitive rather than Math.random
, so window.crypto
can be used as a drop-in RNG and you don't have to squeeze bytes into a float then squeeze them back out into bytes.