bounded-rands icon indicating copy to clipboard operation
bounded-rands copied to clipboard

A couple of algorithms for your consideration

Open sh1boot opened this issue 2 years ago • 1 comments

First, what I've been calling the Dr Jacques method:

static int m = 1, r = 0;
int bounded_rand(int n) {
    const int N = INT_MAX >> 1;
    int q, d;
    for (;;) {
        while (m < N) {
            // needs refactoring to consume 32-bit chunks
            r = (r << 1) | random_bit();
            m <<= 1;
        }
        q = m / n;
        if (r < q * n)
            break;
        r -= n * q;
        m -= n * q;
    }
    d = r % n;
    r /= q;
    m = q;
    return d;
}

This is an algorithm which doesn't do much to avoid division, but it is frugal with its use of the RNG. If you include a CSRNG in the pool of generators to test against, then this may win.

The other is my own:

static uint32_t bounded_rand(rng_t& rng, uint32_t range) {
    uint64_t m = uint64_t(range) << 31;   // zero is fine too.
    for (int i = 0; i < 3; ++i) {
        uint32_t x = rng();
        m = uint64_t(x) * uint64_t(range) + (m >> 32);
    }
    return m >> 32;
}

This is biased, but the bias is extremely small. Change 3 to whatever you need. And it's branch-free on top of being division-free, provided the loop is unrolled.

If you analyse it you'll see it's simply a long-multiplication version of the standard multiply-and-shift bounded_rand(). Since range is only a 32-bit value there's only one carry path to follow and the rest of the result is discarded on the fly.

Fun fact: it's also the same construction as Marsaglia's multiply-with-carry generator. I believe there's a lag-r MWC construction analogue which should use fewer calls to rng() if anybody cared about that.

sh1boot avatar Dec 13 '23 22:12 sh1boot

Actually, I think lag-2 is just this?

static uint32_t bounded_rand(rng_t& rng, uint32_t range) {
    static uint32_t saved = 0x80000000u;
    uint64_t m = uint64_t(saved) * uint64_t(range);
    uint32_t x = rng();
    m = uint64_t(x) * uint64_t(range) + (m >> 32);
    saved = uint32_t(m);
    return m >> 32;
}

The bits in saved are shifted down 32 bits every call, which should wash out the effect of multiplies pushing bias up from the bottom, right?

sh1boot avatar Dec 13 '23 23:12 sh1boot