A couple of algorithms for your consideration
First, what I've been calling the Dr Jacques method:
#include <climits>

extern int random_bit();   // supplies one fresh random bit per call

static int m = 1, r = 0;   // persistent state: r is uniform on [0, m)

int bounded_rand(int n) {
    const int N = INT_MAX >> 1;
    int q, d;
    for (;;) {
        while (m < N) {
            // needs refactoring to consume 32-bit chunks
            r = (r << 1) | random_bit();
            m <<= 1;
        }
        q = m / n;
        if (r < q * n)
            break;
        r -= n * q;   // reject: r was uniform on [q*n, m), so renormalise
        m -= n * q;
    }
    d = r % n;
    r /= n;           // recycle the leftover entropy: r is now uniform on [0, q)
    m = q;
    return d;
}
This is an algorithm which doesn't do much to avoid division, but it is frugal with its use of the RNG: because the leftover entropy is recycled through r and m, it consumes close to the information-theoretic minimum of log2(n) random bits per result on average. If you include a CSRNG in the pool of generators to test against, then this may win.
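The refactoring flagged in the comment above might look something like this: a sketch only, with the state widened to 64 bits so a whole 32-bit word can be shifted in at once. I've borrowed the rng_t convention from the code below, and bounded_rand_chunked, m64 and r64 are names of my own invention:

#include <cstdint>

static uint64_t m64 = 1, r64 = 0;   // same invariant: r64 is uniform on [0, m64)

uint32_t bounded_rand_chunked(rng_t& rng, uint32_t n) {
    for (;;) {
        while (m64 < (uint64_t(1) << 32)) {
            r64 = (r64 << 32) | rng();   // top up a whole word at a time
            m64 <<= 32;
        }
        uint64_t q = m64 / n;
        if (r64 < q * n) {               // accept
            uint32_t d = uint32_t(r64 % n);
            r64 /= n;                    // leftover entropy, uniform on [0, q)
            m64 = q;
            return d;
        }
        r64 -= q * n;                    // reject and renormalise
        m64 -= q * n;
    }
}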
The other is my own:
// rng_t: any generator returning a uniformly distributed uint32_t per call.
static uint32_t bounded_rand(rng_t& rng, uint32_t range) {
    uint64_t m = uint64_t(range) << 31; // rounding offset; zero is fine too.
    for (int i = 0; i < 3; ++i) {
        uint32_t x = rng();
        // one word of long multiplication: the high word of m is the carry in
        m = uint64_t(x) * uint64_t(range) + (m >> 32);
    }
    return m >> 32;
}
This is biased, but the bias is extremely small: each additional 32-bit word shrinks it by a further factor of 2^32, so with three words it is bounded by roughly range/2^96. Change the 3 to whatever you need. And it's branch-free on top of being division-free, provided the loop is unrolled.
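For what it's worth, unrolled it is just three copies of the loop body back to back; a sketch, with bounded_rand_unrolled being my name for it:

static uint32_t bounded_rand_unrolled(rng_t& rng, uint32_t range) {
    uint64_t m = uint64_t(range) << 31;                // rounding offset, as above
    m = uint64_t(rng()) * uint64_t(range) + (m >> 32); // word 1
    m = uint64_t(rng()) * uint64_t(range) + (m >> 32); // word 2
    m = uint64_t(rng()) * uint64_t(range) + (m >> 32); // word 3
    return m >> 32;
}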
If you analyse it you'll see it's simply a long-multiplication version of the standard multiply-and-shift bounded_rand(): since range is only a 32-bit value there's only one carry path to follow, and the rest of the result is discarded on the fly.
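For comparison, the single-word multiply-and-shift it generalises looks like this (bounded_rand_1word is my name for it; it maps x to floor(x * range / 2^32)):

static uint32_t bounded_rand_1word(rng_t& rng, uint32_t range) {
    uint32_t x = rng();
    return uint32_t((uint64_t(x) * uint64_t(range)) >> 32);
}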
Fun fact: it's also the same construction as Marsaglia's multiply-with-carry generator. I believe there's an analogue of the lag-r MWC construction which should use fewer calls to rng(), if anybody cared about that.
Actually, I think lag-2 is just this?
static uint32_t bounded_rand(rng_t& rng, uint32_t range) {
    static uint32_t saved = 0x80000000u;            // low word carried over from the previous call
    uint64_t m = uint64_t(saved) * uint64_t(range); // low partial product
    uint32_t x = rng();
    m = uint64_t(x) * uint64_t(range) + (m >> 32);  // high partial product plus carry
    saved = uint32_t(m);                            // keep the low word for next time
    return m >> 32;
}
The bits in saved are shifted down 32 bits every call, which should wash out the effect of multiplies pushing bias up from the bottom, right?
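If anyone wants to poke at that empirically, a throwaway harness along these lines would do. It's a sketch: rng_t is a stand-in wrapping std::mt19937, the lag-2 routine above is repeated verbatim so the snippet is self-contained, and at any feasible trial count the residual bias is far too small to show above sampling noise, so this only confirms gross uniformity:

#include <cstdint>
#include <cstdio>
#include <random>

struct rng_t {   // stand-in generator, just for this test
    std::mt19937 gen{12345};
    uint32_t operator()() { return uint32_t(gen()); }
};

static uint32_t bounded_rand(rng_t& rng, uint32_t range) {
    static uint32_t saved = 0x80000000u;
    uint64_t m = uint64_t(saved) * uint64_t(range);
    uint32_t x = rng();
    m = uint64_t(x) * uint64_t(range) + (m >> 32);
    saved = uint32_t(m);
    return m >> 32;
}

int main() {
    rng_t rng;
    const uint32_t range = 6;
    const uint64_t trials = 60000000;
    uint64_t counts[6] = {};
    for (uint64_t i = 0; i < trials; ++i)
        ++counts[bounded_rand(rng, range)];
    for (uint32_t k = 0; k < range; ++k)
        std::printf("%u: %llu (expected %llu)\n",
                    k, (unsigned long long)counts[k],
                    (unsigned long long)(trials / range));
    return 0;
}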