wren icon indicating copy to clipboard operation
wren copied to clipboard

`Random.new(seed)` where `seed` : `Num` relies on platform's cstdlib `srand()`/`rand()`

Open nitz opened this issue 3 years ago • 5 comments

Making use of passing a number to Random.new(seed) eventually calls randomSeed1(WrenVM* vm).

randomSeed1 uses that number cast as a uint32_t to call cstdlib's srand(), and then fills out the initial Well512 state with 16 subsequent calls to rand().

The result of this is that on a given platform, seeding random with a single number will produce an expected, repeatable output. Using the same seed on a different platform however, may not produce the same output as the original.

Example:

import "random" for Random
var letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
var rng = Random.new(1234)
System.print(rng.sample(letters, 10))

On my windows machine outputs: [P, Q, D, T, G, V, W, C, J, F] However, the try wren page prints: [G, O, H, E, U, A, W, J, M, Z]

Forcing Random.new() to seed the initial state with non-rand() values by passing a Sequence instead causes the behavior to be properly deterministic:

    import "random" for Random
    var letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    var rng = Random.new([1234]) // note [1234] rather than 1234
    System.print(rng.sample(letters, 10))

windows: [Q, B, S, T, U, C, I, F, W, P] try wren: [Q, B, S, T, U, C, I, F, W, P]

I would think that providing a seed of any sort should be deterministic even between platforms.

Cheers!

(Edit: fixed typo in link)

nitz avatar Jul 20 '22 14:07 nitz

Thanks for the PR! We can likely include this fix in 0.5.

ruby0x1 avatar Jul 21 '22 18:07 ruby0x1

Idea is correct, and should probably be extended to not rely on system randomness.

Implementation should be checked for improvement since algorithm is narrowed to uint15. And it could also be extended by using a config callback.

mhermier avatar Jul 21 '22 19:07 mhermier

The 15 bit return value is "by design", at least in Microsoft's cstdlib implementation (which I actually saw referenced in the same paper that the Well512 impl came from). It's certainly by no measure good, but was enough to at least get some distribution in the seed bits. That implementation also had the effect of bringing the same deterministic result to the Random.new(1234) in my webasm tests vs. the same code in an unpatched version built with msvc.

I'm the wrong person to ask if that's ideal, I just opted for the smallest amount of wake. It certainly could use a larger range, I just believe the MS impl specifically uses those upper 15 bits because the lower tends to be poor quality with the constants they chose. My rambly point being: the only real desire here with randomSeed1 is to stretch the user's 4 byte input to fill the full 512 bytes of state. As for how complicated that needs to be, 🤷‍♀️, but I don't think it needs to be on the level of a CSPRNG when all that's really happening is some basic whitening for the initial state.

I feel like giving a callback as part of either seeding or generating a small stream of PRNs that are only being generated to fill out the initial state of the intended PNRG feels like overkill. If the user wanted full control, they already have the ability to seed the instance by passing the entire 512 byte state via randomSeed16.

And I guess while I'm rambling...

Related, is that the no-seed provided (randomSeed0) method still depends on srand()/rand(). In particular, it seeds random passing time(NULL). That value (at least by typical C conventions) returns the number of seconds since epoch, which means it only changes every second.

If a user created several Random instances via Random.new(), expecting the system to seed each of them separately (as in, handing dice to players or the sort,) then the odds are very high that all of those instances would have the exact same seeds, and produce the exact same value sequences. I figure it'd be quite unlikely that's what the user would want.

Of course, actually resolving this can be a can of worms as it is, as I'm not sure of anything that's both C99 and x-plat. A way I've solved this before, at least on x86/x86_64 platforms, has been by using the RDTSC (read timestamp counter) instruction. (msvs supports a __rdtsc() intrinsic, which GCC has supported since 4.5, alternatively an inline assembly call works well too).

That leaves you out to dry on other architecture, or at least juggling to try and find a solution for each arch as you run into them. (PMCCNTR on arm, etc.)

Random is a wonderfully painful example of something that should be simple, but of course, somehow isn't. 😅

nitz avatar Jul 21 '22 20:07 nitz

Nothing is simple, everything is trades of in computing.

mhermier avatar Jul 21 '22 20:07 mhermier

Nothing is simple, everything is trades of in computing.

Which is exactly why they clarified the trade off being made. Thanks @nitz !

ruby0x1 avatar Jul 21 '22 20:07 ruby0x1