
Seed random number generation for query language rng functions

Open alexpeters1208 opened this issue 1 year ago • 2 comments

At the moment, there is no good way to seed query language functions like randomGaussian() or randomInt() that is both thread-safe and efficient. This may take some real investigation.

alexpeters1208 avatar Jul 09 '24 17:07 alexpeters1208

A seeded PRNG is by its nature stateful, so a static function cannot do what you want with a seeded PRNG (assuming the intent is some sort of reproducible PRNG associated with that specific query). If a seeded PRNG is needed, the user will need some way to create that state and then reference it from the query.

An approach that might satisfy your needs, but isn't a PRNG, is some sort of mixing or hashing function applied to a state (which we can assume plays the role of the PRNG's internals):

table.view(["X = mix_or_hash(my_seed ^ ii)"])

Depending on the quality of the hash, this may be a reasonable proxy for a PRNG. Utilities based on this hash could be built out:

table.view(["X = stateRandomInt(my_seed ^ ii, 0, 5)", "Y = stateRandomGaussian(my_seed ^ ii)"])

A linear congruential generator (https://en.wikipedia.org/wiki/Linear_congruential_generator) might be fast and good enough for this use case, as might a permuted congruential generator (https://en.wikipedia.org/wiki/Permuted_congruential_generator).
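To make the idea concrete, here is a minimal Java sketch of what such stateless helpers could look like. Everything in it is an assumption for illustration: `mix64` uses the SplitMix64 finalizer constants rather than an LCG or PCG, and `stateRandomInt` / `stateRandomGaussian` are the hypothetical names from the example above, not existing Deephaven functions. The statistical quality of this construction is not verified here.

```java
// Hypothetical helpers; none of these exist in Deephaven today.
final class StatelessRandomSketch {
    // SplitMix64-style finalizer: a bijective bit mixer on 64-bit values.
    static long mix64(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    // Map the top 53 bits to a double in [0, 1).
    static double toUnitDouble(long bits) {
        return (bits >>> 11) * 0x1.0p-53;
    }

    // Integer in [lo, hi). Uses a plain modulo, so there is a slight bias
    // whenever (hi - lo) does not divide 2^64.
    static int stateRandomInt(long state, int lo, int hi) {
        long span = (long) hi - lo;
        return (int) (lo + Math.floorMod(mix64(state), span));
    }

    // Standard normal via Box-Muller, using two uniforms derived from the state.
    static double stateRandomGaussian(long state) {
        double u1 = 1.0 - toUnitDouble(mix64(state)); // in (0, 1], avoids log(0)
        double u2 = toUnitDouble(mix64(state + 0x9E3779B97F4A7C15L));
        return Math.sqrt(-2.0 * Math.log(u1)) * Math.cos(2.0 * Math.PI * u2);
    }
}
```

Because each value depends only on `my_seed ^ ii`, the same seed and row index always give the same result regardless of which thread evaluates the row. Note that a raw finalizer maps a state of 0 to 0, so a real implementation would probably want to offset or pre-scramble the state.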

devinrsmith avatar Jul 11 '24 15:07 devinrsmith

After looking at this and thinking about it for a while, I'm going to delay doing anything.

  1. The current random number generation uses ThreadLocalRandom to get high-efficiency random number generation on multiple threads. Moving away from ThreadLocalRandom as the default seems like a bad choice.
  2. Pseudo-random number generators start with a seed and then create a sequence of numbers that obey certain statistical properties. Creating a series of generators from a series of seeds and then taking one number from each generator does not produce a sequence of random numbers with the same properties. The numbers may superficially look the same, but there is no guarantee that numbers created this way will have the correct properties.
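The two points above can be illustrated with the standard Java classes. This is only a sketch: the class and method names are made up for the example, and it says nothing about how Deephaven wires its RNG functions internally. It shows that `ThreadLocalRandom` rejects `setSeed`, and it contrasts drawing many values from one seeded `java.util.Random` with drawing one value from each of many seeded generators.

```java
import java.util.Random;
import java.util.concurrent.ThreadLocalRandom;

// Illustrative names only; not part of any Deephaven API.
final class SeedingSketch {
    // ThreadLocalRandom does not support seeding: setSeed throws
    // UnsupportedOperationException on the instance returned by current().
    static boolean threadLocalAcceptsSeed(long seed) {
        try {
            ThreadLocalRandom.current().setSeed(seed);
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    // The intended use of a PRNG: one seed, one generator, many draws.
    static int[] oneGeneratorManyDraws(long seed, int n) {
        Random rng = new Random(seed);
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            out[i] = rng.nextInt();
        }
        return out;
    }

    // The pattern point 2 warns about: a fresh generator per element,
    // each with its own seed, contributing a single draw.
    static int[] manyGeneratorsOneDraw(long seed, int n) {
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            out[i] = new Random(seed ^ i).nextInt();
        }
        return out;
    }
}
```

Both methods are reproducible for a given seed, but only the first yields a sequence that the generator's statistical guarantees are meant to cover.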

As a result, I'm going to delay action until we get more specific user feedback on what is desirable.

  1. We could let users create their own RNG for special cases, such as seeding.
  2. We could create a single-threaded RNG that is less performant but supports a seed.

Either way, it is prudent to wait for more user specifications before moving forward.
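For discussion purposes, a seeded RNG object that a user creates and then references from a query might look roughly like the following. This is a hypothetical sketch, not a proposed API: the class name and methods are invented here, and it simply wraps `java.util.Random` behind synchronized methods.

```java
import java.util.Random;

// Hypothetical user-created RNG; not an existing Deephaven class.
final class SeededSharedRandomSketch {
    private final Random rng;

    SeededSharedRandomSketch(long seed) {
        this.rng = new Random(seed);
    }

    // Integer in [lo, hi). Synchronized so concurrent callers never
    // interleave inside a single draw.
    synchronized int randomInt(int lo, int hi) {
        return lo + rng.nextInt(hi - lo);
    }

    // Standard normal draw from the same underlying sequence.
    synchronized double randomGaussian() {
        return rng.nextGaussian();
    }
}
```

Synchronization keeps the shared state consistent, but results are only reproducible if rows are evaluated in a deterministic order, which is why this approach effectively implies single-threaded evaluation and the performance cost mentioned in option 2.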

chipkent avatar Sep 23 '24 19:09 chipkent