deno_std icon indicating copy to clipboard operation
deno_std copied to clipboard

feat(random/unstable): basic randomization functions

Open lionel-rowe opened this issue 1 year ago • 9 comments

Closes https://github.com/denoland/std/issues/4848

Currently includes the following:

  • @std/random/between: randomBetween function
  • @std/random/integer-between: randomIntegerBetween function
  • @std/random/sample: enhanced version of (to-be-deprecated?) @std/collections/sample's sample function, with new weights and random options
  • @std/random/seeded-random: SeededRandom class implementing the PGC32 algorithm
  • @std/random/shuffle: shuffle function implementing Fisher–Yates shuffle

lionel-rowe avatar Aug 03 '24 15:08 lionel-rowe

Codecov Report

Attention: Patch coverage is 98.24561% with 3 lines in your changes missing coverage. Please review.

Project coverage is 96.24%. Comparing base (7c0e917) to head (6bb99e4). Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
random/sample.ts 91.66% 3 Missing :warning:
Additional details and impacted files
@@           Coverage Diff            @@
##             main    #5626    +/-   ##
========================================
  Coverage   96.24%   96.24%            
========================================
  Files         485      491     +6     
  Lines       39269    39440   +171     
  Branches     5787     5811    +24     
========================================
+ Hits        37793    37958   +165     
- Misses       1432     1438     +6     
  Partials       44       44            

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Aug 03 '24 15:08 codecov[bot]

  • @std/random/pick: pick and pickWeighted functions

Note that we already have sample() in @std/collections. So if we introduced a weighted sample() function, it should be added to @std/collections or even as an option to the current sample() function:

const numbers = [1, 2, 3, 4];
const random = sample(numbers, { weights: [1, 9999, 2, 8888] });

Is there a particular reason why this algorithm is added?

I think this would be a cool addition to @std/collections.

timreichen avatar Aug 03 '24 18:08 timreichen

@timreichen

Is there a particular reason why this algorithm is added?

Only that it was the algorithm suggested in the linked issue, it has a period high enough for any sensible use case, and it seems to give a good distribution of results. One other plus is that arbitrarily chosen positive integer seeds still seem to give good entropy, even if they're apparently "low entropy" numbers like 1 or 12 (i.e. they don't need to be primes or anything fancy). [Edit: Seeds that are adjacent to each other do seem to give very similar first-few results though.] I don't have the requisite expertise to verify its randomness properties beyond "seems to give good results" though.

Edit 2: Upon looking into it further I think something like PCG64 (numpy's default) would be a better choice. Edit 3: PCG32, as the "recommended for most users" version of PCG.

lionel-rowe avatar Aug 04 '24 02:08 lionel-rowe

  • @std/random/pick: pick and pickWeighted functions

Note that we already have sample() in @std/collections. So if we introduced a weighted sample() function, it should be added to @std/collections or even as an option to the current sample() function:

const numbers = [1, 2, 3, 4];
const random = sample(numbers, { weights: [1, 9999, 2, 8888] });

I quite like the idea of making it an option. It does require "unzipping" the values from their weights, but it can still be pretty versatile in terms of usage:

const weighted = new Map([["a", 5], ["b", 3], ["c", 2]]);
const result = sample([...weighted.keys()], { weights: [...weighted.values()] });

I'll keep the collections stuff within the scope of this PR for now as some of it relies on the random stuff (e.g. I'll use SeededPrng for tests), but happy to split it out before merging if that's preferable.

lionel-rowe avatar Aug 04 '24 05:08 lionel-rowe

@lionel-rowe Can you move shuffle and this version of sample to std/random (leaving the existing collections/sample untouched)?

Also I'd prefer to see SeededPrng exported from @std/random/seeded-prng, randomBetween from @std/random/between, randomIntegerBetween from @std/random/integer-between for consistency of the rest of std.

kt3k avatar Aug 07 '24 09:08 kt3k

@lionel-rowe Can you move shuffle and this version of sample to std/random (leaving the existing collections/sample untouched)?

@kt3k Wouldn't that be confusing to have two sample() functions? Or do you mean to deprecate collections/sample in another PR?

Also I'd prefer to see SeededPrng exported from @std/random/seeded-prng, randomBetween from @std/random/between, randomIntegerBetween from @std/random/integer-between for consistency of the rest of std.

Prng looks weird to me. Maybe it should it be written out as SeededPseudoRandomNumberGenerator?

timreichen avatar Aug 07 '24 10:08 timreichen

@kt3k Wouldn't that be confusing to have two sample() functions? Or do you mean to deprecate collections/sample in another PR?

I think we can deprecate collections/sample when we stabilize random/sample.

Also I'd prefer to see SeededPrng exported from @std/random/seeded-prng, randomBetween from @std/random/between, randomIntegerBetween from @std/random/integer-between for consistency of the rest of std.

Prng looks weird to me. Maybe it should it be written out as SeededPseudoRandomNumberGenerator?

Or how about the name like SeededRandom? There is a similarly named npm package https://www.npmjs.com/package/seedrandom

kt3k avatar Aug 07 '24 11:08 kt3k

How best to represent SeededRandom's seed and state values is giving me a bit of a headache.

Type Pros Cons
BigInt - A single scalar value can hold a 64-bit uint
- Visually compact
- Can also hold negative values and values bigger than uint64, and validity can only be checked at runtime
- JSON.stringify throws
BigUint64Array - Probably the most semantically "correct" - The fact that the seed represents a 64-bit uint is an implementation detail that's not relevant for typical consumers of the library
- JSON.stringify throws
- Subject to endianness issues e.g. if created directly from underlying binary data. That seems like a deal-breaker to me.
Uint8Array - Still semantically OK - bytes are bytes are bytes
- Easy to extract uint64 values without worrying about endiannness using DataView#getBigUint64
- Length can only be checked at runtime
- JSON.stringify gives an object with numeric keys but no length property, i.e. it can't be ergonomically converted back to a Uint8Array
- The bytes really represent 64-bit uints, not 8-bit uints (but again, that's an implementation detail)
Hex string - Ergonomic to write
- Visually compact
- Strings give very weak compile-time type checking, and TS's template literal types can't represent more than a couple of hex digits before running out of space
Standard array of Bytes (TS union of number literals 0..255) - Can be easily converted to and from Uint8Arrays
- Hard-coded values can be fully type-checked at compile time! You can check both length and the values of elements, e.g. allowing [1, 1, 1, 1, 1, 1, 1, 1] but disallowing [1, 1, 1] or [1, 1, 1, 1, 1, 1, 1, 256].
- JSON.stringify is lossless except for type info
- Not very "webby"
- Sacrifices runtime semanticity for compile-time semanticity
- Requires as type assertions for values that aren't hard-coded or directly emitted from the library

One possible idea: Accept both Uint8Array and standard arrays of Bytes as inputs (maximum ergonomics for both hard-coded and non-hard-coded use), while emitting arrays of Bytes in the output (maximum portability — easy JSON serialization etc). Proof of concept: https://github.com/lionel-rowe/deno_std/commit/11961938e305f2b32f6a22562bf9510ea94aacfa

lionel-rowe avatar Aug 08 '24 12:08 lionel-rowe

One possible idea: accept both Uint8Array and standard arrays of Bytes as inputs (maximum ergonomics for both hard-coded and non-hard-coded use), while emitting arrays of Bytes in the output (maximum portability — easy JSON serialization etc)

This sounds good to me. Also this is going to be unstable package for a while. We can iterate later.

kt3k avatar Aug 08 '24 14:08 kt3k

One last thing. Can you please go through all exceptions and ensure they following the contributing guidelines? See https://github.com/denoland/std/blob/main/archive/tar_stream.ts for an example.

@iuioiua I think they already do, with the exception that error messages beginning with a variable name don't capitalize the variable name:

throw new RangeError("max must be greater than or equal to min");

How should those be handled? `max`/`min` with backticks would be another option.

lionel-rowe avatar Sep 04 '24 05:09 lionel-rowe