deno_std feat(random/unstable): basic randomization functions

Closes https://github.com/denoland/std/issues/4848

Currently includes the following:

@std/random/between: randomBetween function
@std/random/integer-between: randomIntegerBetween function
@std/random/sample: enhanced version of (to-be-deprecated?) @std/collections/sample's sample function, with new weights and random options
@std/random/seeded-random: SeededRandom class implementing the PGC32 algorithm
@std/random/shuffle: shuffle function implementing Fisher–Yates shuffle

Aug 03 '24 15:08 lionel-rowe

Codecov Report

Attention: Patch coverage is 98.24561% with 3 lines in your changes missing coverage. Please review.

Project coverage is 96.24%. Comparing base (7c0e917) to head (6bb99e4). Report is 2 commits behind head on main.

Files with missing lines	Patch %	Lines
random/sample.ts	91.66%	3 Missing :warning:

Additional details and impacted files

@@           Coverage Diff            @@
##             main    #5626    +/-   ##
========================================
  Coverage   96.24%   96.24%            
========================================
  Files         485      491     +6     
  Lines       39269    39440   +171     
  Branches     5787     5811    +24     
========================================
+ Hits        37793    37958   +165     
- Misses       1432     1438     +6     
  Partials       44       44

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

Aug 03 '24 15:08 codecov[bot]

@std/random/pick: pick and pickWeighted functions

Note that we already have sample() in @std/collections. So if we introduced a weighted sample() function, it should be added to @std/collections or even as an option to the current sample() function:

const numbers = [1, 2, 3, 4];
const random = sample(numbers, { weights: [1, 9999, 2, 8888] });

@std/random/seeded: SeededPrng class implementing the Wichman-Hill algorithm

Is there a particular reason why this algorithm is added?

@std/random/shuffle: shuffle function implementing Fisher–Yates shuffle

I think this would be a cool addition to @std/collections.

Aug 03 '24 18:08 timreichen

@timreichen

@std/random/seeded: SeededPrng class implementing the Wichman-Hill algorithm

Is there a particular reason why this algorithm is added?

Only that it was the algorithm suggested in the linked issue, it has a period high enough for any sensible use case, and it seems to give a good distribution of results. One other plus is that arbitrarily chosen positive integer seeds still seem to give good entropy, even if they're apparently "low entropy" numbers like 1 or 12 (i.e. they don't need to be primes or anything fancy). [Edit: Seeds that are adjacent to each other do seem to give very similar first-few results though.] I don't have the requisite expertise to verify its randomness properties beyond "seems to give good results" though.

Edit 2: Upon looking into it further I think something like PCG64 (numpy's default) would be a better choice. Edit 3: PCG32, as the "recommended for most users" version of PCG.

Aug 04 '24 02:08 lionel-rowe

@std/random/pick: pick and pickWeighted functions

Note that we already have sample() in @std/collections. So if we introduced a weighted sample() function, it should be added to @std/collections or even as an option to the current sample() function:
const numbers = [1, 2, 3, 4];
const random = sample(numbers, { weights: [1, 9999, 2, 8888] });

I quite like the idea of making it an option. It does require "unzipping" the values from their weights, but it can still be pretty versatile in terms of usage:

const weighted = new Map([["a", 5], ["b", 3], ["c", 2]]);
const result = sample([...weighted.keys()], { weights: [...weighted.values()] });

I'll keep the collections stuff within the scope of this PR for now as some of it relies on the random stuff (e.g. I'll use SeededPrng for tests), but happy to split it out before merging if that's preferable.

Aug 04 '24 05:08 lionel-rowe

@lionel-rowe Can you move shuffle and this version of sample to std/random (leaving the existing collections/sample untouched)?

Also I'd prefer to see SeededPrng exported from @std/random/seeded-prng, randomBetween from @std/random/between, randomIntegerBetween from @std/random/integer-between for consistency of the rest of std.

Aug 07 '24 09:08 kt3k

@lionel-rowe Can you move shuffle and this version of sample to std/random (leaving the existing collections/sample untouched)?

@kt3k Wouldn't that be confusing to have two sample() functions? Or do you mean to deprecate collections/sample in another PR?

Also I'd prefer to see SeededPrng exported from @std/random/seeded-prng, randomBetween from @std/random/between, randomIntegerBetween from @std/random/integer-between for consistency of the rest of std.

Prng looks weird to me. Maybe it should it be written out as SeededPseudoRandomNumberGenerator?

Aug 07 '24 10:08 timreichen

@kt3k Wouldn't that be confusing to have two sample() functions? Or do you mean to deprecate collections/sample in another PR?

I think we can deprecate collections/sample when we stabilize random/sample.

Also I'd prefer to see SeededPrng exported from @std/random/seeded-prng, randomBetween from @std/random/between, randomIntegerBetween from @std/random/integer-between for consistency of the rest of std.

Prng looks weird to me. Maybe it should it be written out as SeededPseudoRandomNumberGenerator?

Or how about the name like SeededRandom? There is a similarly named npm package https://www.npmjs.com/package/seedrandom

Aug 07 '24 11:08 kt3k

How best to represent SeededRandom's seed and state values is giving me a bit of a headache.

Type	Pros	Cons
`BigInt`	- A single scalar value can hold a 64-bit uint - Visually compact	- Can also hold negative values and values bigger than uint64, and validity can only be checked at runtime - `JSON.stringify` throws
`BigUint64Array`	- Probably the most semantically "correct"	- The fact that the seed represents a 64-bit uint is an implementation detail that's not relevant for typical consumers of the library - `JSON.stringify` throws - Subject to endianness issues e.g. if created directly from underlying binary data. That seems like a deal-breaker to me.
`Uint8Array`	- Still semantically OK - bytes are bytes are bytes - Easy to extract uint64 values without worrying about endiannness using `DataView#getBigUint64`	- Length can only be checked at runtime - `JSON.stringify` gives an object with numeric keys but no `length` property, i.e. it can't be ergonomically converted back to a `Uint8Array` - The bytes really represent 64-bit uints, not 8-bit uints (but again, that's an implementation detail)
Hex string	- Ergonomic to write - Visually compact	- Strings give very weak compile-time type checking, and TS's template literal types can't represent more than a couple of hex digits before running out of space
Standard array of `Byte`s (TS union of number literals 0..255)	- Can be easily converted to and from `Uint8Array`s - Hard-coded values can be fully type-checked at compile time! You can check both length and the values of elements, e.g. allowing `[1, 1, 1, 1, 1, 1, 1, 1]` but disallowing `[1, 1, 1]` or `[1, 1, 1, 1, 1, 1, 1, 256]`. - `JSON.stringify` is lossless except for type info	- Not very "webby" - Sacrifices runtime semanticity for compile-time semanticity - Requires `as` type assertions for values that aren't hard-coded or directly emitted from the library

One possible idea: Accept both Uint8Array and standard arrays of Bytes as inputs (maximum ergonomics for both hard-coded and non-hard-coded use), while emitting arrays of Bytes in the output (maximum portability — easy JSON serialization etc). Proof of concept: https://github.com/lionel-rowe/deno_std/commit/11961938e305f2b32f6a22562bf9510ea94aacfa

Aug 08 '24 12:08 lionel-rowe

One possible idea: accept both Uint8Array and standard arrays of Bytes as inputs (maximum ergonomics for both hard-coded and non-hard-coded use), while emitting arrays of Bytes in the output (maximum portability — easy JSON serialization etc)

This sounds good to me. Also this is going to be unstable package for a while. We can iterate later.

Aug 08 '24 14:08 kt3k

One last thing. Can you please go through all exceptions and ensure they following the contributing guidelines? See https://github.com/denoland/std/blob/main/archive/tar_stream.ts for an example.

@iuioiua I think they already do, with the exception that error messages beginning with a variable name don't capitalize the variable name:

throw new RangeError("max must be greater than or equal to min");

How should those be handled? `max`/`min` with backticks would be another option.

Sep 04 '24 05:09 lionel-rowe

deno_std deno_std copied to clipboard

feat(random/unstable): basic randomization functions

Codecov Report

deno_std
deno_std copied to clipboard