deno_std
deno_std copied to clipboard
feat(random/unstable): basic randomization functions
Closes https://github.com/denoland/std/issues/4848
Currently includes the following:
@std/random/between:randomBetweenfunction@std/random/integer-between:randomIntegerBetweenfunction@std/random/sample: enhanced version of (to-be-deprecated?)@std/collections/sample'ssamplefunction, with newweightsandrandomoptions@std/random/seeded-random:SeededRandomclass implementing the PGC32 algorithm@std/random/shuffle:shufflefunction implementing Fisher–Yates shuffle
Codecov Report
Attention: Patch coverage is 98.24561% with 3 lines in your changes missing coverage. Please review.
Project coverage is 96.24%. Comparing base (
7c0e917) to head (6bb99e4). Report is 2 commits behind head on main.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| random/sample.ts | 91.66% | 3 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #5626 +/- ##
========================================
Coverage 96.24% 96.24%
========================================
Files 485 491 +6
Lines 39269 39440 +171
Branches 5787 5811 +24
========================================
+ Hits 37793 37958 +165
- Misses 1432 1438 +6
Partials 44 44
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
@std/random/pick:pickandpickWeightedfunctions
Note that we already have sample() in @std/collections. So if we introduced a weighted sample() function, it should be added to @std/collections or even as an option to the current sample() function:
const numbers = [1, 2, 3, 4];
const random = sample(numbers, { weights: [1, 9999, 2, 8888] });
@std/random/seeded:SeededPrngclass implementing the Wichman-Hill algorithm
Is there a particular reason why this algorithm is added?
@std/random/shuffle:shufflefunction implementing Fisher–Yates shuffle
I think this would be a cool addition to @std/collections.
@timreichen
@std/random/seeded:SeededPrngclass implementing the Wichman-Hill algorithmIs there a particular reason why this algorithm is added?
Only that it was the algorithm suggested in the linked issue, it has a period high enough for any sensible use case, and it seems to give a good distribution of results. One other plus is that arbitrarily chosen positive integer seeds still seem to give good entropy, even if they're apparently "low entropy" numbers like 1 or 12 (i.e. they don't need to be primes or anything fancy). [Edit: Seeds that are adjacent to each other do seem to give very similar first-few results though.] I don't have the requisite expertise to verify its randomness properties beyond "seems to give good results" though.
Edit 2: Upon looking into it further I think something like PCG64 (numpy's default) would be a better choice. Edit 3: PCG32, as the "recommended for most users" version of PCG.
@std/random/pick:pickandpickWeightedfunctionsNote that we already have
sample()in@std/collections. So if we introduced a weightedsample()function, it should be added to@std/collectionsor even as an option to the currentsample()function:const numbers = [1, 2, 3, 4]; const random = sample(numbers, { weights: [1, 9999, 2, 8888] });
I quite like the idea of making it an option. It does require "unzipping" the values from their weights, but it can still be pretty versatile in terms of usage:
const weighted = new Map([["a", 5], ["b", 3], ["c", 2]]);
const result = sample([...weighted.keys()], { weights: [...weighted.values()] });
I'll keep the collections stuff within the scope of this PR for now as some of it relies on the random stuff (e.g. I'll use SeededPrng for tests), but happy to split it out before merging if that's preferable.
@lionel-rowe Can you move shuffle and this version of sample to std/random (leaving the existing collections/sample untouched)?
Also I'd prefer to see SeededPrng exported from @std/random/seeded-prng, randomBetween from @std/random/between, randomIntegerBetween from @std/random/integer-between for consistency of the rest of std.
@lionel-rowe Can you move
shuffleand this version ofsampletostd/random(leaving the existingcollections/sampleuntouched)?
@kt3k Wouldn't that be confusing to have two sample() functions? Or do you mean to deprecate collections/sample in another PR?
Also I'd prefer to see
SeededPrngexported from@std/random/seeded-prng,randomBetweenfrom@std/random/between,randomIntegerBetweenfrom@std/random/integer-betweenfor consistency of the rest of std.
Prng looks weird to me. Maybe it should it be written out as SeededPseudoRandomNumberGenerator?
@kt3k Wouldn't that be confusing to have two sample() functions? Or do you mean to deprecate collections/sample in another PR?
I think we can deprecate collections/sample when we stabilize random/sample.
Also I'd prefer to see SeededPrng exported from @std/random/seeded-prng, randomBetween from @std/random/between, randomIntegerBetween from @std/random/integer-between for consistency of the rest of std.
Prng looks weird to me. Maybe it should it be written out as SeededPseudoRandomNumberGenerator?
Or how about the name like SeededRandom? There is a similarly named npm package https://www.npmjs.com/package/seedrandom
How best to represent SeededRandom's seed and state values is giving me a bit of a headache.
| Type | Pros | Cons |
|---|---|---|
BigInt |
- A single scalar value can hold a 64-bit uint - Visually compact |
- Can also hold negative values and values bigger than uint64, and validity can only be checked at runtime - JSON.stringify throws |
BigUint64Array |
- Probably the most semantically "correct" | - The fact that the seed represents a 64-bit uint is an implementation detail that's not relevant for typical consumers of the library - JSON.stringify throws- Subject to endianness issues e.g. if created directly from underlying binary data. That seems like a deal-breaker to me. |
Uint8Array |
- Still semantically OK - bytes are bytes are bytes - Easy to extract uint64 values without worrying about endiannness using DataView#getBigUint64 |
- Length can only be checked at runtime - JSON.stringify gives an object with numeric keys but no length property, i.e. it can't be ergonomically converted back to a Uint8Array- The bytes really represent 64-bit uints, not 8-bit uints (but again, that's an implementation detail) |
| Hex string | - Ergonomic to write - Visually compact |
- Strings give very weak compile-time type checking, and TS's template literal types can't represent more than a couple of hex digits before running out of space |
Standard array of Bytes (TS union of number literals 0..255) |
- Can be easily converted to and from Uint8Arrays- Hard-coded values can be fully type-checked at compile time! You can check both length and the values of elements, e.g. allowing [1, 1, 1, 1, 1, 1, 1, 1] but disallowing [1, 1, 1] or [1, 1, 1, 1, 1, 1, 1, 256].- JSON.stringify is lossless except for type info |
- Not very "webby" - Sacrifices runtime semanticity for compile-time semanticity - Requires as type assertions for values that aren't hard-coded or directly emitted from the library |
One possible idea: Accept both Uint8Array and standard arrays of Bytes as inputs (maximum ergonomics for both hard-coded and non-hard-coded use), while emitting arrays of Bytes in the output (maximum portability — easy JSON serialization etc). Proof of concept: https://github.com/lionel-rowe/deno_std/commit/11961938e305f2b32f6a22562bf9510ea94aacfa
One possible idea: accept both Uint8Array and standard arrays of Bytes as inputs (maximum ergonomics for both hard-coded and non-hard-coded use), while emitting arrays of Bytes in the output (maximum portability — easy JSON serialization etc)
This sounds good to me. Also this is going to be unstable package for a while. We can iterate later.
One last thing. Can you please go through all exceptions and ensure they following the contributing guidelines? See https://github.com/denoland/std/blob/main/archive/tar_stream.ts for an example.
@iuioiua I think they already do, with the exception that error messages beginning with a variable name don't capitalize the variable name:
throw new RangeError("max must be greater than or equal to min");
How should those be handled? `max`/`min` with backticks would be another option.