Proposal: Use a single seed value per faker function invocation
Clear and concise description of the problem
Glossary:
- seed value: a value obtained by directly or indirectly invoking randomizer.next() or an equivalent thereof
Currently, invoking a faker function may consume an unknown number of seed values (0 to n). That isn't bad by itself, but it has the side effect that any change to the implementation affects the user by changing all subsequently generated values. This is especially relevant for functions that generate multiple values, such as multiple and unique: each generated element affects the elements generated after it within the same call, as well as all values generated afterwards.
Suggested solution
Let each method consume only a single seed value. If a method needs more than one value, it derives a new instance from that single seed value and draws everything else from the derived instance. Each method would be responsible for upholding this on its own.
function multiple<T>(fakerCore: Faker(Core), generator: (fakerCore, index) => T, options): T[] {
  // consumes a single seed value from the original instance
  const derived = fakerCore.derive();
  // consumes a seed value from the derived instance
  const count = rangeToNumber(derived, options.count);
  // each generator call consumes another seed value from the derived instance;
  // if the generator needs more than a single value, it derives by itself;
  // even if it doesn't, multiple upholds its contract and still behaves better than a simple for loop
  return Array.from({ length: count }, (_, i) => generator(derived, i));
}
Important usage detail: the fakerCore instance must be passed on to, and used by, any nested code.
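The effect of the contract can be illustrated with a small self-contained sketch. Everything in it is an assumption made for illustration: SketchCore is a hypothetical stand-in for the faker core (a mulberry32-style generator), and the count is fixed instead of going through rangeToNumber. It only shows that the parent stream stays stable when the generator's implementation starts consuming more values.

```typescript
// Hypothetical minimal core, NOT faker's implementation: a mulberry32-style generator.
class SketchCore {
  private state: number;

  constructor(seed: number) {
    this.state = seed >>> 0;
  }

  // Returns a float in [0, 1) and advances the state by one step ("one seed value").
  next(): number {
    this.state = (this.state + 0x6d2b79f5) >>> 0;
    let t = this.state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  }

  // Consumes exactly one value from this instance and seeds a child with it.
  derive(): SketchCore {
    return new SketchCore(Math.floor(this.next() * 4294967296));
  }
}

// Simplified multiple: fixed count, derives once, passes the derived core on.
function multiple<T>(
  core: SketchCore,
  generator: (core: SketchCore, index: number) => T,
  count: number,
): T[] {
  const derived = core.derive();
  return Array.from({ length: count }, (_, i) => generator(derived, i));
}

// Two "implementations" of the same generator that consume a different number of values.
const cheap = (core: SketchCore): number => core.next();
const greedy = (core: SketchCore): number => core.next() + core.next() + core.next();

const parentA = new SketchCore(42);
multiple(parentA, cheap, 3);
const afterCheap = parentA.next();

const parentB = new SketchCore(42);
multiple(parentB, greedy, 3);
const afterGreedy = parentB.next();

// Both parents consumed exactly one value, so their next value is identical.
console.log(afterCheap === afterGreedy); // true
```

With a plain for loop on the parent instance, the greedy variant would have advanced the parent by nine values instead of three, and every value generated afterwards would change.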
Deriving an instance does come at a performance cost, but we can keep it cheap if we treat that as a priority when implementing derive and use standalone functions, as teased in the code example above.
For example, instead of re-initializing the twister from scratch, we could only copy and transform the state:
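derive() {
  // seed value mixed into the first state word; afterwards holds the previous untransformed word
  let previous = this.state.next();
  const stateCopy = this.state.map((old) => {
    const transformed = old + previous + aRandomStaticValue;
    previous = old;
    return transformed;
  });
  return new Twister(stateCopy);
}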
(We should measure, though, how much of a performance difference there actually is between "re-initializing" and "copying and transforming the state".)
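To make the two strategies concrete, here is a runnable sketch that implements both on a deliberately tiny generator. All names are hypothetical: TinyGenerator is a 4-word xorshift128 generator standing in for the twister, MIX_CONSTANT plays the role of aRandomStaticValue, and the transform is only the one hinted at above, not a vetted mixing function. A real implementation would also have to guard against a degenerate (all-zero) state.

```typescript
// Arbitrary static value, plays the role of aRandomStaticValue (assumption).
const MIX_CONSTANT = 0x9e3779b9;

// Hypothetical stand-in for the twister: a 4-word xorshift128 generator.
class TinyGenerator {
  private readonly state: Uint32Array;

  constructor(state: Uint32Array) {
    this.state = state;
  }

  // "From scratch" initialization: expands one seed into the full state.
  static fromSeed(seed: number): TinyGenerator {
    const state = new Uint32Array(4);
    let previous = seed >>> 0;
    for (let i = 0; i < state.length; i++) {
      previous = (Math.imul(1812433253, previous ^ (previous >>> 30)) + i + 1) >>> 0;
      state[i] = previous;
    }
    return new TinyGenerator(state);
  }

  // One xorshift128 step; returns an unsigned 32-bit integer.
  next(): number {
    const s = this.state;
    const t = s[0] ^ (s[0] << 11);
    s[0] = s[1];
    s[1] = s[2];
    s[2] = s[3];
    s[3] = s[3] ^ (s[3] >>> 19) ^ t ^ (t >>> 8);
    return s[3];
  }

  // Strategy 1: consume one value and re-initialize a new generator from it.
  deriveByReinit(): TinyGenerator {
    return TinyGenerator.fromSeed(this.next());
  }

  // Strategy 2: consume one value, then copy and transform the current state.
  deriveByCopy(): TinyGenerator {
    let previous = this.next();
    const copy = this.state.map((old) => {
      const mixed = old + previous + MIX_CONSTANT; // wraps to 32 bits when stored
      previous = old;
      return mixed;
    });
    return new TinyGenerator(copy);
  }
}

const first = TinyGenerator.fromSeed(7);
const second = TinyGenerator.fromSeed(7);
// Equal parents yield equal children, so derived streams stay reproducible.
console.log(first.deriveByCopy().next() === second.deriveByCopy().next()); // true
```

Both strategies advance the parent by exactly one value, so they are interchangeable from the caller's point of view, and the choice between them can be made purely on measured performance.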
It also comes at the cost of additional code, though we could hide that in our potential meta framework.
The following section is largely unrelated to this proposal and only demonstrates how derive could be included in the potential meta framework.
function multiple<T>(fakerCore: Faker(Core), generator: (fakerCore, index) => T, options): T[] {
  const count = rangeToNumber(fakerCore, options.count);
  return Array.from({ length: count }, (_, i) => generator(fakerCore, i));
}
--- Autogenerated same file
declare function boundMultiple(generator: (fakerCore, index) => T, options): T[];
[...]
export const multiple = fakerize<typeof multiple, typeof boundMultiple>(multiple, {derive: true, isCallable: ...});
[...]
function fakerize<TRaw, TBound>(fn: TRaw, options): Fakerized<TRaw, TBound> {
  if (options.derive) {
    // keep a reference to the raw function; calling the reassigned fn from inside itself would recurse forever
    const raw = fn;
    fn = (fakerCore, ...args) => raw(fakerCore.derive(), ...args);
  }
  fn.bindTo = (fakerCore) => ((...args) => fn(fakerCore, ...args)) as TBound;
  fn.isCallable = ...;
  return fn;
}
Usage
multiple(fakerEN, (fakerCore) => firstName(fakerCore), 5);
// or multiple(fakerEN, firstName, 5);
const multipleEN = multiple.bindTo(fakerEN);
multipleEN((fakerCore) => firstName(fakerCore), 5);
// or multipleEN(firstName, 5);
// Why not like this?
const firstNameEN = firstName.bindTo(fakerEN);
multipleEN(() => firstNameEN(), 5);
// or
multiple(fakerCore, () => firstNameEN(), 5);
// Because firstNameEN would consume the seeds directly from the bound fakerEN instance
// and thus bypass most benefits from `derive`.
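The difference between the two usage styles can be made visible by counting how many values the parent instance hands out. The following is only a sketch under assumptions: Core, makeCore (a small LCG with a consumed counter), this simplified fakerize and the placeholder firstName are all hypothetical, and multiple takes a fixed count.

```typescript
interface Core {
  consumed: number; // how many values this instance has handed out
  next(): number;
  derive(): Core;
}

// Hypothetical stand-in for a faker core: a small LCG that counts its usage.
function makeCore(seed: number): Core {
  let state = seed >>> 0;
  const core: Core = {
    consumed: 0,
    next() {
      core.consumed++;
      state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
      return state / 4294967296;
    },
    derive() {
      return makeCore(Math.floor(core.next() * 4294967296));
    },
  };
  return core;
}

type RawFn<A extends unknown[], R> = (core: Core, ...args: A) => R;
type Fakerized<A extends unknown[], R> = RawFn<A, R> & {
  bindTo(core: Core): (...args: A) => R;
};

// Simplified fakerize: optionally derives before calling the raw function.
function fakerize<A extends unknown[], R>(
  fn: RawFn<A, R>,
  options: { derive: boolean },
): Fakerized<A, R> {
  const wrapped: RawFn<A, R> = options.derive
    ? (core, ...args) => fn(core.derive(), ...args)
    : fn;
  return Object.assign((core: Core, ...args: A): R => wrapped(core, ...args), {
    bindTo: (core: Core) => (...args: A): R => wrapped(core, ...args),
  });
}

// Placeholder generator that needs a single value, so it does not derive.
const firstName = fakerize(
  (core: Core): string => `name-${Math.floor(core.next() * 1000)}`,
  { derive: false },
);

const multiple = fakerize(
  (core: Core, generator: (core: Core, index: number) => string, count: number): string[] =>
    Array.from({ length: count }, (_, i) => generator(core, i)),
  { derive: true },
);

// Passing the core on: the parent hands out a single value.
const passedOn = makeCore(1);
multiple(passedOn, (core) => firstName(core), 5);
console.log(passedOn.consumed); // 1

// Using a bound function inside: every element is drawn from the parent.
const bypassed = makeCore(1);
const firstNameBound = firstName.bindTo(bypassed);
multiple(bypassed, () => firstNameBound(), 5);
console.log(bypassed.consumed); // 6
```

In the second case the parent advances by one value for the derive plus one per generated element, which is exactly the coupling the proposal wants to remove.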
Alternative
Don't change the current behavior.
Additional context
Relevant issues/PRs:
- #1499 (outdated demonstration of the feature)
- #1250 (required for performance reasons)
- #2667
Potentially impacted issues/PRs:
- #2661 (due to the one seed per invocation feature)
We have already talked about this feature quite a bit, but I would like to have this issue to bring everybody to the same level and give everybody the chance to comment and react.
How would that work though with fake patterns since different fake patterns can have different numbers of placeholders?
Fake would assume that the pattern requires multiple seed values and always derive at the start of the fake method.
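A minimal sketch of that answer, assuming a hypothetical fake with a {{name}} placeholder syntax, a hypothetical providers table and a small LCG as the core (none of this is the actual faker implementation): because fake derives once up front, the number of placeholders has no influence on how far the parent instance advances.

```typescript
interface Core {
  next(): number;
  derive(): Core;
}

// Hypothetical stand-in for a faker core: a small LCG.
function makeCore(seed: number): Core {
  let state = seed >>> 0;
  const core: Core = {
    next() {
      state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
      return state / 4294967296;
    },
    derive() {
      return makeCore(Math.floor(core.next() * 4294967296));
    },
  };
  return core;
}

type Provider = (core: Core) => string;

// Hypothetical placeholder providers.
const providers: Record<string, Provider> = {
  digit: (core) => String(Math.floor(core.next() * 10)),
  letter: (core) => String.fromCharCode(97 + Math.floor(core.next() * 26)),
};

// Always derives at the start, regardless of how many placeholders follow.
function fake(core: Core, pattern: string): string {
  const derived = core.derive();
  return pattern.replace(/\{\{(\w+)\}\}/g, (match: string, key: string) => {
    const provider = providers[key];
    return provider ? provider(derived) : match; // unknown placeholders stay as-is
  });
}

const shortParent = makeCore(7);
fake(shortParent, "{{letter}}");

const longParent = makeCore(7);
fake(longParent, "{{letter}}{{digit}}{{digit}}-{{letter}}");

// Both parents consumed exactly one value, so they continue identically.
console.log(shortParent.next() === longParent.next()); // true
```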
Team Decision
- We want to do this conceptually, but aren't sure about the exact implementation requirements.
Blocked by #2667
- #2667