
Use existing value to "seed" another.

Open MelGrubb opened this issue 2 years ago • 3 comments

Please describe why you are requesting a feature

Apologies if this exists already, but I'm looking for a way to use Bogus to anonymize data. I would like to anonymize production data for use in a QA environment, but it would be nice if the data came out the same way each time. In other words, I would like to use "Bob Smith" as input and have it come out as "Fred Jones" each time. This is only an example, I don't literally mean those specific names, but it would be helpful for QA if the anonymized data were stable so that when we refresh the data, the example person they were looking at last week still has the same name.

tl;dr - I would like a way to pass a "seed" value to individual rules to ensure that the same random value is generated each time, based on an input value so that, for example, using "Bob" as the seed value always results in "Fred" being generated.

Please provide a code example of what you are trying to achieve

Something like this:

var testUsers = new Faker<User>()
    // Suggested syntax, not working
    .RuleFor(u => u.FirstName, (f, u) => f.Name.FirstName(u.Gender, seed: "Some string value"));

Ideally, "Some string value" would be automatically derived from an input value based on the real-world data, such as the existing record's FirstName property.

Please answer any or all of the questions below

  • Is the feature something that currently cannot be done? Not that I have found in the examples, but I could be simply missing something.

  • What alternatives have you considered? Seeding the generator with the .UseSeed method on each loop through the anonymizer, based on hashing the record's Id. As pointed out in https://github.com/bchavez/Bogus/issues/104 though, any changes made to the structure such as the addition of new fields would throw everything after that off.

  • Is this feature request related to any issues or current problems?

  • Has the feature been requested in the past? Not that I could find in a cursory search of other requests. The closest I've found is https://github.com/bchavez/Bogus/issues/104

If the feature request is approved, would you be willing to submit a PR?

No. I wish I had the time, but I don't. Maybe if I get to retire from the day job someday.

MelGrubb, Jan 12 '23 14:01

Other use-cases could be:

  • Pick a random "StartDate" in the past, and an optional "EndDate" which must lie after the chosen StartDate
  • Pick a random "IsCancelled" value, and only populate the "CancellationReason" if the chosen 'IsCancelled' value was true

Generally speaking, how would we teach Bogus about (basic) dependencies between properties in our data models?

I could imagine a syntax like this:

.RuleFor(x => x.StartDate, f => f.Date.Past())
.RuleFor(x => x.EndDate, f => f.Date.Future(relativeToProperty: x => x.StartDate).OrNull(f))  // Suggested syntax, not working

or:

.RuleFor(x => x.IsCancelled, f => f.Random.Bool())
.RuleFor(x => x.CancellationReason, f => f.Random.Words().OrNullWhen(x => !x.IsCancelled, f))  // Suggested syntax, not working

Crossbow78, Jan 13 '23 15:01
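
For reference, dependencies like the two above can likely be expressed today with the RuleFor overload whose setter receives both the Faker and the instance being built. The sketch below is only an illustration: the Booking class and BookingFakers helper are hypothetical names, and it assumes rules run in the order they are defined, so StartDate and IsCancelled are already populated when the later rules read them.

```csharp
using System;
using Bogus;

// Hypothetical model used only for this illustration.
public class Booking
{
    public DateTime StartDate { get; set; }
    public DateTime? EndDate { get; set; }
    public bool IsCancelled { get; set; }
    public string CancellationReason { get; set; }
}

public static class BookingFakers
{
    public static Faker<Booking> Create()
    {
        return new Faker<Booking>()
            .RuleFor(x => x.StartDate, f => f.Date.Past())
            // Optional EndDate: when present, generated relative to the StartDate chosen above.
            .RuleFor(x => x.EndDate, (f, x) =>
                f.Random.Bool() ? f.Date.Future(1, x.StartDate) : (DateTime?)null)
            .RuleFor(x => x.IsCancelled, f => f.Random.Bool())
            // Only populate the reason when the booking was cancelled.
            .RuleFor(x => x.CancellationReason, (f, x) =>
                x.IsCancelled ? f.Random.Words() : null);
    }
}
```

Calling BookingFakers.Create().Generate(10) would then produce ten bookings whose EndDate, when set, is derived from StartDate, and whose CancellationReason is null unless IsCancelled is true.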

I am already using Bogus for something similar to what you are doing.

I added the following code and used the ID of each record from the database as the seed.

var faker = new Faker
{
    Random = new Randomizer(seed)
};

Pigna, Jun 20 '23 12:06

That was addressed in my second bullet point. Seeding the randomizer based on the Id, or a hash of the Id, works great until you add new properties to the object. To ensure that each output property is stable, you'd have to re-seed the randomizer for each and every individual field, which would be very cumbersome. I'm specifically looking for per-field seeding based on an input value so that the output is random, but stable for each input value.

MelGrubb, Jun 20 '23 12:06
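
One way the per-field re-seeding described in this thread could be made less cumbersome is a small helper that derives a seed from the field name plus the original value, then builds a throwaway Faker for that single field. This is a rough sketch and not a Bogus feature: StableFake, SeedFor, FakerFor and AnonymizeFirstName are hypothetical names. It hashes with SHA-256 because string.GetHashCode() is not guaranteed to be stable across processes on modern .NET, and it reuses the Faker/Randomizer seeding pattern shown earlier in the thread.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;
using Bogus;

// Hypothetical helper, not part of Bogus.
public static class StableFake
{
    // Derive a process-independent seed from the field name and the real value.
    public static int SeedFor(string fieldName, string input)
    {
        using (var sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(fieldName + ":" + input));
            return BitConverter.ToInt32(hash, 0);
        }
    }

    // A fresh Faker per field, so adding or reordering other fields
    // cannot shift the random sequence consumed by this one.
    public static Faker FakerFor(string fieldName, string input)
    {
        return new Faker { Random = new Randomizer(SeedFor(fieldName, input)) };
    }
}

public static class Example
{
    public static string AnonymizeFirstName(string realFirstName)
    {
        return StableFake.FakerFor("FirstName", realFirstName).Name.FirstName();
    }
}
```

With this, Example.AnonymizeFirstName("Bob") should return the same replacement name on every run, independent of any other properties on the record. The output presumably stays stable only as long as the Bogus version and locale data are unchanged, since those determine which name a given random value selects.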