datafaker Feature idea: define probabilities for values in yml file

As an idea, what if we declare the desired distributions in percentages instead of duplicating the values?

Like, instead of duplicating "more desired" name patterns in yml:

- name:
  - "#{Name.last_name} #{suffix}"
  - "#{Name.last_name} #{suffix}"
  - "#{Name.man_last_name} a #{Name.man_last_name} #{suffix}"

we would write:

- name:
  - 60%: "#{Name.last_name} #{suffix}"
  - 40%: "#{Name.man_last_name} a #{Name.man_last_name} #{suffix}"

or even with lists:

- domain:
  - 90%: ".it"
  - 10%: [".ee", ".lv", ".lt", ".ca", ...]

Aug 20 '25 22:08 asolntsev

I like that idea.

We just need to make sure the math and tests allow for some rounding error because, as we've seen in that other recently opened issue, doing things in a weighted manner isn't always simple or exact.

Edit: Issue ref - https://github.com/datafaker-net/datafaker/issues/1623
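As a rough illustration of the rounding concern, a test could compare observed counts against the expected share with a statistical tolerance rather than an exact match. The sketch below is an assumption about how such a check might look (WeightToleranceCheck is a made-up name, not existing test code). It uses the binomial standard deviation sqrt(n * p * (1 - p)) to decide how far off a count may be.

```java
import java.util.Random;

/** Hypothetical test helper (not existing Datafaker code). */
final class WeightToleranceCheck {

    /** True if the observed count is within `sigmas` standard deviations of the binomial mean n * p. */
    static boolean withinTolerance(long observed, long trials, double p, double sigmas) {
        double mean = trials * p;
        double stdDev = Math.sqrt(trials * p * (1 - p));
        return Math.abs(observed - mean) <= sigmas * stdDev;
    }

    public static void main(String[] args) {
        // Simulate a value that should show up 60% of the time.
        Random random = new Random(7);
        long trials = 100_000;
        long hits = 0;
        for (long i = 0; i < trials; i++) {
            if (random.nextDouble() < 0.6) {
                hits++;
            }
        }
        System.out.println("hits=" + hits + " within 5 sigma: " + withinTolerance(hits, trials, 0.6, 5));
    }
}
```

A wide band such as 5 sigma keeps the test from being flaky while still catching weights that are clearly wrong.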

Aug 21 '25 00:08 kingthorin

I do agree that the current approach isn't ideal. It does sound like a bit of a big change for something which maybe shouldn't be dictated by us. Or should it? I'm not sure. There are a few other areas which are perhaps more important, and might be slightly related, such as an age distribution. For example:

I think it's unreasonable for our DOB generation to give someone the same chance of being 40 as of being 100. Maybe that can be solved with the above solution, but maybe nextGaussian would be good enough?
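For reference, a minimal sketch of what the nextGaussian route might look like. GaussianAge, the mean of 40, the standard deviation of 15 and the 0 to 100 bounds are all assumptions picked for illustration, not how Datafaker generates dates of birth today. Out-of-range samples are simply redrawn.

```java
import java.util.Random;

/** Hypothetical sketch (not existing Datafaker code): ages drawn from a bounded normal distribution. */
final class GaussianAge {

    /** Draws an age around `mean`, redrawing any sample that falls outside [min, max]. */
    static int nextAge(Random random, double mean, double stdDev, int min, int max) {
        if (stdDev <= 0 || min > max || mean < min || mean > max) {
            throw new IllegalArgumentException("need stdDev > 0 and min <= mean <= max");
        }
        while (true) {
            long age = Math.round(mean + stdDev * random.nextGaussian());
            if (age >= min && age <= max) {
                return (int) age;
            }
        }
    }

    public static void main(String[] args) {
        Random random = new Random(2025);
        int around40 = 0;
        int around100 = 0;
        for (int i = 0; i < 100_000; i++) {
            int age = nextAge(random, 40, 15, 0, 100);
            if (age >= 38 && age <= 42) around40++;
            if (age >= 96) around100++;
        }
        System.out.println("ages 38-42: " + around40 + ", ages 96-100: " + around100);
    }
}
```

A real age distribution isn't symmetric, so this would only be a rough approximation. The weighted approach from the yml proposal could model age brackets more faithfully.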

Aug 24 '25 10:08 bodiam