First 8 digits of result of `ssn()` in 'nl_NL' provider are (needlessly) unique
Issue
The `ssn()` code generates the first 8 digits by calling `random.sample(range(10), k=8)`, which draws without replacement, so only the 9th digit can duplicate one of the first 8 digits.
As a result, the first 8 digits are always all different:
>>> from faker import Faker
>>> fake = Faker('nl_NL')
>>> assert all(len(set(fake.ssn()[:-1])) == 8 for _ in range(100_000))
Fix
The code should call `random.choices(range(10), k=8)` instead, which draws with replacement and therefore allows repeated digits.
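The difference between the two calls can be seen with the standard library alone. This is a minimal sketch, not the provider code: the seeded `rng` and the variable names are illustrative assumptions.

```python
import random

rng = random.Random(0)  # seeded only so the sketch is reproducible

# sample() draws without replacement: the 8 digits are always distinct.
sampled = [rng.sample(range(10), k=8) for _ in range(1_000)]
all_distinct = all(len(set(digits)) == 8 for digits in sampled)

# choices() draws with replacement: repeated digits are allowed.
chosen = [rng.choices(range(10), k=8) for _ in range(1_000)]
some_repeat = any(len(set(digits)) < 8 for digits in chosen)

print(all_distinct, some_repeat)
```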
Requirement
Uniqueness of the first 8 digits is not a requirement for Dutch BSNs (see the example on Wikipedia). It shrinks the range of possible values from more than 80 million BSNs to only about 1.6 million, which is less than the population of the Netherlands. That is how I found the issue: I was trying to generate a test file of 2 million unique BSNs.
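The size of the reduction can be checked with a rough count of the possible 8-digit prefixes. This sketch counts prefixes only; the figures quoted above are somewhat lower, presumably because the check digit rules out part of the prefixes, which is an assumption not modelled here.

```python
import math

# Ordered picks of 8 different digits out of 10 (what sample() can produce).
distinct_prefixes = math.perm(10, 8)

# Any 8 digits, repeats allowed (what choices() can produce).
all_prefixes = 10 ** 8

print(distinct_prefixes)  # 1814400
print(all_prefixes)       # 100000000
```

Either way, fewer than 2 million distinct values are reachable with distinct digits, so a request for 2 million unique BSNs cannot be satisfied.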
Secondary issue
Although incorrect, the current version avoids `ssn()` results with 2 (or more) leading zeroes, because the first 8 digits are distinct and so can contain at most one 0.
That's a happy accident, as BSNs must be 8 (+ leading zero) or 9 digits long.
Therefore, when fixing `ssn()`, the code should also filter out results with a leading '00'. That can be accomplished
- either by brute force (i.e. filtering out generated `[0, 0, ...]` `digits` lists)
- or by choosing digit 2 from `range(1 if digit1 == 0 else 0, 10)` (instead of `range(10)`).
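Both options can be sketched as follows. The helper names `prefix_by_rejection` and `prefix_by_construction` are hypothetical and not part of Faker, and the sketch covers only the first 8 digits, not the check digit.

```python
import random


def prefix_by_rejection(rng):
    """Option 1 (hypothetical helper): redraw while the prefix starts with 00."""
    while True:
        digits = rng.choices(range(10), k=8)
        if digits[:2] != [0, 0]:
            return digits


def prefix_by_construction(rng):
    """Option 2 (hypothetical helper): restrict digit 2 when digit 1 is 0."""
    digit1 = rng.choice(range(10))
    digit2 = rng.choice(range(1 if digit1 == 0 else 0, 10))
    return [digit1, digit2] + rng.choices(range(10), k=6)


rng = random.Random(0)  # seeded only so the sketch is reproducible
prefixes = [prefix_by_rejection(rng) for _ in range(10_000)]
prefixes += [prefix_by_construction(rng) for _ in range(10_000)]
```

The two are not quite equivalent: rejection keeps every allowed prefix equally likely, while construction gives prefixes starting with a single 0 a slightly higher probability (1 in 10 instead of 1 in 11). For test data that difference probably does not matter.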
This issue is stale because it has been open for 30 days with no activity.
This issue was closed because it has been inactive for 14 days since being marked as stale.