First 8 digits of result of `ssn()` in 'nl_NL' provider are (needlessly) unique

Dutcho opened this issue 4 months ago.

Issue

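The ssn() code generates the first 8 digits by calling random.sample(range(10), k=8), which samples without replacement. As a result, the first 8 digits are always all different, and only the 9th digit can repeat one of them. For example, the following assertion always holds with the current code: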

```python
>>> from faker import Faker; fake = Faker('nl_NL')
>>> assert all(len(set(fake.ssn()[:-1])) == 8 for _ in range(100_000))
```

Fix

The code should call random.choices(range(10), k=8) instead, which samples with replacement, so digits may repeat.

Requirement

Uniqueness of the first 8 digits is not a requirement for Dutch BSNs (see the example on Wikipedia). Requiring it reduces the number of possible BSNs from more than 80 million to only about 1.6 million, which is less than the population of the Netherlands. That's how I found the issue: I was trying to generate a test file of 2 million unique BSNs.
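Both figures can be checked without generating anything, by counting the 8-digit prefixes that admit a valid 9th digit. This sketch assumes the generator picks the 9th digit so that 9·d1 + 8·d2 + ... + 2·d8 - d9 is divisible by 11 and rejects prefixes whose weighted sum leaves remainder 10; `count_bsns` is a hypothetical helper. Instead of enumerating all 10^8 prefixes, it uses a small dynamic program over (position, digits used so far, remainder mod 11).

```python
from functools import lru_cache

# 11-proef weights for the first 8 BSN digits (the 9th digit has weight -1).
WEIGHTS = (9, 8, 7, 6, 5, 4, 3, 2)


def count_bsns(distinct: bool) -> int:
    """Count 9-digit strings passing the 11-proef whose first 8 digits are
    all different (distinct=True) or unrestricted (distinct=False)."""

    @lru_cache(maxsize=None)
    def count(pos: int, used: int, remainder: int) -> int:
        if pos == len(WEIGHTS):
            # The 9th digit must equal the remainder, so remainder 10 has no
            # single-digit completion; every other prefix yields exactly one BSN.
            return 0 if remainder == 10 else 1
        total = 0
        for d in range(10):
            if distinct and (used >> d) & 1:
                continue  # digit already taken: sampling without replacement
            new_used = (used | (1 << d)) if distinct else used
            total += count(pos + 1, new_used, (remainder + WEIGHTS[pos] * d) % 11)
        return total

    return count(0, 0, 0)


print("random.sample  (current): ", count_bsns(distinct=True))
print("random.choices (proposed):", count_bsns(distinct=False))
```

The first count comes out near 1.6 million and the second a little above 90 million. Neither count excludes numbers starting with '00', so a generator that also filters those would reach slightly fewer.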

Secondary issue

Although otherwise incorrect, the current version never produces ssn() results with two or more leading zeros, because its first 8 digits are always distinct. That is a happy accident, as BSNs must be 8 digits long (padded with a leading zero) or 9 digits long.

Therefore, when fixing ssn(), the code should also filter out results that start with '00'. That can be accomplished:

  • either by brute force (i.e. filtering out generated [0, 0, ...] digit lists)
  • or by choosing digit 2 from range(1 if digit1 == 0 else 0, 10) (instead of range(10)).

Dutcho, Aug 23 '25 22:08

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot], Nov 22 '25 02:11

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions[bot], Dec 06 '25 02:12