faker icon indicating copy to clipboard operation
faker copied to clipboard

Invalid zip/postal codes when using locale en_GB

Open mikeybinns opened this issue 2 years ago • 7 comments

Pre-Checks

Describe the bug

Very closely related to #1416 but for a different locale, so I though a new ticket would be appropriate.

Great Britain post codes also have some specific rules which means that some of the generated postcodes are invalid.

I have a REGEX string for validating if a GB post code is valid, so I'm sharing it here as hopefully it will help, even if only with automated testing. This regex is created based on this gov.uk document

/^([A-Z]{1,2}[0-9]{1,2}|[A-Z]{1,2}[0-9]{1}[A-Z]{1})([0-9]{1}[ABDEFGHJLNPQRSTUWXYZ]{2})$/gi,

I think the biggest tripping block for Faker here is that the final two letters cannot be C I K M O V.

For those who come across this issue, here's a workaround to get you going in modern TypeScript:

/**
 * Generate a valid UK postcode using faker.
 * 
 * Faker's implementation is too generic to work with strict 
 * post code validators, so we must check the values returned 
 * to see if they are valid
 * @see https://github.com/faker-js/faker/issues/2390
 * 
 * @returns A promise which resolves to a valid UK postcode
 */
async function generatePostCode() {
  return new Promise<string>((resolve) => {
    let postcode = "";
    while (
      postcode
        // Remove all characters except numbers and letters (including spaces as spaces are optional)
        .replaceAll(/[^A-Z0-9]/gi, "")
        // Check value against REGEX
        .match(
          /^([A-Z]{1,2}[0-9]{1,2}|[A-Z]{1,2}[0-9]{1}[A-Z]{1})([0-9]{1}[ABDEFGHJLNPQRSTUWXYZ]{2})$/gi,
        )
    ) {
      postcode = faker.location.zipCode();
    }
    resolve(postcode);
  });
}

Minimal reproduction code

import { fakerEN_GB as faker } from "@faker-js/faker"; console.log(faker.location.zipCode());

Additional Context

In case the link breaks, here is that gov.uk document I mentioned. Appendix_C_ILR_2017_to_2018_v1_Published_28April17.pdf

Environment Info

System:
  OS: macOS 13.5.1
  CPU: (8) arm64 Apple M1
  Memory: 480.53 MB / 16.00 GB
  Shell: 5.9 - /bin/zsh
Binaries:
  Node: 18.17.1 - ~/.nvm/versions/node/v18.17.1/bin/node
  npm: 9.6.7 - ~/.nvm/versions/node/v18.17.1/bin/npm
Browsers:
  Chrome: 116.0.5845.179
  Safari: 16.6
npmPackages:
  @faker-js/faker: ^8.0.1 => 8.0.2

Which module system do you use?

  • [ ] CJS
  • [X] ESM

Used Package Manager

npm

mikeybinns avatar Sep 11 '23 10:09 mikeybinns

I think the tricky thing is knowing when to stop!

Because in reality the first one-two chars can't be ANY A-Z, they can only be one of the https://en.wikipedia.org/wiki/List_of_postcode_areas_in_the_United_Kingdom valid postcode areas. So AB... and AL... are valid but not AA.. or AX...

So then its tempting to hardcode those options.

But then AL only goes up from 1...10, not 1...99. So maybe we should encode them, etc etc.

The only true way to validate if a postcode is valid is to compare it against the full https://en.wikipedia.org/wiki/Postcode_Address_File

In some cases i think its OK to return plausible but invalid data if the complexity costs of adding more validity checks are too high.

I don't really know on which side of this arbitrary line this suggestion falls though!

matthewmayer avatar Sep 11 '23 11:09 matthewmayer

I understand what you're saying, but I think my suggestion is only related to format.

The document I've linked to (from gov.uk, which in the UK means it's an official resource) documents the format of the post code which includes possible newly created post codes. It doesn't mention specific area codes (except for examples), only the format for what can and cannot be a post code.

I understand that post codes created by faker may not match up to a real post code that actually exists, but I think it should not provide a post code that can never match because the format is incorrect.

I fully agree that hardcoding current post code areas etc. is a step too far for what the library attempts to achieve, but I do believe that my request falls on the "worth doing" side of the line, but also I understand that this may add a maintenance burden, so I understand if you decide you don't agree :)

mikeybinns avatar Sep 11 '23 11:09 mikeybinns

i think you're right, however at the moment zipCode() uses faker.helpers.replaceSymbols under the hood which only supports a very basic set of substitutions: #, ? and *, so i dont think this is feasible by changing the patterns alone.

matthewmayer avatar Sep 11 '23 12:09 matthewmayer

IIRC we (I) already considered removing replaceSymbols* in favor a more standardized approach such as string replace, regex or fake patterns.

ST-DDT avatar Sep 11 '23 13:09 ST-DDT

Team Decision

  • We should replace this usage of replaceSymbols with a call to fake (and potentially then to fromRegexp)

ST-DDT avatar Apr 04 '24 15:04 ST-DDT

I'll tackle this soon.

Would you consider the switch from replaceSymbols to fake a breaking change?

ST-DDT avatar Oct 26 '24 10:10 ST-DDT

Here is a workaround, you can use until this is fixed:

const zipCode = faker.helpers.fake([
  '{{helpers.fromRegExp("[A-Z]{1,2}[0-9]{1,2} [0-9][ABDEFGHJLNPQRSTUWXYZ]{2}")}}',
  '{{helpers.fromRegExp("[A-Z]{1,2}[0-9][A-Z] [0-9][ABDEFGHJLNPQRSTUWXYZ]{2}")}}',
]);

ST-DDT avatar Oct 29 '24 21:10 ST-DDT