Improve locale loading and global locale data

Open ST-DDT opened this issue 2 years ago • 7 comments

Clear and concise description of the problem

The current system of loading locale data has some drawbacks that need to be addressed:

  1. It heavily depends on the en locale, even though some of the data (emojis, iban, country codes) aren't related to that
  2. It is not possible to base region/country-specific locales on more than a single fallback (which is en by default)
  3. Using custom locale entries is currently not supported (via faker.definitions)

Suggested solution

  1. Introduce a global locale with all the locale data that aren't bound to a specific locale
  2. Replace locale and fallback locale with a locale[] that is searched in order #858
  3. Provide all locale data dynamically (via faker.definitions) #822

Alternative

  1. Remove these data entirely from the locales, making it impossible to change/filter them.
  2. Require the users to manually build their locales, but this is the exact opposite of our existing faker.definitions
  3. Let them access the locale data via the locale, which once again is the exact opposite of our existing faker.definitions

Additional context

  1. Could be introduced in v6.x, while keeping the data in en until v7 (when No. 2 is implemented).
  2. Is definitely a breaking change and can only be in v7+
  3. Can be added now, as it is a non-breaking improvement

ST-DDT avatar Apr 09 '22 14:04 ST-DDT

I already have an outdated branch with a potential solution/implementation for 2: https://github.com/faker-js/faker/compare/main...ST-DDT:feature/locales/multi-fallback

ST-DDT avatar Apr 09 '22 14:04 ST-DDT

We should definitely have a space where we could put all locale-independent definitions. Not sure we should call it global locale though as it's not connected with anything i18n. I would rather see two folders here:

  • definitions (or data)
  • i18n (or locales)

Being able to create a path of multilevel fallbacks sounds like a good idea too! I can see at least 3 levels: dialect (e.g. de-AT), language (de), default (always en). But we should let users construct it the way they want, including a dynamic set created on their project side:

faker.locale = ['de-AT', {
  // custom user locale as Partial<FakerLocale>
}];

We should also define default fallbacks so en-GB always falls back to en. This way it could have a small subset of data specific to British English. What I mean here by fallback is that when a user specifies en-GB as their language, it should actually be a deep merge of definitions from en-GB with en. This way GB can be a super small file with only the things specific to British English. Does this make sense?

pkuczynski avatar Apr 10 '22 15:04 pkuczynski

We should definitely have a space where we could put all locale-independent definitions.

I specifically used a locale for these data to allow for easy and consistent overwriting of the data, e.g. limiting the IBANs/country codes to only a specific set of countries.

The locale order will be like this by default: de-AT, de, en, global. Users may create their own faker instances using any combination of locales they want or don't want. They are explicitly able to omit global from their fallbacks. They are also able to add zero or more custom locales. These custom locales work exactly like normal locales.

The localeOrder is independent of the available locales in a faker instance. The locale order uses the locale names/keys, not the locale objects.
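The ordered lookup described here can be sketched roughly as follows. This is only an illustration, not faker's actual implementation: `LocaleDefinition`, `resolveEntry`, and the sample data are hypothetical names and values made up for this example.

```typescript
// Hypothetical shape: locale -> module -> entry. Not the real faker types.
type LocaleDefinition = Record<string, Record<string, unknown>>;

// Walk the locale order (names/keys, not objects) and return the first
// locale that defines the requested module entry.
function resolveEntry(
  locales: Record<string, LocaleDefinition>,
  localeOrder: string[],
  moduleName: string,
  entryName: string,
): unknown {
  for (const name of localeOrder) {
    const entry = locales[name]?.[moduleName]?.[entryName];
    if (entry !== undefined) {
      return entry;
    }
  }
  return undefined; // no locale in the order defines this entry
}

// Made-up sample data, for illustration only.
const locales: Record<string, LocaleDefinition> = {
  "de-AT": { address: { state: ["Wien", "Tirol"] } },
  de: { address: { state: ["Bayern"] }, name: { first_name: ["Hans"] } },
  en: { name: { first_name: ["John"], last_name: ["Smith"] } },
  global: { finance: { currency_code: ["EUR", "USD"] } },
};

const localeOrder = ["de-AT", "de", "en", "global"];

const states = resolveEntry(locales, localeOrder, "address", "state");
const lastNames = resolveEntry(locales, localeOrder, "name", "last_name");
const currencies = resolveEntry(locales, localeOrder, "finance", "currency_code");
```

Because the order is a list of keys, omitting global or adding a custom locale only means changing the `localeOrder` array and, for custom locales, registering one more key in `locales`.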

ST-DDT avatar Apr 10 '22 16:04 ST-DDT

What I mean here by fallback is that when a user specifies en-GB as their language, it should actually be a deep merge of definitions from en-GB with en. This way GB can be a super small file with only the things specific to British English. Does this make sense?

We would use the same fallback mechanism as now, just with more fallback levels. If it is not in en-GB then we will check en, then global. We virtually merge the locale modules, but not their entries.

In en-GB you can overwrite en.finance.credit_card only as a whole; you cannot statically overwrite just en.finance.credit_card.visa. You may clone and filter/adjust the subsections though, e.g. edit visa by overwriting credit_card with an adjusted copy. If you want to do more than that, we can discuss that in a separate discussion.
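A minimal sketch of the difference between entry-level fallback and a deep merge, assuming a simplified locale shape. The `lookup` helper, the types, and the pattern strings are hypothetical placeholders, not faker's real data or API.

```typescript
// Hypothetical shapes, for illustration only.
type Entry = Record<string, unknown>;
type LocaleDefinition = Record<string, Record<string, Entry>>;

// Placeholder values; these are not real credit card patterns.
const enCreditCard: Entry = {
  visa: ["visa-pattern-en"],
  mastercard: ["mastercard-pattern-en"],
};
const en: LocaleDefinition = { finance: { credit_card: enCreditCard } };

// Overriding only `visa` replaces the whole credit_card entry,
// so mastercard is lost: entries are not deep-merged.
const enGBPartial: LocaleDefinition = {
  finance: { credit_card: { visa: ["visa-pattern-gb"] } },
};

// Cloning the en entry and adjusting it keeps the other subsections.
const enGBCloned: LocaleDefinition = {
  finance: { credit_card: { ...enCreditCard, visa: ["visa-pattern-gb"] } },
};

// The first locale that defines the entry wins; entries are returned as-is.
function lookup(
  order: LocaleDefinition[],
  moduleName: string,
  entryName: string,
): Entry | undefined {
  for (const locale of order) {
    const entry = locale[moduleName]?.[entryName];
    if (entry !== undefined) {
      return entry;
    }
  }
  return undefined;
}

const partialResult = lookup([enGBPartial, en], "finance", "credit_card");
const clonedResult = lookup([enGBCloned, en], "finance", "credit_card");
```

Here `partialResult` has no `mastercard` key, while `clonedResult` keeps the en value for `mastercard` and uses the en-GB value for `visa`.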

ST-DDT avatar Apr 10 '22 16:04 ST-DDT

Just some suggestions for the locale-independent definitions name: nolocale, independent, or something like that.

import-brain avatar Apr 10 '22 18:04 import-brain

I specifically used a locale for these data to allow for easy and consistent overwriting of the data, e.g. limiting the IBANs/country codes to only a specific set of countries.

This is not a good example for IBAN, as for example I might be on a Polish locale, yet want to generate an Italian IBAN... IBAN imho is not connected with locale. Country of origin might be a param for it, but otherwise the format is the same for all countries, right?

I really don't think global, as you call it, should be part of locales. It has nothing to do with locales... A good example was the recent discussion around colors. IBAN is another good example. Credit card number or credit card organisation too...

I really see no reason why anyone would want to exclude global, as you call it, or why someone might want to override it. It makes no sense. We should follow what makes the most sense from an architecture point of view, not abstract ideas without practical application.

pkuczynski avatar Apr 22 '22 22:04 pkuczynski

You have a point.

What about mime file types? I assume you will hardly ever want most of them. If they are (global) locale data, you can easily filter and overwrite them.
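As a rough illustration of the filtering idea, a user could derive a reduced copy of the mime type data and place it in a custom locale ahead of global. The `MimeTypes` shape, the `filterMimeTypes` helper, and the sample entries below are assumptions made for this sketch, not faker's actual definitions.

```typescript
// Hypothetical shape for mime type data: mime type -> known extensions.
type MimeTypes = Record<string, { extensions: string[] }>;

// Made-up subset of what a global locale might contain.
const globalMimeTypes: MimeTypes = {
  "application/json": { extensions: ["json"] },
  "image/png": { extensions: ["png"] },
  "image/jpeg": { extensions: ["jpeg", "jpg"] },
  "video/mp4": { extensions: ["mp4"] },
};

// Return a new object containing only the mime types accepted by `keep`.
function filterMimeTypes(
  source: MimeTypes,
  keep: (mime: string) => boolean,
): MimeTypes {
  const result: MimeTypes = {};
  for (const mime of Object.keys(source)) {
    const value = source[mime];
    if (value !== undefined && keep(mime)) {
      result[mime] = value;
    }
  }
  return result;
}

// A custom override that only exposes image types.
const imageOnly = filterMimeTypes(
  globalMimeTypes,
  (mime) => mime.indexOf("image/") === 0,
);
```

If such data lived outside the locales, the same filtering would need some other mechanism, for example a parameter on the generating method.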

ST-DDT avatar Apr 22 '22 23:04 ST-DDT

Superseded by #1340.

ST-DDT avatar Sep 08 '22 16:09 ST-DDT