Region is validated, but language is not

Open thany opened this issue 9 months ago • 5 comments

Synopsis

When constructing a new Intl.Locale object, any language subtag is accepted, but not every region subtag is.

Example code

Take for example a nonexisting language, in an existing region:

let locale = new Intl.Locale('xxxxx-nl');
-> Intl.Locale { baseName: "xxxxx-NL", numeric: false, language: "xxxxx", region: "NL" }

So this works by assuming perhaps a custom language. The other way round, 'xxxxx' gets ignored:

let locale = new Intl.Locale('nl-xxxxx');
-> Intl.Locale { baseName: "nl-xxxxx", numeric: false, language: "nl" }

At least ignored in the sense that 'xxxxx' is not assumed to be the region. If we go explicit, it will fail:

let locale = new Intl.Locale('nl', { region: 'xxxxx' });
-> Uncaught RangeError: invalid value "xxxxx" for option region

Takeaways

  1. Lenient parsing for language - any language is allowed.
  2. Strict parsing for region - it is restricted to a supposed internal list of valid names.
  3. An unknown region in the locale string is ignored - probably assumed to be an arbitrary suffix, not a region.

Why is this an issue?

Languages do not evolve as quickly as (political) geographical regions do. This could mean that when a new region emerges, perhaps after settling a dispute, that new region will not be accepted by any browser. An update of some kind would be required in order to have a newly formed region be valid in a locale identifier.

This also means older browsers will assume recently emerged regions to be invalid, and people living there might be offended by it.

But also, since languages evolve much more slowly than regions, it seems backwards to me that the language in a locale identifier is not validated at all. Presumably this is so that a custom or esoteric language can be specified (like Vulcan or something), but then why isn't that also the case for regions?

I'm guessing this is done because otherwise the parser can't know what part of the locale sits after the first dash. So for example in new Intl.Locale('nl-Latn') there is still no region, but a script instead. But when explicitly passing properties, like in the third example, no parsing needs to be done for the region property, and a nonexisting one can safely be assumed to be custom (or extraterrestrial). And for me, this re-raises the question why a region must adhere to a predefined list of values, and the language property is free to be anything at all.
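To make the positional point concrete, here is a small sketch of how the subtag after the first dash ends up in different getters. It only uses the standard Intl.Locale constructor and its script, region and baseName getters; the variable names are just for illustration.

```javascript
// The subtag after the language is sorted by its shape, not by a lookup:
// four letters land in `script`, two letters land in `region`.
const withScript = new Intl.Locale('nl-Latn');
const withRegion = new Intl.Locale('nl-NL');

// Passing the parts explicitly produces the same kind of tag.
const fromOptions = new Intl.Locale('nl', { script: 'Latn', region: 'NL' });

const parsed = {
  withScript: { script: withScript.script, region: withScript.region },
  withRegion: { script: withRegion.script, region: withRegion.region },
  fromOptions: fromOptions.baseName,
};
console.log(parsed);
```

In 'nl-Latn' the region getter is undefined, and in 'nl-NL' the script getter is undefined.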

thany avatar Feb 12 '25 11:02 thany

This might be a duplicate of #951?

eemeli avatar Feb 12 '25 12:02 eemeli

> Strict parsing for region - it is restricted to a supposed internal list of valid names.

Not quite. It is restricted to being a BCP-47 region code, which is usually either {alpha}{alpha} or {num}{num}{num} (the exact grammar is in the spec).
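A quick way to see that the check is about shape rather than a list of known regions is to pass an unassigned two-letter code. The following is a minimal sketch; tryRegion is a hypothetical helper written for this example, not part of Intl.

```javascript
// Hypothetical helper: returns the region the engine stored,
// or null if the constructor rejected the option with a RangeError.
function tryRegion(region) {
  try {
    return new Intl.Locale('nl', { region }).region;
  } catch (e) {
    if (e instanceof RangeError) return null;
    throw e;
  }
}

const regionResults = {
  xx: tryRegion('xx'),       // two letters, not an assigned country code
  '419': tryRegion('419'),   // three digits
  xxxxx: tryRegion('xxxxx'), // five letters: wrong shape for a region
  x1: tryRegion('x1'),       // mixed letter and digit: wrong shape
};
console.log(regionResults);
```

The first two are expected to be accepted even though 'xx' names no real place, while the last two should come back as null.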

sffc avatar Feb 12 '25 13:02 sffc

The xxxxx in new Intl.Locale('nl-xxxxx') doesn't specify a region, but a variant subtag.
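As a rough illustration of why xxxxx is read as a variant, the sketch below classifies a subtag by shape alone. classifySubtag is a hypothetical helper, and it is a simplification: it assumes the subtag follows the language subtag, and it ignores ordering rules and extension or private-use sequences.

```javascript
// Hypothetical helper, not part of Intl: guesses the kind of a subtag
// that follows the language subtag, looking only at its shape.
function classifySubtag(subtag) {
  // Script: exactly four letters.
  if (/^[a-z]{4}$/i.test(subtag)) return 'script';
  // Region: two letters or three digits.
  if (/^([a-z]{2}|[0-9]{3})$/i.test(subtag)) return 'region';
  // Variant: 5-8 alphanumerics, or a digit followed by three alphanumerics.
  if (/^([a-z0-9]{5,8}|[0-9][a-z0-9]{3})$/i.test(subtag)) return 'variant';
  return 'other';
}

const kinds = ['Latn', 'NL', '419', 'xxxxx', '1996'].map(
  (subtag) => [subtag, classifySubtag(subtag)]
);
console.log(kinds);
```

Under these assumptions 'xxxxx' can only be a variant, because five letters fit neither the script nor the region shape.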

anba avatar Feb 12 '25 16:02 anba

> The xxxxx in new Intl.Locale('nl-xxxxx') doesn't specify a region, but a variant subtag.

I already alluded to something like that, to quote myself:

> probably assumed to be an arbitrary suffix, not a region.

The issue can be reduced to a single question really:

Why is language not validated, and region is?

> Not quite. It is restricted to being a BCP-47 region code, which is usually either {alpha}{alpha} or {num}{num}{num} (the exact grammar is in the spec).

Fair enough. It doesn't even matter to the discussion what exactly the region is validated against, be it a list or a pattern or otherwise. My issue is the discrepancy in validation (or the lack thereof) between language and region. My simple mind is like "whatever the reason for not validating languages is, is equally if not more applicable to regions".

Maybe there's a perfectly good reason why custom languages are allowed and custom regions aren't, but I haven't come across it.

thany avatar Feb 13 '25 12:02 thany

Language subtags are 2-3 or 5-8 alphabetic characters.

Code                                    Result
new Intl.Locale("x").language           RangeError
new Intl.Locale("xx").language          "xx"
new Intl.Locale("xxx").language         "xxx"
new Intl.Locale("xxxx").language        RangeError
new Intl.Locale("xxxxx").language       "xxxxx"
new Intl.Locale("xxxxxx").language      "xxxxxx"
new Intl.Locale("xxxxxxx").language     "xxxxxxx"
new Intl.Locale("xxxxxxxx").language    "xxxxxxxx"
new Intl.Locale("xxxxxxxxx").language   RangeError
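The length rule can be written as a pattern and compared with what the engine actually does. This is a minimal sketch: languageShape and acceptsAsLanguage are names made up for the example, and the pattern only covers the case where the whole tag is a single language subtag.

```javascript
// 2-3 or 5-8 ASCII letters, as stated above.
const languageShape = /^([a-z]{2,3}|[a-z]{5,8})$/i;

// Hypothetical helper: true if the constructor accepts the tag,
// false if it throws a RangeError.
function acceptsAsLanguage(tag) {
  try {
    new Intl.Locale(tag);
    return true;
  } catch (e) {
    if (e instanceof RangeError) return false;
    throw e;
  }
}

// Compare the pattern's prediction with the engine for 1 to 9 letters.
const rows = [];
for (let n = 1; n <= 9; n++) {
  const tag = 'x'.repeat(n);
  rows.push({
    tag,
    predicted: languageShape.test(tag),
    actual: acceptsAsLanguage(tag),
  });
}
console.table(rows);
```

So language is validated too, just by shape, in the same spirit as the region check.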

anba avatar Feb 13 '25 12:02 anba