ecma402
Region is validated, but language is not
Synopsis
When constructing a new Intl.Locale object, seemingly any language subtag is accepted, but not every region subtag is.
Example code
Take for example a nonexisting language, in an existing region:
let locale = new Intl.Locale('xxxxx-nl');
-> Intl.Locale { baseName: "xxxxx-NL", numeric: false, language: "xxxxx", region: "NL" }
So this works, presumably by assuming a custom language. The other way round, 'xxxxx' gets ignored:
let locale = new Intl.Locale('nl-xxxxx');
-> Intl.Locale { baseName: "nl-xxxxx", numeric: false, language: "nl" }
At least ignored in the sense that 'xxxxx' is not assumed to be the region. If we go explicit, it will fail:
let locale = new Intl.Locale('nl', { region: 'xxxxx' });
-> Uncaught RangeError: invalid value "xxxxx" for option region
Takeaways
- Lenient parsing for language - any language is allowed.
- Strict parsing for region - it is restricted to a supposed internal list of valid names.
- An unknown region in the locale string is ignored - probably assumed to be an arbitrary suffix, not a region.
Why is this an issue?
Languages do not evolve as quickly as (political) geographical regions do. This could mean that when a new region emerges, perhaps after settling a dispute, that new region will not be accepted by any browser. An update of some kind would be required in order to have a newly formed region be valid in a locale identifier.
This also means older browsers will assume recently emerged regions to be invalid, and people living there might be offended by it.
Also, since languages evolve much more slowly than regions, it seems backwards to me that the language in a locale identifier is not validated at all. Presumably this is so that a custom or esoteric language can be specified (like Vulcan or something), but then why isn't that also the case for regions?
I'm guessing this is done because otherwise the parser can't know what part of the locale sits after the first dash. So for example in new Intl.Locale('nl-Latn') there is still no region, but a script instead. But when explicitly passing properties, like in the third example, no parsing needs to be done for the region property, and a nonexisting one can safely be assumed to be custom (or extraterrestrial). And for me, this re-raises the question why a region must adhere to a predefined list of values, and the language property is free to be anything at all.
This might be a duplicate of #951?
Strict parsing for region - it is restricted to a supposed internal list of valid names.
Not quite. It is restricted to being a BCP-47 region code, which is usually either {alpha}{alpha} or {num}{num}{num} (the exact grammar is in the spec).
The xxxxx in new Intl.Locale('nl-xxxxx') doesn't specify a region, but a variant subtag.
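To make both points concrete, here is a minimal sketch. The helper name acceptsRegion is made up for illustration, and the sample codes 'XX' and '999' are assumed not to correspond to any real region; they only have the right shape.

```javascript
// Hypothetical helper (not part of Intl): reports whether Intl.Locale
// accepts a given value for the `region` option.
function acceptsRegion(region) {
  try {
    new Intl.Locale('nl', { region });
    return true;
  } catch (e) {
    if (e instanceof RangeError) return false;
    throw e; // anything else is unexpected
  }
}

console.log(acceptsRegion('XX'));    // true: two letters, the right shape
console.log(acceptsRegion('999'));   // true: three digits, the right shape
console.log(acceptsRegion('xxxxx')); // false: five letters is not a region shape

// In a tag, a five-letter subtag after the language is parsed as a variant,
// so no region is set at all.
const loc = new Intl.Locale('nl-xxxxx');
console.log(loc.language, loc.region, loc.baseName);
```

So a made-up region passes as long as it has the shape of a region code, which is the same kind of check the language subtag gets.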
The xxxxx in new Intl.Locale('nl-xxxxx') doesn't specify a region, but a variant subtag.
I already alluded to something like that, to quote myself:
probably assumed to be an arbitrary suffix, not a region.
The issue can be reduced to a single question really:
Why is language not validated, and region is?
Not quite. It is restricted to being a BCP-47 region code, which is usually either {alpha}{alpha} or {num}{num}{num} (the exact grammar is in the spec).
Fair enough. It doesn't even matter to the discussion what exactly the region is validated against, be it a list, a pattern, or something else. My issue is the discrepancy in validation (or the lack thereof) between language and region. My simple mind is like "whatever the reason for not validating languages is, it is equally if not more applicable to regions".
Maybe there's a perfectly good reason why custom languages are allowed and custom regions aren't, but I haven't come across it.
Language subtags are 2-3 or 5-8 alphabetic characters.
| Code | Result |
|---|---|
| new Intl.Locale("x").language | RangeError |
| new Intl.Locale("xx").language | "xx" |
| new Intl.Locale("xxx").language | "xxx" |
| new Intl.Locale("xxxx").language | RangeError |
| new Intl.Locale("xxxxx").language | "xxxxx" |
| new Intl.Locale("xxxxxx").language | "xxxxxx" |
| new Intl.Locale("xxxxxxx").language | "xxxxxxx" |
| new Intl.Locale("xxxxxxxx").language | "xxxxxxxx" |
| new Intl.Locale("xxxxxxxxx").language | RangeError |
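
In other words, language and region both get a shape check only. As a rough sketch, the two shapes mentioned in this thread can be written as regular expressions and compared against what Intl.Locale does. The names LANGUAGE_SHAPE, REGION_SHAPE and constructs are made up for illustration, and the patterns are approximations of what was described above, not the spec's grammar.

```javascript
// Approximate subtag shapes as described in this thread (assumption:
// the authoritative grammar lives in the spec, not in these patterns).
const LANGUAGE_SHAPE = /^(?:[a-z]{2,3}|[a-z]{5,8})$/i;
const REGION_SHAPE = /^(?:[a-z]{2}|[0-9]{3})$/i;

// Hypothetical helper: true if the Intl.Locale constructor accepts the input.
function constructs(tag, options) {
  try {
    new Intl.Locale(tag, options);
    return true;
  } catch (e) {
    if (e instanceof RangeError) return false;
    throw e;
  }
}

// Language: tags of 1 to 9 letters, same inputs as the table above.
for (let n = 1; n <= 9; n++) {
  const tag = 'x'.repeat(n);
  console.log(tag, LANGUAGE_SHAPE.test(tag), constructs(tag));
}

// Region: a few made-up values passed via the explicit option.
for (const region of ['XX', '999', 'X', '12', 'xxxxx']) {
  console.log(region, REGION_SHAPE.test(region), constructs('nl', { region }));
}
```

For each input, the two boolean columns should agree: the constructor accepts exactly the values that have the right shape, whether or not they name anything real.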