ecma402
Normative: Update String toLocale{Lower,Upper}Case to ResolveLocale with best-fit matching
Fixes #896
- Do not ignore locales after the first in the list returned by CanonicalizeLocaleList(locales) (observable via e.g. "I".toLocaleLowerCase(["zzz", "tr"]) === "ı").
- Match against the items of that list by best-fit rather than prefix, aligning with the rest of ECMA-402 (although the difference is not necessarily observable).
- When the list is not empty but no matching locale is found, default to DefaultLocale() rather than "und", aligning with empty-list behavior and with the rest of ECMA-402 (observable via e.g. "I".toLocaleLowerCase("zzz") === "I".toLocaleLowerCase() regardless of default locale, just like new Intl.Collator("zzz", { sensitivity: "variant" }).compare("i", "ı") === new Intl.Collator(undefined, { sensitivity: "variant" }).compare("i", "ı")).
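The selection change in the bullets above can be modeled with a small sketch. It is only a toy model, not spec text: `lookupByPrefix`, `selectBefore`, `selectAfter`, and the contents of `available` are hypothetical names and data chosen for illustration, and the real Available Locales List is implementation-defined.

```javascript
// Prefix lookup: drop trailing subtags until an available locale is found.
function lookupByPrefix(available, locale) {
  let candidate = locale;
  while (candidate !== "") {
    if (available.includes(candidate)) return candidate;
    let pos = candidate.lastIndexOf("-");
    if (pos === -1) return undefined;
    // Also drop a dangling singleton subtag such as "-u" or "-x".
    if (pos >= 2 && candidate[pos - 2] === "-") pos -= 2;
    candidate = candidate.slice(0, pos);
  }
  return undefined;
}

// Before this change: only the first requested locale is consulted,
// and a miss falls back to "und".
function selectBefore(requested, available, defaultLocale) {
  const first = requested.length > 0 ? requested[0] : defaultLocale;
  return lookupByPrefix(available, first) ?? "und";
}

// After this change (simplified): every requested locale is tried in order,
// and a miss falls back to the default locale.
function selectAfter(requested, available, defaultLocale) {
  for (const locale of requested) {
    const match = lookupByPrefix(available, locale);
    if (match !== undefined) return match;
  }
  return defaultLocale;
}

// ASSUMPTION: a list holding only languages with special case mappings.
const available = ["az", "lt", "tr"];

console.log(selectBefore(["zzz", "tr"], available, "en")); // "und"
console.log(selectAfter(["zzz", "tr"], available, "en"));  // "tr"
console.log(selectBefore(["zzz"], available, "tr"));       // "und"
console.log(selectAfter(["zzz"], available, "tr"));        // "tr"
```

With a list this narrow, `selectAfter(["en", "tr"], available, "en")` also yields `"tr"`, so the result depends heavily on what the Available Locales List contains.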
- Do not ignore locales after the first in the list returned by CanonicalizeLocaleList(locales) (observable via e.g. "I".toLocaleLowerCase(["zzz", "tr"]) === "ı").
But also "I".toLocaleLowerCase(["en", "tr"]) === "ı", because "en" generally won't have locale-sensitive case mappings, which means the next locale in the list gets selected.
- Match against the items of that list by best-fit rather than prefix, aligning with the rest of ECMA-402 (although the difference is not necessarily observable).
"best fit" matching isn't supported in browsers. (V8 has "--harmony-intl-best-fit-matcher", but that's not available by default.)
- When the list is not empty but no matching locale is found, default to DefaultLocale() rather than "und", aligning with empty-list behavior and with the rest of ECMA-402 [...]
That means "I".toLocaleLowerCase("und") can now return either "i" or "ı", depending on the user-locale.
- Do not ignore locales after the first in the list returned by CanonicalizeLocaleList(locales) (observable via e.g. "I".toLocaleLowerCase(["zzz", "tr"]) === "ı").
But also "I".toLocaleLowerCase(["en", "tr"]) === "ı", because "en" generally won't have locale-sensitive case mappings, which means the next locale in the list gets selected.
Ah yeah, I guess the Available Locales List needs to include more than just locale identifiers with language-sensitive case mappings. Any thoughts on what it should be? Maybe %Intl.Collator%.[[SortLocaleData]]?
- Match against the items of that list by best-fit rather than prefix, aligning with the rest of ECMA-402 (although the difference is not necessarily observable).
"best fit" matching isn't supported in browsers. (V8 has "--harmony-intl-best-fit-matcher", but that's not available by default.)
That's irrelevant, because LookupMatchingLocaleByBestFit is defined to produce results "at least as good as those produced by the LookupMatchingLocaleByPrefix algorithm" (and therefore any implementation is free to just reuse LookupMatchingLocaleByPrefix).
- When the list is not empty but no matching locale is found, default to DefaultLocale() rather than "und", aligning with empty-list behavior and with the rest of ECMA-402 [...]
That means "I".toLocaleLowerCase("und") can now return either "i" or "ı", depending on the user-locale.
I believe that would also be addressed via the Available Locales List provided to ResolveLocale, but regardless should align with other Intl services in general and Intl.Collator in particular. Probably, "und" should just always be considered available, but definitely should not be given special treatment exclusively in TransformCase.
Ah yeah, I guess the Available Locales List needs to include more than just locale identifiers with language-sensitive case mappings. Any thoughts on what it should be? Maybe %Intl.Collator%.[[SortLocaleData]]?
If I had to guess, I'd say one of the reasons locale case conversion works differently from the other APIs is that it's difficult to find an appropriate Available Locales list. I'm not sure if Intl.Collator is a good fit.
That's irrelevant, because LookupMatchingLocaleByBestFit is defined to produce results "at least as good as those produced by the LookupMatchingLocaleByPrefix algorithm" (and therefore any implementation is free to just reuse LookupMatchingLocaleByPrefix).
I had assumed "the difference is not necessarily observable" was in reference to actual browser behaviour. If we assume an implementation that supports "best fit", which most likely uses the data from https://github.com/unicode-org/cldr/blob/main/common/supplemental/languageInfo.xml, then it's possible to have observable differences. There are three relevant entries:
<languageMatch desired="ku" supported="tr" distance="30" oneway="true"/>
<languageMatch desired="azb" supported="az" distance="10" oneway="true"/>
<languageMatch desired="az" supported="ru" distance="30" oneway="true"/>
That means "I".toLocaleLowerCase("ku") may fall back to "I".toLocaleLowerCase("tr"), because there's the fallback "ku" → "tr". For example, V8 doesn't ship locale data for Kurdish (Intl.Collator.supportedLocalesOf("ku") returns the empty array), so if V8 started to officially support the "best fit" matcher while string case conversion was tied to the Intl.Collator Available Locales list, then "I".toLocaleLowerCase("ku") could start to return the dotless i (U+0131).
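To make the "ku" → "tr" concern concrete, here is a minimal sketch of how a best-fit matcher driven by those three languageMatch entries could differ from a plain lookup. Everything in it is an assumption for illustration: `toyBestFit`, `languageFallbacks`, and the `available` list are hypothetical, and a real matcher would also weigh distances, scripts, and regions.

```javascript
// ASSUMPTION: hand-copied subset of the languageMatch entries quoted above
// (desired language -> supported language), ignoring the distance values.
const languageFallbacks = new Map([
  ["ku", "tr"],
  ["azb", "az"],
  ["az", "ru"],
]);

// Toy best-fit: prefer the requested language itself, otherwise use the
// one-way fallback if that language is available.
function toyBestFit(available, locale) {
  const language = locale.split("-")[0];
  if (available.includes(language)) return language;
  const fallback = languageFallbacks.get(language);
  if (fallback !== undefined && available.includes(fallback)) return fallback;
  return undefined;
}

// ASSUMPTION: an Available Locales List without Kurdish data.
const available = ["az", "en", "ru", "tr"];

console.log(toyBestFit(available, "ku"));  // "tr"
console.log(toyBestFit(available, "azb")); // "az"
console.log(toyBestFit(available, "az"));  // "az"
console.log(toyBestFit(available, "de"));  // undefined
```

A prefix-only lookup would find nothing for "ku" in this list, so the two matchers would pick different locales and therefore potentially different case mappings for "I".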
I believe that would also be addressed via the Available Locales List provided to ResolveLocale, but regardless should align with other Intl services in general and Intl.Collator in particular. Probably, "und" should just always be considered available, but definitely should not be given special treatment exclusively in TransformCase.
I gave "und" as a special case, because at least for programmers with a Java background, using "und" shouldn't be too uncommon. (Java's String case conversion methods use java.util.Locale.getDefault() by default, which can result in bugs when the default locale is Turkish/Azeri. Instead it's necessary to use str.toLowerCase(Locale.ROOT).)
Updated per TG2 discussion.
TG2 discussion on 2025-02-06: https://github.com/tc39/ecma402/blob/main/meetings/notes-2025-02-06.md#normative-update-string-tolocaleloweruppercase-to-resolvelocale-with-best-fit-matching-956
Moving to draft because the necessary list of "case-mappable languages" is not conveniently available in implementations.
TG2 discussion in May 2025: https://github.com/tc39/ecma402/blob/main/meetings/notes-2025-05-08.md#normative-update-string-tolocaleloweruppercase-to-resolvelocale-with-best-fit-matching-956
There is a pre-existing CLDR issue to add data for which languages should use Turkic case folding:
https://unicode-org.atlassian.net/browse/CLDR-16202