ecma402
Should "und" behave like "root" or undefined?
This seems wrong. Apparently 'und' falls back to 'en', which is different behavior than ICU4C.
Examples:
Welcome to Node.js v18.19.1.
Type ".help" for more information.
> dt = new Intl.DateTimeFormat('und', {"month":"short","weekday":"narrow","day":"numeric","calendar":"gregory","numberingSystem":"latn"})
DateTimeFormat [Intl.DateTimeFormat] {}
> dt.format()
'T, Apr 30'
> Intl.DateTimeFormat.supportedLocalesOf(["und"])
[]
> Intl.DateTimeFormat.supportedLocalesOf(["und", "en"])
[ 'en' ]
I thought that "und" was supported in engines, but I guess not?
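One way to check what a given engine does with "und" is to compare the resolved locale of a formatter built from "und" with one built without a locale argument. This is only a minimal sketch: `describeUnd` is a hypothetical helper name, and the values it reports depend on the engine, its ICU data, and the host's default locale.

```javascript
// Hypothetical helper: reports how this engine treats the "und" locale tag.
function describeUnd() {
  // Locale the engine actually resolved when "und" was requested.
  const fromUnd = new Intl.DateTimeFormat('und').resolvedOptions().locale;
  // Locale the engine resolves when no locale is requested (host default).
  const fromDefault = new Intl.DateTimeFormat().resolvedOptions().locale;
  return {
    // true only if the engine lists "und" among its supported locales
    supported: Intl.DateTimeFormat.supportedLocalesOf(['und']).length > 0,
    fromUnd,
    fromDefault,
    // true if "und" was handled the same way as an absent locale
    fallsBackToDefault: fromUnd === fromDefault,
  };
}

console.log(describeUnd());
```

If `supported` is false and `fallsBackToDefault` is true, the engine is treating "und" like an absent locale rather than like the CLDR root.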
CC @anba @FrankYFTang @gibson042 @eemeli
und is not supported in browsers. Supporting it would probably fix some of the use cases of the Stable Formatting proposal, but not all.
The reason is very simple: there are no locale resources defined for "und". https://github.com/unicode-org/cldr/blob/main/common/main/und.xml is a 404.
Also ref https://tc39.es/ecma402/#available-locales-list
The resources for "und" are stored in root.xml in CLDR.
In v8, internally we call
uloc_openAvailableByType(ULOC_AVAILABLE_WITH_LEGACY_ALIASES, &status);
to find out what locales are available. Neither "und" nor "root" is enumerated.
The documentation at https://unicode-org.github.io/icu-docs/apidoc/dev/icu4c/uloc_8h.html#a1d61e1cb6a0d2ad60dc3cd78c931e551 says: "Gets a list of available locales according to the type argument, allowing the user to access different sets of supported locales in ICU."
If "und" and "root" are not reported by ICU as "available locales", then V8 will not treat them as supported.
sorry, I hit closed by accident.
I made an upstream issue: https://unicode-org.atlassian.net/browse/ICU-22766
Whether or not ICU decides to start including the root locale in the return value of uloc_openAvailableByType, I think Web engines could decide to include that locale in their own lists of supported locales.
TG2 discussion: https://github.com/tc39/ecma402/blob/main/meetings/notes-2024-08-22.md#intldatetimeformat-does-not-support-und-locale-885
An interesting but potentially unexpected outcome of the discussion was the realization that "und" is defined by BCP-47 as simply an absent locale, so it is not semantically incorrect for ECMA-402 to have the current web reality behavior of making "und" basically an alias for undefined.
We want a way to actually get root behavior, but this might be better handled by the null locale proposal (Stable Formatting).
In v8, internally we call
uloc_openAvailableByType(ULOC_AVAILABLE_WITH_LEGACY_ALIASES, &status);
to find out what locales are available. neither "und" nor "root" is enumerated
https://unicode-org.github.io/icu-docs/apidoc/dev/icu4c/uloc_8h.html#a1d61e1cb6a0d2ad60dc3cd78c931e551
said
"Gets a list of available locales according to the type argument, allowing the user to access different sets of supported locales in ICU."
if "und" and "root" are not reported by ICU as "available locales", then v8 will not treat them as supported.
This is not correct. Root is structurally required. The available-locales list is the list to show to users. If the ICU docs don't make that clear, an issue should be filed upstream.
V8 is wrong to filter on the available list and not include root. A better approach would be to query ICU for the locale's actual status.
Internally, root is included in the manifest for the locales. I don't remember for certain; it's possible root is simply excluded here.
I don't think we should change the Web Reality behavior until TG2 has reached a consensus on this issue, so I don't want V8 or other engines to start doing something different with "und" in the meantime.
Should we just adopt the web reality behavior as the spec behavior?
i.e., "und" and undefined are treated the same way by Intl constructors.
That would probably make sense, as long as we don't forbid current locale lookup usage, as in
new Intl.Locale('und-Guru').maximize().baseName === 'pa-Guru-IN'
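To make the distinction concrete, the sketch below contrasts "und" as input to likely-subtags lookup with "und" as a formatting locale. It is illustrative only: the variable names are mine, and the printed values depend on the engine's locale data and default locale.

```javascript
// "und-Guru" as input to likely-subtags lookup: the supplied script subtag
// is kept, and the language and region are filled in from locale data.
const maximized = new Intl.Locale('und-Guru').maximize();
console.log(maximized.baseName); // the thread above reports 'pa-Guru-IN'
console.log(maximized.script);

// "und-Guru" as a formatter locale: under the current web-reality behavior,
// the formatter is expected to resolve to some other locale (typically the
// default) instead of the CLDR root.
const numberFormat = new Intl.NumberFormat('und-Guru');
console.log(numberFormat.resolvedOptions().locale);
```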
I think that for APIs that take natural language input, as opposed to producing natural language output, it is useful to be able to request the root to get the most internationally applicable behavior. People get confused and think this already works when they happen to be testing with a browser configuration where the default is the root. That is, I think we should define "und" to mean the root for the collator and segmenter APIs.
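A quick way to see what the collator and segmenter currently do with "und" is to compare their resolved locales against those of instances created without a locale. This is a rough sketch under the assumption that `Intl.Segmenter` may be missing in older engines; `resolvedFor` is a hypothetical helper, not part of any API.

```javascript
// Hypothetical helper: resolved locales of the input-oriented Intl APIs
// for a given locale tag (or undefined for the host default).
function resolvedFor(tag) {
  const result = {
    collator: new Intl.Collator(tag).resolvedOptions().locale,
  };
  // Intl.Segmenter is not available in every engine, so guard for it.
  if (typeof Intl.Segmenter === 'function') {
    result.segmenter = new Intl.Segmenter(tag).resolvedOptions().locale;
  }
  return result;
}

console.log('und      :', resolvedFor('und'));
console.log('undefined:', resolvedFor(undefined));
```

If both lines print the same locales, the engine is treating "und" as an absent locale for these APIs too, which is the behavior this comment proposes to change.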
TG2 discussion: https://github.com/tc39/ecma402/blob/main/meetings/notes-2024-12-19.md#should-und-behave-like-root-or-undefined-885