
Should "und" behave like "root" or undefined?

Open · sven-oly opened this issue 1 year ago · 14 comments

This seems wrong. Apparently 'und' falls back to 'en', which differs from ICU4C's behavior.

Examples:

Welcome to Node.js v18.19.1.
Type ".help" for more information.
> dt = new Intl.DateTimeFormat('und', {"month":"short","weekday":"narrow","day":"numeric","calendar":"gregory","numberingSystem":"latn"})
DateTimeFormat [Intl.DateTimeFormat] {}
> dt.format()
'T, Apr 30'

> Intl.DateTimeFormat.supportedLocalesOf(["und"])
[]
> Intl.DateTimeFormat.supportedLocalesOf(["und", "en"])
[ 'en' ]
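One way to see the fallback in the session above is to compare the locale a constructor actually resolved to when given "und" with what it resolves to when given nothing. This is a minimal sketch, assuming an engine that behaves as reported in this thread (no available "und" locale). It passes `localeMatcher: "lookup"` so the result does not depend on best-fit heuristics, and `describeResolution` is a hypothetical helper name.

```javascript
// Hypothetical helper: reports whether a requested tag is supported by an
// Intl constructor, which locale the constructor actually resolved to, and
// which locale it picks when no locale is requested at all.
function describeResolution(Ctor, tag) {
  const options = { localeMatcher: "lookup" };
  return {
    supported: Ctor.supportedLocalesOf([tag], options).length > 0,
    resolved: new Ctor(tag, options).resolvedOptions().locale,
    defaultLocale: new Ctor(undefined, options).resolvedOptions().locale,
  };
}

const result = describeResolution(Intl.DateTimeFormat, "und");
console.log(result);
// In engines without an "und" locale, expect supported === false and
// resolved === defaultLocale (for example "en-US" on an English system).
```

The same helper works with `Intl.Collator`, `Intl.NumberFormat`, and the other Intl constructors that expose `supportedLocalesOf`.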

sven-oly, Apr 30 '24

I thought that "und" was supported in engines, but I guess not?

CC @anba @FrankYFTang @gibson042 @eemeli

sffc, Apr 30 '24

"und" is not supported in browsers. Supporting it would probably address some of the use cases of the Stable Formatting proposal, but not all.

eemeli, May 01 '24

The reason is simple: there are no locale resources defined for "und". https://github.com/unicode-org/cldr/blob/main/common/main/und.xml is a 404.

See also https://tc39.es/ecma402/#available-locales-list

FrankYFTang, May 01 '24

The resources for "und" are stored in root.xml in CLDR.

sffc, May 01 '24

In V8, we internally call

uloc_openAvailableByType(ULOC_AVAILABLE_WITH_LEGACY_ALIASES, &status);

to find out which locales are available. Neither "und" nor "root" is enumerated.

The ICU documentation (https://unicode-org.github.io/icu-docs/apidoc/dev/icu4c/uloc_8h.html#a1d61e1cb6a0d2ad60dc3cd78c931e551) says: "Gets a list of available locales according to the type argument, allowing the user to access different sets of supported locales in ICU."

If "und" and "root" are not reported by ICU as available locales, then V8 will not treat them as supported.
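The consequence of an available-locales list that omits "und" can be sketched with simplified versions of the ECMA-402 BestAvailableLocale and LookupMatcher abstract operations (see the available-locales link above). This is an illustration, not V8's implementation: it ignores Unicode extension handling, and the `available` sets are hypothetical stand-ins for what an engine might get from ICU.

```javascript
// Simplified BestAvailableLocale: strip subtags from the end until a match
// is found, also dropping a singleton subtag that would be left dangling.
function bestAvailableLocale(available, locale) {
  let candidate = locale;
  while (true) {
    if (available.has(candidate)) return candidate;
    let pos = candidate.lastIndexOf("-");
    if (pos === -1) return undefined;
    if (pos >= 2 && candidate[pos - 2] === "-") pos -= 2;
    candidate = candidate.slice(0, pos);
  }
}

// Simplified LookupMatcher: the first requested tag with a match wins;
// otherwise fall back to the default locale. Extensions are ignored.
function lookupMatcher(available, requested, defaultLocale) {
  for (const tag of requested) {
    const match = bestAvailableLocale(available, tag);
    if (match !== undefined) return match;
  }
  return defaultLocale;
}

// Hypothetical available-locales lists, without and with the root locale.
const withoutRoot = new Set(["en", "en-US", "fr", "pa"]);
const withRoot = new Set([...withoutRoot, "und"]);

console.log(lookupMatcher(withoutRoot, ["und"], "en-US")); // → en-US
console.log(lookupMatcher(withRoot, ["und"], "en-US")); // → und
console.log(lookupMatcher(withoutRoot, ["fr-CA"], "en-US")); // → fr
```

With "und" missing from the list, a request for "und" can never match, so it always lands on the default locale, which is the behavior reported at the top of this issue.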

FrankYFTang, May 01 '24

Sorry, I hit "Close" by accident.

FrankYFTang, May 01 '24

I made an upstream issue: https://unicode-org.atlassian.net/browse/ICU-22766

Whether or not ICU decides to start including the root locale in the return value of uloc_openAvailableByType, I think Web engines could decide to include that locale in their own lists of supported locales.

sffc, May 01 '24

TG2 discussion: https://github.com/tc39/ecma402/blob/main/meetings/notes-2024-08-22.md#intldatetimeformat-does-not-support-und-locale-885

An interesting but potentially unexpected outcome of the discussion was the realization that BCP 47 defines "und" as simply an absent locale. It is therefore not semantically incorrect for ECMA-402 to keep the current web-reality behavior, which makes "und" essentially an alias for undefined.

We want a way to actually get root behavior, but this might be better handled by the null locale proposal (Stable Formatting).

sffc, Aug 22 '24

Replying to @FrankYFTang's comment above that V8 does not treat "und" or "root" as supported because ICU's uloc_openAvailableByType does not enumerate them:

This is not correct. Root is structurally required. The available-locales list is the list to show to users. If the ICU docs don't make that clear, it should be filed upstream.

V8 is wrong to filter on the available list and not include root. A better approach would be to query ICU for the locale's actual status.

Internally, root is included in the manifest of locales. I don't remember for sure; it's possible root is simply excluded from this list.

srl295, Aug 24 '24

I don't think we should change the web-reality behavior until TG2 has reached consensus on this issue, so I don't want V8 or other engines to start doing something different with "und" in the meantime.

sffc, Aug 26 '24

Should we just adopt the web reality behavior as the spec behavior?

i.e., "und" and undefined are treated the same way by Intl constructors.
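A userland sketch of what "treat "und" like undefined" could mean for the requested-locales list: drop the bare "und" tag before resolution, while leaving tags such as "und-Guru" untouched. `withoutBareUnd` is a hypothetical helper; how the spec would actually express this is for TG2 to decide.

```javascript
// Hypothetical helper: removes bare "und" from a locale request so that
// new Intl.DateTimeFormat(withoutBareUnd("und")) behaves like
// new Intl.DateTimeFormat(undefined).
function withoutBareUnd(locales) {
  if (locales === undefined) return undefined;
  const list = Intl.getCanonicalLocales(locales).filter((tag) => tag !== "und");
  // An empty list would also select the default locale; returning
  // undefined just makes that intent explicit.
  return list.length === 0 ? undefined : list;
}

console.log(withoutBareUnd("und")); // → undefined
console.log(withoutBareUnd(["und", "fr"])); // → [ 'fr' ]
console.log(withoutBareUnd("und-Guru")); // → [ 'und-Guru' ]
```

Filtering only the exact tag "und" keeps it usable as a wildcard language in likely-subtags queries.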

sffc, Dec 16 '24

That would probably make sense, as long as we don't forbid current locale lookup usage, as in

new Intl.Locale('und-Guru').maximize().baseName === 'pa-Guru-IN'
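The lookup usage above relies on CLDR likely-subtags data, in which "und" acts as a wildcard language. A few more cases, assuming an engine built with full ICU data (the exact results come from CLDR and could change between CLDR versions):

```javascript
// "und", optionally with a script or region, works as a likely-subtags query.
for (const tag of ["und", "und-Guru", "und-RU", "und-JP"]) {
  console.log(tag, "→", new Intl.Locale(tag).maximize().baseName);
}
// und → en-Latn-US
// und-Guru → pa-Guru-IN
// und-RU → ru-Cyrl-RU
// und-JP → ja-Jpan-JP
```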

eemeli, Dec 17 '24

For APIs that take natural-language input, as opposed to producing natural-language output, I think it is useful to be able to request the root locale to get the most internationally applicable behavior. People also get confused and think this already works when they happen to be testing with a browser configuration whose default is the root. That is, I think we should define "und" to mean the root for the collator and segmenter APIs.

hsivonen, Dec 19 '24

TG2 discussion: https://github.com/tc39/ecma402/blob/main/meetings/notes-2024-12-19.md#should-und-behave-like-root-or-undefined-885

sffc, Dec 21 '24