
strip out u-extension in the locale returned by resolvedOptions()

Open FrankYFTang opened this issue 6 years ago • 3 comments

Currently the locale returned by resolvedOptions() of Intl objects contains odd information. Often the information in the u-extension of the locale is redundant with another property returned by resolvedOptions(). For example, if we create an Intl.Collator with the following 6 sets of arguments:

A. "en"
B. "en", { caseFirst: "upper" }
C. "en-u-kf-upper", { caseFirst: "upper" }
D. "en-u-kf-lower", { caseFirst: "upper" } <= notice the conflict between the locale and the option
E. "en-u-kf-upper"
F. "en-u-kf-lower"

then the locale and caseFirst of resolvedOptions(), by the current shape of the spec, will be:

A. "en", undefined
B. "en", "upper"
C. "en-u-kf-upper", "upper"
D. "en", "upper" <= notice we already strip "-u-kf-lower" out when there is a conflict
E. "en-u-kf-upper", "upper"
F. "en-u-kf-lower", "lower"
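To see what a given engine actually reports for these six requests, something like the following sketch can be run. It uses the "kf" key, which is the Unicode extension key that corresponds to the caseFirst option; the helper name resolveCases is made up for illustration, and the printed values depend on the engine and its ICU data, so no particular output is claimed here.

```javascript
// The six requests from the list above: [label, locale tag, options].
const cases = [
  ["A", "en", undefined],
  ["B", "en", { caseFirst: "upper" }],
  ["C", "en-u-kf-upper", { caseFirst: "upper" }],
  ["D", "en-u-kf-lower", { caseFirst: "upper" }],
  ["E", "en-u-kf-upper", undefined],
  ["F", "en-u-kf-lower", undefined],
];

// Hypothetical helper: build one collator per request and collect
// the two resolvedOptions() properties discussed in this issue.
function resolveCases() {
  return cases.map(([label, tag, options]) => {
    const { locale, caseFirst } = new Intl.Collator(tag, options).resolvedOptions();
    return { label, requested: tag, locale, caseFirst };
  });
}

console.table(resolveCases());
```

Comparing the locale column against the caseFirst column makes the redundancy visible: whenever the tag keeps a -u-kf- part, the same value is also present as a separate property.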

In V8, we found it hard to implement this behavior of resolvedOptions() EFFICIENTLY without increasing memory use, because it is hard to read enough information back from ICU to distinguish between these cases. I suggest we report the locale without the u-extension, since this information is already covered by the other properties. So I would like to see our spec changed to make the output become

A. "en", undefined
B. "en", "upper"
C. "en", "upper"
D. "en", "upper"
E. "en", "upper"
F. "en", "lower"

instead. The developer won't miss any important information. If we are concerned that blanket removal of all u-extensions from the reported locale could be harmful, we could remove only the keys in [[RelevantExtensionKeys]], since those will always have a corresponding property in the object returned by resolvedOptions().
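As a rough illustration of the narrower variant (dropping only selected keys from the -u- extension), here is a minimal sketch. The function name stripUnicodeKeys is hypothetical and not part of any spec or engine; it assumes a well-formed language tag and does plain string processing rather than calling into ICU.

```javascript
// Hypothetical helper: remove the given keys (and their type subtags) from the
// Unicode (-u-) extension of a well-formed language tag.
function stripUnicodeKeys(tag, keysToStrip) {
  const subtags = tag.split("-");

  // Find the "u" singleton; anything after an "x" singleton is private use.
  let start = -1;
  for (let i = 1; i < subtags.length; i++) {
    const s = subtags[i].toLowerCase();
    if (s === "x") break;
    if (s === "u") { start = i; break; }
  }
  if (start === -1) return tag;

  // The extension runs until the next singleton (or the end of the tag).
  let end = start + 1;
  while (end < subtags.length && subtags[end].length !== 1) end++;

  const kept = [];
  let i = start + 1;
  // Attributes (3-8 characters) come before the first 2-character key.
  while (i < end && subtags[i].length !== 2) kept.push(subtags[i++]);
  // Each key is followed by zero or more type subtags.
  while (i < end) {
    const group = [subtags[i++]];
    while (i < end && subtags[i].length !== 2) group.push(subtags[i++]);
    if (!keysToStrip.includes(group[0].toLowerCase())) kept.push(...group);
  }

  const extension = kept.length > 0 ? ["u", ...kept] : [];
  return [...subtags.slice(0, start), ...extension, ...subtags.slice(end)].join("-");
}

console.log(stripUnicodeKeys("en-u-kf-upper", ["kf"]));            // "en"
console.log(stripUnicodeKeys("de-u-kf-upper-co-phonebk", ["kf"])); // "de-u-co-phonebk"
```

Passing every key of a constructor's [[RelevantExtensionKeys]] as keysToStrip would give the output proposed above, while keys without a matching property would be left in place.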

@gsathya @littledan @zbraniecki @anba

FrankYFTang, Jan 12 '19 19:01

We discussed this in the ECMA-402 meeting. It seems like stripping these things out only makes sense once all tags have corresponding options, which we don't have at the moment. We had some reasons to not represent certain options (e.g., currency, timezone) as tags, but I don't think we had any strong objections to build this in the other direction, though. @NorbertLindenberg contacted me recently to explain that he's no longer involved in TC39, so I don't think we can get more information from discussing it with him.

littledan, Jan 18 '19 12:01

> We discussed this in the ECMA-402 meeting. It seems like stripping these things out only makes sense once all tags have corresponding options, which we don't have at the moment. We had some reasons to not represent certain options (e.g., currency, timezone) as tags, but I don't think we had any strong objections to build this in the other direction, though. @NorbertLindenberg contacted me recently to explain that he's no longer involved in TC39, so I don't think we can get more information from discussing it with him.

I do not understand this part. First of all, let me write down my understanding of the current spec.

A. Throughout the spec, each initialization function sets relevantExtensionKeys. For example, for Collator, in https://tc39.github.io/ecma402/#sec-initializecollator, the step

  1. Let relevantExtensionKeys be %Collator%.[[RelevantExtensionKeys]].

And according to https://tc39.github.io/ecma402/#sec-intl-collator-internal-slots

"The value of the [[RelevantExtensionKeys]] internal slot is a List that must include the element "co", may include any or all of the elements "kn" and "kf", and must not include any other elements."

Notice that "ka", "kb", "kc", "kh", "kk", "kr", "ks", and "vt" are NOT in %Collator%.[[RelevantExtensionKeys]], because of "must not include any other elements."

B. Then every object calls ResolveLocale with the relevantExtensionKeys. For example, in the case of Collator, it is

  1. Let r be ResolveLocale(%Collator%.[[AvailableLocales]], requestedLocales, opt, relevantExtensionKeys, localeData).

B. Inside the ResolveLocale operation, in step 2 - 3, we get a foundLocale from either LookupMatcher or BestFitMatcher

  1. If matcher is "lookup", then
     a. Let r be LookupMatcher(availableLocales, requestedLocales).
  2. Else,
     a. Let r be BestFitMatcher(availableLocales, requestedLocales).
  3. Let foundLocale be r.[[locale]].

and either way foundLocale is a locale with all u-extension removed at this point because

B-1 Inside LookupMatcher

2.a Let noExtensionsLocale be the String value that is locale with all Unicode locale extension sequences removed.

and B-2 BestFitMatcher

Options specified through Unicode locale extension sequences must be ignored by the algorithm.

C. Then step 8, "For each element key of relevantExtensionKeys in List order, do", goes over all the relevantExtensionKeys and produces a supportedExtension. This operation drops every key/value pair of the -u- extension NOT listed in relevantExtensionKeys.

D. Then in step 9 of ResolveLocale, it creates result.[[locale]] from foundLocale and supportedExtension. Therefore, result.[[locale]] can only contain -u- extension keys specified in relevantExtensionKeys. In other words, every -u- extension key that is not listed in relevantExtensionKeys is removed before ResolveLocale returns.

E. Then the init function of each object stores r.[[locale]] from the result of ResolveLocale on the object. For example, for the case of Collator

  1. Set collator.[[Locale]] to r.[[locale]].

F. resolvedOptions() later just outputs the object's [[Locale]]. So, in the current shape of ECMA-402, any -u- extension key that is not specified in the internal slots section of each object in the spec is already excluded from the locale of resolvedOptions().
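Steps C and D above can be sketched roughly as follows. This is a simplified model and not the spec algorithm verbatim: it ignores keywords with an empty value, the special handling of "true", and private-use subtags, and the function name resolveLocaleSketch as well as the sample locale data are made up for illustration.

```javascript
// Simplified model of ResolveLocale steps 8 and 9 (hypothetical helper).
// requestedKeywords: key/value pairs parsed from the requested -u- extension.
// options: values passed through the options bag, keyed by extension key.
// keyLocaleData: supported values per key for foundLocale; the first is the default.
function resolveLocaleSketch(foundLocale, requestedKeywords, options, relevantExtensionKeys, keyLocaleData) {
  const result = {};
  let supportedExtension = "-u";

  for (const key of relevantExtensionKeys) {
    const supported = keyLocaleData[key];
    let value = supported[0];
    let addition = "";

    // A supported value from the locale tag is accepted and recorded.
    const requested = requestedKeywords[key];
    if (requested !== undefined && supported.includes(requested)) {
      value = requested;
      addition = `-${key}-${value}`;
    }

    // A supported, different value from the options wins and drops the keyword.
    const optionValue = options[key];
    if (optionValue !== undefined && supported.includes(optionValue) && optionValue !== value) {
      value = optionValue;
      addition = "";
    }

    result[key] = value;
    supportedExtension += addition;
  }

  // Only keys visited in the loop above can end up in the resulting locale.
  result.locale = supportedExtension.length > 2 ? foundLocale + supportedExtension : foundLocale;
  return result;
}

// Made-up locale data for "en", for illustration only.
const data = { co: ["default"], kf: ["false", "lower", "upper"] };
console.log(resolveLocaleSketch("en", { kf: "lower" }, { kf: "upper" }, ["co", "kf"], data).locale);
console.log(resolveLocaleSketch("en", { ka: "shifted", kf: "lower" }, {}, ["co", "kf"], data).locale);
```

In this model the first call yields "en" (the conflicting keyword is dropped) and the second yields "en-u-kf-lower", with the "ka" keyword gone because "ka" is never visited by the loop.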

Here is the list of relevantExtensionKeys from ECMA-402 and all current proposals:

Collator: << "co" >>, and possibly also << "co", "kn", "kf" >> ref https://tc39.github.io/ecma402/#sec-intl-collator-internal-slots

NumberFormat: << "nu" >> ref https://tc39.github.io/ecma402/#sec-intl.numberformat-internal-slots

DateTimeFormat: <<"ca", "nu", "hc">> ref https://tc39.github.io/ecma402/#sec-intl.datetimeformat-internal-slots

PluralRules: << >> (empty) https://tc39.github.io/ecma402/#sec-intl.pluralrules-internal-slots

RelativeTimeFormat: << "nu" >> https://tc39.github.io/proposal-intl-relative-time/#sec-Intl.RelativeTimeFormat-internal-slots

ListFormat: << >> (empty) https://tc39.github.io/proposal-intl-list-format/#sec-Intl.ListFormat-internal-slots

Segmenter: << "lb" >> [but will be changed to << >> once the PR that removes line break is merged] https://tc39.github.io/proposal-intl-segmenter/#sec-Intl.Segmenter-internal-slots

Unified NumberFormat - NO CHANGE to NumberFormat about this https://tc39.github.io/proposal-unified-intl-numberformat/section11/numberformat_proposed_out.html#sec-intl.numberformat-internal-slots

Date formatRange/formatRangeToParts - NO CHANGE to DateTimeFormat about this https://rawgit.com/fabalbon/proposal-intl-DateTimeFormat-formatRange/master/out/#sec-intl.datetimeformat-internal-slots

DateStyle - NO CHANGE to DateTimeFormat about this https://tc39.github.io/proposal-intl-datetime-style/#sec-intl.datetimeformat-internal-slots
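One way to check this list against a particular engine is to request a tag carrying several keys and see which of them survive in resolvedOptions().locale. The sketch below does that for the constructors in ECMA-402 itself; the helper names unicodeKeysOf and survivingKeys are hypothetical, the key extraction is a simple regular expression that assumes a well-formed tag without private-use subtags, and the results vary by engine and ICU data.

```javascript
// Hypothetical helper: list the 2-character keys in the -u- extension of a tag.
function unicodeKeysOf(tag) {
  const match = /-u((?:-[a-z0-9]{2,8})+)/i.exec(tag);
  if (!match) return [];
  return match[1].split("-").filter((subtag) => subtag.length === 2);
}

// Hypothetical helper: which requested keys does this constructor keep?
function survivingKeys(Ctor, tag) {
  return unicodeKeysOf(new Ctor(tag).resolvedOptions().locale);
}

const tag = "en-u-ca-gregory-co-phonebk-kf-upper-nu-latn";
for (const Ctor of [Intl.Collator, Intl.NumberFormat, Intl.DateTimeFormat, Intl.PluralRules]) {
  console.log(Ctor.name, survivingKeys(Ctor, tag));
}
```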

Could we first agree that this is what happens AS IS in the current spec? Then we can discuss my suggestion.

If you think my description of what CURRENTLY IS (regardless of what it WAS and how I propose to change it) is not accurate, please correct me here first. Once we agree on our understanding of the CURRENT spec, we can see what the DIFFERENCE will be if my suggestion is adopted.

FrankYFTang, Jan 18 '19 19:01

@FrankYFTang What additional feedback do you need in order to make progress on this issue?

sffc, Jun 05 '20 20:06