ecma402
Region Override Support
Do you plan to support Region Override through the Unicode Locale Extension?
Unicode Locale extensions seem to be managed internally.
const custom = new Intl.Locale('en-US-u-rg-gbzzzz-hc-h24');
console.log(custom.toString()); // en-US-u-hc-h24-rg-gbzzzz
The hour cycle and region override extensions are reordered in the output of the toString method.
So, I guess the internal implementation holds all Unicode Locale Extensions, but the region override is not taken into account when formatting numbers, dates, and so on.
const locale = new Intl.Locale('en-US-u-rg-gbzzzz');
console.log(locale.toString()); // OK: en-US-u-rg-gbzzzz
console.log(locale.language); // OK: en
console.log(locale.region); // OK: US
const date = new Date(Date.UTC(2012, 11, 20, 3, 0, 0));
console.log('US: ' + new Intl.DateTimeFormat('en-US').format(date)); // OK: 12/20/2012
console.log('GB: ' + new Intl.DateTimeFormat('en-GB').format(date)); // OK: 20/12/2012
console.log('GB w/ Region Override: ' + new Intl.DateTimeFormat(locale).format(date)); // KO: 12/20/2012 -> should display 20/12/2012
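Until region override is honoured natively, one possible workaround is to read the rg value by hand and rebuild the locale with that region. The following is a minimal sketch under stated assumptions: the helper names `getRegionOverride` and `withRegionOverrideApplied` are hypothetical and not part of Intl, and only rg values of the form region code plus "zzzz" are handled. Replacing the region subtag changes the whole locale (including any dialect-specific data), so this only approximates what -u-rg- is meant to do.

```javascript
// Hypothetical helper (not part of Intl): returns the region named by the
// -u-rg- keyword of a language tag, or undefined when there is none.
function getRegionOverride(tag) {
  // Intl.Locale canonicalizes the tag; lower-case it to simplify matching.
  const subtags = new Intl.Locale(tag).toString().toLowerCase().split('-');
  const start = subtags.indexOf('u');
  if (start === -1) return undefined;
  for (let i = start + 1; i < subtags.length; i++) {
    if (subtags[i].length === 1) break; // another singleton ends the -u- extension
    if (subtags[i] === 'rg') {
      // Assumption: the value is a region code followed by "zzzz", e.g. "gbzzzz".
      const match = /^([a-z]{2}|[0-9]{3})zzzz$/.exec(subtags[i + 1] || '');
      return match ? match[1].toUpperCase() : undefined;
    }
  }
  return undefined;
}

// Hypothetical helper: swaps the region subtag for the rg region, if any.
function withRegionOverrideApplied(tag) {
  const region = getRegionOverride(tag);
  return region ? new Intl.Locale(tag, { region }) : new Intl.Locale(tag);
}

const sample = new Date(Date.UTC(2012, 11, 20, 3, 0, 0));
const overridden = withRegionOverrideApplied('en-US-u-rg-gbzzzz');
console.log(new Intl.DateTimeFormat(overridden, { timeZone: 'UTC' }).format(sample));
```

The exact formatted string depends on the locale data shipped with the engine, so no expected output is given here.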
According to the spec, the region is used as default for:
- currency
- calendar
- week data
- time cycle
- measurement system
- unit preferences
- number format
- currency format
- date/time format
Java 10 supports region override as expected:
import java.text.DateFormat;
import java.text.NumberFormat;
import java.time.LocalDate;
import java.time.Month;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.util.Currency;
import java.util.Locale;
class Main {
public static void main(String[] args) {
Locale locale = Locale.forLanguageTag("en-US-u-rg-gbzzzz");
System.out.println(locale.getLanguage());
System.out.println(locale.getCountry());
LocalDate date = LocalDate.of(2012, Month.DECEMBER, 20);
DateTimeFormatter dtf = DateTimeFormatter.ofLocalizedDate(FormatStyle.SHORT);
System.out.println("US: " + date.format(dtf.localizedBy(Locale.forLanguageTag("en-US")))); // OK: 12/20/12
System.out.println("GB: " + date.format(dtf.localizedBy(Locale.forLanguageTag("en-GB")))); // OK: 20/12/2012
System.out.println("GB w/ Region Override: " + date.format(dtf.localizedBy(locale))); // OK: 20/12/2012
}
}
Related issues:
- #106
- #257
- #867 @ GlobalizeJS
@FrankYFTang Is this an ICU issue or a spec issue? Do we need to modify ECMA-402 to allow the -rg- extension, or do we just need to ensure that ICU handles it correctly?
We'd need to change the spec to respect this, as the spec includes the schema for locale data, and all supported extension keys are explicitly specified. Region override would need to be processed somehow by the spec to permit this.
We discussed this in the 2020-06-11 ECMA-402 meeting and agreed to move forward.
@sffc does this need to be covered under user preferences? I don't think so, perhaps it can be dealt with separately. Question is: PR or proposal? While the spec diff might be larger than many smaller proposals, I don't really think there's much design/decision-making to do here, just make the locale-handling respect this additional subtag, right?
Currently in ECMA-402, each Intl object only listens to a restricted set of -u- extension keys, specified in its "[[RelevantExtensionKeys]] internal slot"; all others are stripped out while constructing the object, before any matching.
https://tc39.es/ecma402/#sec-internal-slots "[[RelevantExtensionKeys]] is a List of keys of the language tag extensions defined in Unicode Technical Standard 35 that are relevant for the functionality of the constructed objects."
https://tc39.es/ecma402/#sec-intl-collator-internal-slots "10.2.3 Internal Slots The value of the [[AvailableLocales]] internal slot is implementation-defined within the constraints described in 9.1. The value of the [[RelevantExtensionKeys]] internal slot is a List that must include the element "co", may include any or all of the elements "kf" and "kn", and must not include any other elements."
https://tc39.es/ecma402/#sec-intl.datetimeformat-internal-slots "11.3.3 Internal slots The value of the [[AvailableLocales]] internal slot is implementation-defined within the constraints described in 9.1.
The value of the [[RelevantExtensionKeys]] internal slot is « "ca", "hc", "nu" »."
https://tc39.es/ecma402/#sec-Intl.DisplayNames-internal-slots "12.3.3 Internal slots The value of the [[AvailableLocales]] internal slot is implementation-defined within the constraints described in 9.1.
The value of the [[RelevantExtensionKeys]] internal slot is « »."
https://tc39.es/ecma402/#sec-Intl.ListFormat-internal-slots "13.3.3 Internal slots The value of the [[AvailableLocales]] internal slot is implementation-defined within the constraints described in 9.1.
The value of the [[RelevantExtensionKeys]] internal slot is « »."
https://tc39.es/ecma402/#sec-intl.numberformat-internal-slots "15.3.3 Internal slots The value of the [[AvailableLocales]] internal slot is implementation-defined within the constraints described in 9.1.
The value of the [[RelevantExtensionKeys]] internal slot is « "nu" »."
https://tc39.es/ecma402/#sec-intl.pluralrules-internal-slots "16.3.3 Internal slots The value of the [[AvailableLocales]] internal slot is implementation-defined within the constraints described in 9.1.
The value of the [[RelevantExtensionKeys]] internal slot is « »."
So at minimum, ECMA-402 needs to be changed to put "rg" into [[RelevantExtensionKeys]] for it to be considered by the spec, if we want to support Region Override.
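The stripping of non-relevant keys can be observed from script by comparing the requested tag with the locale reported by resolvedOptions(). This is a small sketch, assuming an engine that follows the spec text quoted above; `resolvedLocales` is a hypothetical helper name, and the exact resolved strings depend on the engine's locale data.

```javascript
// Hypothetical helper: returns the locale each constructor actually resolved
// for the given language tag.
function resolvedLocales(tag) {
  const constructors = [Intl.Collator, Intl.DateTimeFormat, Intl.NumberFormat, Intl.PluralRules];
  return constructors.map((Ctor) => new Ctor(tag).resolvedOptions().locale);
}

// The tag mixes keys some constructors list as relevant ("ca", "nu") with "rg".
const requested = 'en-US-u-ca-gregory-nu-latn-rg-gbzzzz';
for (const resolved of resolvedLocales(requested)) {
  console.log(resolved, resolved.includes('-rg-') ? '(keeps rg)' : '(drops rg)');
}
```

If "rg" were added to [[RelevantExtensionKeys]] for a constructor, its resolved locale would be expected to retain the keyword.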
Discussion with @FrankYFTang @ben-allen @sffc: in order to move forward with this, the implementation (ICU) needs to fully support the -u-rg subtag, and this is complicated by the fact that there is no clear list of which items fall into the "dialect region" bucket versus the "extension region" bucket. For example:
- Spelling and pluralization rules: clearly the dialect region
- Measurement unit preferences: clearly the extension region
- Grouping separators, datetime patterns: unclear
This is partly tracked upstream in: https://unicode-org.atlassian.net/browse/CLDR-15265
Items impacted by rg are listed in https://github.com/unicode-org/cldr/blob/main/common/supplemental/rgScope.xml
And we should analyze which Intl objects should be impacted by -u-rg-.
I think we should wait until Intl Locale Info lands, and then we should put together this proposal (which should be small to medium in size). ICU4X can make this easier to implement. ICU4C should already implement -u-rg for certain key resources like unit preferences.
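For reference, week data is one of the region-dependent items that Intl Locale Info exposes. Below is a hedged sketch of how it could be inspected for a tag carrying -u-rg-; `weekInfoFor` is a hypothetical helper, and whether a given engine lets rg influence the result is not assumed here.

```javascript
// Hypothetical helper: reads week data through Intl Locale Info when the
// engine exposes it, and returns undefined otherwise.
function weekInfoFor(tag) {
  const locale = new Intl.Locale(tag);
  if (typeof locale.getWeekInfo === 'function') return locale.getWeekInfo();
  return locale.weekInfo; // earlier accessor form of the proposal
}

for (const tag of ['en-US', 'en-GB', 'en-US-u-rg-gbzzzz']) {
  const info = weekInfoFor(tag);
  console.log(tag, info ? info.firstDay : 'week info not available');
}
```

Comparing the third line of output with the first two shows which region the engine used for week data.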