ecma402
Add support for more subtags
Currently, a number of Unicode extension keywords are supported, including nu for numbering systems, ca for calendaring systems, hc for hour cycles, etc.
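For readers less familiar with these keywords, here is a minimal sketch of how nu, ca, and hc show up when a tag is parsed with Intl.Locale. The tag itself is only an illustrative choice, and the snippet assumes a runtime that implements Intl.Locale.

```javascript
// Parse a BCP 47 tag carrying three Unicode extension keywords.
// The tag below is an arbitrary example, not one taken from the issue.
const parsed = new Intl.Locale("en-US-u-ca-buddhist-hc-h23-nu-thai");

// Each supported keyword is exposed as a property on the Locale object.
const keywordSummary = {
  baseName: parsed.baseName,               // language/region without extensions
  calendar: parsed.calendar,               // from -u-ca-
  hourCycle: parsed.hourCycle,             // from -u-hc-
  numberingSystem: parsed.numberingSystem, // from -u-nu-
};

console.log(keywordSummary);
```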
This issue tracks all the BCP-47 subtags added by Unicode that we do not yet support. Feel free to add to the list below this:
- rg (Region override): see #370.
- fw (First day of the week): see #6.
I think we need to have a high level guideline about what should be considered and what should not be.
Currently, ECMA402 is very specific that the following are NOT allowed:
"the keys "kb", "kh", "kk", "kr", and "vt" are not allowed in this version of the Internationalization API."
@FrankYFTang absolutely! Should we put this on the agenda for the next monthly meeting? I was alluding to the same thing in my review to #581.
Also, another two are "cf" and "ss". I do not believe "cf" should be honored in the extension, because "cu" already is not. See https://tc39.es/ecma402/#sec-intl.numberformat-internal-slots: "Unicode Technical Standard 35 describes two locale extension keys that are relevant to number formatting: "cu" for currency and "nu" for numbering system. Intl.NumberFormat, however, requires that the currency of a currency format is specified through the currency property in the options objects."
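The quoted rule about "cu" can be observed directly. The following is a small sketch, assuming a runtime with Intl.NumberFormat and data for en-US; the tag and currency codes are illustrative only.

```javascript
// The -u-cu- keyword in the tag does not supply the currency:
// style "currency" without a `currency` option is a TypeError.
let threwWithoutCurrencyOption = false;
try {
  new Intl.NumberFormat("en-US-u-cu-eur", { style: "currency" });
} catch (e) {
  threwWithoutCurrencyOption = e instanceof TypeError;
}

// With an explicit option, the option is what the formatter uses,
// and the "cu" keyword is not reflected in the resolved locale.
const currencyResolved = new Intl.NumberFormat("en-US-u-cu-eur", {
  style: "currency",
  currency: "USD",
}).resolvedOptions();

console.log(threwWithoutCurrencyOption, currencyResolved.currency, currencyResolved.locale);
```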
So I filed a PR that handles "cf" in the same way as "cu": https://github.com/tc39/ecma402/pull/581
Another is "ss". I filed https://github.com/tc39/proposal-intl-segmenter/issues/142. I think that is a "feature request" for Intl.Segmenter (but I would rather we deal with it after the Stage 4 merge into ECMA402), since there is no way to control that in options right now. (@gibson042) ICU did that not via an API but via the locale too (see case UBRK_SENTENCE of BreakIterator::makeInstance in common/brkiter.cpp and the comments in common/unicode/ubrk.h), so maybe we should also accept "ss" in the extension after we add an option to support it, once Intl.Segmenter Stage 4 is merged into ECMA402. I think it is not a good idea to change the current spec of Intl.Segmenter. Maybe an Intl.Segmenter v2 could improve it.
@FrankYFTang agreed on the segmenter. I generally feel that we should prefer subtags over options (since they can be best included in Intl.Locale and passed around in the BCP 47 string) but we can talk about that in greater detail.
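As a rough illustration of the "passed around in the BCP 47 string" point, the sketch below folds two options into a tag with Intl.Locale and then hands only the string to a formatter. The chosen option values are arbitrary, and the snippet assumes a runtime with Intl.Locale and Intl.DateTimeFormat.

```javascript
// Fold preferences into the locale itself instead of an options bag.
const withPreferences = new Intl.Locale("en", {
  calendar: "gregory",
  hourCycle: "h23",
});

// The preferences now travel inside a single string.
const portableTag = withPreferences.toString();

// A consumer that only receives the string still sees the hour cycle.
const resolvedHourCycle = new Intl.DateTimeFormat(portableTag, {
  hour: "numeric",
}).resolvedOptions().hourCycle;

console.log(portableTag, resolvedHourCycle);
```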
Is this a dupe of #105?
I think #105 is about filling in the set between the locale and the options bags, whereas this issue is about adding support for new ones.
TG2 discussion: https://github.com/tc39/ecma402/blob/master/meetings/notes-2022-12-08.md#add-support-for-more-subtags-580
Conclusion: Move forward with adding more locale keywords. Add any that are in UTS 35 that impact ECMA-402 formatters, no more, no less.
Looking over the table in UTS 35:
https://www.unicode.org/reports/tr35/tr35.html#Key_Type_Definitions
It seems that there are 3 keywords that might be relevant (besides rg and sd):
- -u-fw, but only for the Intl Locale Info proposal; see https://github.com/tc39/proposal-intl-locale-info/issues/68
- -u-dx, but the use case is very unclear
- -u-ss, which affects sentence segmentation
Therefore, it seems that -u-ss is the only other subtag that we need to add here. The API option could be called sentenceBreakSuppressions.
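To make the current gap concrete, here is a sketch of how -u-ss behaves today, assuming a runtime that ships Intl.Segmenter. The helper getUnicodeKeyword is hypothetical (it is not part of ECMA-402 or any proposal) and only exists to read the keyword back out of the tag; it does not handle private-use tags that contain a "u" subtag.

```javascript
// Hypothetical helper: read a Unicode extension keyword from a BCP 47 tag.
function getUnicodeKeyword(tag, key) {
  // Intl.Locale canonicalizes the tag but keeps keywords it has no getter for.
  const subtags = new Intl.Locale(tag).toString().toLowerCase().split("-");
  const start = subtags.indexOf("u");
  if (start === -1) return undefined;
  // Walk the -u- extension until the next singleton (length-1 subtag).
  for (let i = start + 1; i < subtags.length && subtags[i].length > 1; i++) {
    if (subtags[i] === key) {
      const type = [];
      for (let j = i + 1; j < subtags.length && subtags[j].length > 2; j++) {
        type.push(subtags[j]);
      }
      // Canonical form omits the value "true", so an empty type means "true".
      return type.join("-") || "true";
    }
  }
  return undefined;
}

const ssValue = getUnicodeKeyword("en-u-ss-standard", "ss");

// Intl.Segmenter currently has no relevant extension keys, so the
// keyword is dropped from the resolved locale.
const segmenterLocale = new Intl.Segmenter("en-u-ss-standard", {
  granularity: "sentence",
}).resolvedOptions().locale;

console.log(ssValue, segmenterLocale);
```

If an option such as sentenceBreakSuppressions were added, a value read this way could presumably serve as its default, but that design is not settled in this thread.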
It is not clear to me why rg is difficult to support.
This is blocked on -u-ss and -u-dx being implemented in both ICU4X and ICU4C.