ecma402

Add support for more subtags

Open ryzokuken opened this issue 4 years ago • 10 comments

Currently, a number of Unicode extension keywords are supported, including nu for numbering systems, ca for calendaring systems, hc for hour cycles, etc.
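As an illustration of the keywords already supported, here is a minimal sketch that reads them back through Intl.Locale. The tag "ar-EG-u-nu-latn-ca-gregory-hc-h23" is an arbitrary example chosen for this sketch, not one taken from the spec.

```javascript
// Parse a BCP 47 tag that carries three Unicode extension keywords.
// The tag itself is an illustrative assumption.
const locale = new Intl.Locale("ar-EG-u-nu-latn-ca-gregory-hc-h23");

// Each supported keyword is exposed as a property on Intl.Locale.
console.log(locale.numberingSystem); // value of the "nu" keyword
console.log(locale.calendar);        // value of the "ca" keyword
console.log(locale.hourCycle);       // value of the "hc" keyword
console.log(locale.baseName);        // the tag without its extension part
```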

This issue tracks all the BCP-47 subtags added by Unicode that we do not yet support. Feel free to add to the list below:

  1. rg (Region override): see #370.
  2. fw (First day of the week): see #6.

ryzokuken avatar Jun 08 '21 12:06 ryzokuken

I think we need to have a high level guideline about what should be considered and what should not be.

Currently, ECMA402 is very specific that the following are NOT allowed:

"the keys "kb", "kh", "kk", "kr", and "vt" are not allowed in this version of the Internationalization API."

FrankYFTang avatar Jun 09 '21 21:06 FrankYFTang

@FrankYFTang absolutely! Should we put this on the agenda for the next monthly meeting? I was alluding to the same thing in my review of #581.

ryzokuken avatar Jun 09 '21 22:06 ryzokuken

Also, another two are "cf" and "ss". I do not believe "cf" should be accepted as an extension key, because "cu" already is not. From https://tc39.es/ecma402/#sec-intl.numberformat-internal-slots: "Unicode Technical Standard 35 describes two locale extension keys that are relevant to number formatting: "cu" for currency and "nu" for numbering system. Intl.NumberFormat, however, requires that the currency of a currency format is specified through the currency property in the options objects."

So I filed a PR that treats "cf" the same way as "cu": https://github.com/tc39/ecma402/pull/581
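The "cu" behaviour quoted above can be sketched as follows. The tag "en-US-u-cu-eur" and the variable names are illustrative assumptions; the point is only that the currency comes from the options object, not from the keyword.

```javascript
// The "cu" keyword in the tag does not select the currency;
// the currency option does.
const formatter = new Intl.NumberFormat("en-US-u-cu-eur", {
  style: "currency",
  currency: "USD",
});
const resolvedCurrency = formatter.resolvedOptions().currency;
console.log(resolvedCurrency);

// Without the currency option, the "cu" keyword is not used as a fallback:
// a currency-style formatter cannot be constructed at all.
let missingCurrencyError = null;
try {
  new Intl.NumberFormat("en-US-u-cu-eur", { style: "currency" });
} catch (error) {
  missingCurrencyError = error;
}
console.log(missingCurrencyError instanceof TypeError);
```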

The other is "ss". I filed https://github.com/tc39/proposal-intl-segmenter/issues/142. I think that is a "feature request" for Intl.Segmenter (but I would rather we deal with it after the Stage 4 merge into ECMA402), since there is no way to control that through options right now (@gibson042). ICU exposes this not via an API but via the locale too (see case UBRK_SENTENCE of BreakIterator::makeInstance in common/brkiter.cpp and the comments in common/unicode/ubrk.h), so maybe we should also accept "ss" as an extension key once we add an option for it, after Intl.Segmenter Stage 4 is merged into ECMA402. I do not think it is a good idea to change the current spec of Intl.Segmenter. Maybe an Intl.Segmenter v2 could improve it.

FrankYFTang avatar Jun 09 '21 22:06 FrankYFTang

@FrankYFTang agreed on the segmenter. I generally feel that we should prefer subtags over options (since they can be best included in Intl.Locale and passed around in the BCP 47 string) but we can talk about that in greater detail.
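A rough sketch of what "passed around in the BCP 47 string" means in practice, using keywords that are already supported. The locale "en-GB" and the chosen option values are assumptions made for this example.

```javascript
// Build a locale from options; the preferences are serialized as keywords.
const locale = new Intl.Locale("en-GB", { calendar: "gregory", hourCycle: "h12" });
const tag = locale.toString();
console.log(tag);

// The plain string can be stored or sent elsewhere, then handed to a formatter,
// which picks the preferences up from the keywords without extra options.
const formatter = new Intl.DateTimeFormat(tag, { hour: "numeric", minute: "numeric" });
const resolved = formatter.resolvedOptions();
console.log(resolved.calendar, resolved.hourCycle);
```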

ryzokuken avatar Jun 09 '21 22:06 ryzokuken

Is this a dupe of #105?

zbraniecki avatar Oct 08 '22 05:10 zbraniecki

I think #105 is about filling in the set between the locale and the options bags, whereas this issue is about adding support for new ones.

sffc avatar Oct 08 '22 15:10 sffc

TG2 discussion: https://github.com/tc39/ecma402/blob/master/meetings/notes-2022-12-08.md#add-support-for-more-subtags-580

Conclusion: Move forward with adding more locale keywords. Add any that are in UTS 35 that impact ECMA-402 formatters, no more, no less.

sffc avatar Dec 08 '22 21:12 sffc

Looking over the table in UTS 35:

https://www.unicode.org/reports/tr35/tr35.html#Key_Type_Definitions

It seems that there are 3 keywords that might be relevant (besides rg and sd):

  • -u-fw but only for the Intl Locale Info proposal; see https://github.com/tc39/proposal-intl-locale-info/issues/68
  • -u-dx but the use case is very unclear
  • -u-ss which affects sentence segmentation

Therefore, it seems that -u-ss is the only other subtag that we need to add here. The API option could be called sentenceBreakSuppressions.
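For context, a minimal sketch of the current state for -u-ss, assuming a runtime that ships Intl.Segmenter. The tag "en-u-ss-standard" is an illustrative choice. Intl.Locale keeps the keyword in the tag but has no dedicated accessor for it, and Intl.Segmenter reports no suppression-related setting.

```javascript
// The tag is well-formed, so Intl.Locale keeps the "ss" keyword in the string.
const tag = "en-u-ss-standard";
const canonical = new Intl.Locale(tag).toString();
console.log(canonical);

// Intl.Segmenter accepts the tag, but its resolved options only describe
// the locale and the granularity; there is no option for "ss" today.
const segmenter = new Intl.Segmenter(tag, { granularity: "sentence" });
const resolved = segmenter.resolvedOptions();
console.log(Object.keys(resolved));
```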

sffc avatar May 02 '23 21:05 sffc

It is not clear to me why rg is difficult to support.

FrankYFTang avatar Aug 28 '23 23:08 FrankYFTang

This is blocked on -u-ss and -u-dx being implemented in both ICU4X and ICU4C.

sffc avatar Sep 18 '23 21:09 sffc