Ensure that the provider performs correct non-Chinese collation alias mapping

Open hsivonen opened this issue 2 years ago • 4 comments

Ensure that the provider performs these alias mappings from CLDR for collations (Traditional Chinese and Norwegian have more specific issues):

  • pa_IN: pa_Guru_IN
  • sr_RS: sr_Cyrl_RS
  • ars: ar_SA
  • in_ID: id_ID
  • iw: he
  • in: id
  • mo: ro
  • sh_YU: sr_Latn_RS
  • sh: sr_Latn
  • sr_ME: sr_Cyrl_ME
  • sh_BA: sr_Latn_BA
  • sh_CS: sr_Latn_RS
  • sr_BA: sr_Cyrl_BA
  • iw_IL: he_IL
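
For illustration, the mappings above can be written as a plain lookup. This is a minimal sketch using only the Rust standard library; the function name `resolve_collation_alias` is hypothetical and is not part of ICU4X, and a real provider would presumably derive the table from CLDR data rather than hard-code it.

```rust
/// Hypothetical helper (not an ICU4X API): maps a legacy collation locale
/// identifier to its replacement, using exactly the pairs listed in this issue.
/// Returns `None` when the identifier is not one of the listed aliases.
pub fn resolve_collation_alias(id: &str) -> Option<&'static str> {
    match id {
        "pa_IN" => Some("pa_Guru_IN"),
        "sr_RS" => Some("sr_Cyrl_RS"),
        "ars" => Some("ar_SA"),
        "in_ID" => Some("id_ID"),
        "iw" => Some("he"),
        "in" => Some("id"),
        "mo" => Some("ro"),
        "sh_YU" => Some("sr_Latn_RS"),
        "sh" => Some("sr_Latn"),
        "sr_ME" => Some("sr_Cyrl_ME"),
        "sh_BA" => Some("sr_Latn_BA"),
        "sh_CS" => Some("sr_Latn_RS"),
        "sr_BA" => Some("sr_Cyrl_BA"),
        "iw_IL" => Some("he_IL"),
        // Anything else is not an alias in this list.
        _ => None,
    }
}
```

Note that two legacy identifiers, sh_YU and sh_CS, resolve to the same target, so the mapping cannot be inverted.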

hsivonen avatar May 30 '22 07:05 hsivonen

CC @sffc.

Norwegian: #1963. Traditional Chinese: #1964.

hsivonen avatar May 30 '22 07:05 hsivonen

Looks like these aliases exist as JSON.

hsivonen avatar Jun 16 '22 14:06 hsivonen

We'll do mappings according to likely subtags and parent locales by default:

  • https://github.com/unicode-org/icu4x/blob/main/provider/testdata/data/cldr/cldr-core/supplemental/likelySubtags.json
  • https://github.com/unicode-org/icu4x/blob/main/provider/testdata/data/cldr/cldr-core/supplemental/parentLocales.json

My proposed policy for legacy subtags (aliases) is that developers should call LocaleCanonicalizer::canonicalize before passing the locale into ICU4X.

  • https://github.com/unicode-org/icu4x/blob/main/provider/testdata/data/cldr/cldr-core/supplemental/aliases.json
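
The ordering this policy implies (resolve aliases first, then fall back) can be sketched as follows. This is a rough illustration using only the standard library, not ICU4X code: `canonicalize` here is a hypothetical stand-in covering three of the aliases from this issue, `fallback_chain` is likewise hypothetical, plain subtag truncation is a simplification of what likelySubtags.json and parentLocales.json drive, and the final "root" entry is an assumed name for the last-resort data.

```rust
/// Hypothetical stand-in for alias canonicalization (not the ICU4X API).
/// Covers only three of the aliases listed in this issue.
fn canonicalize(id: &str) -> String {
    match id {
        "iw" => "he".to_string(),
        "iw_IL" => "he_IL".to_string(),
        "in_ID" => "id_ID".to_string(),
        other => other.to_string(),
    }
}

/// Builds the lookup order for a collation request: the canonicalized
/// identifier, then each shorter prefix obtained by dropping the last
/// subtag, then "root" (assumed name for the final fallback).
/// Truncation is a simplification; explicit parent-locale data would
/// take precedence over it in a real implementation.
fn fallback_chain(id: &str) -> Vec<String> {
    let mut chain = Vec::new();
    // Aliases are resolved once, before any fallback step.
    let mut current = canonicalize(id);
    loop {
        chain.push(current.clone());
        match current.rfind('_') {
            // Drop the last subtag and try again.
            Some(pos) => current.truncate(pos),
            None => break,
        }
    }
    chain.push("root".to_string());
    chain
}
```

Under these assumptions, a request for iw_IL is looked up as he_IL, then he, then root, so the fallback steps never see the legacy identifier.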

sffc avatar Jun 16 '22 16:06 sffc

The source for these mappings is:

https://github.com/unicode-org/icu/blob/main/tools/cldr/cldr-to-icu/build-icu-data.xml#L263

#2506 lays the groundwork for this to work. I will add tests for this iteratively.

sffc avatar Sep 16 '22 01:09 sffc

I'm using this as the main tracking issue for collation fallbacks.

sffc avatar Sep 26 '22 18:09 sffc