icu4x
Ensure that the provider performs correct non-Chinese collation alias mapping
Ensure that the provider performs these CLDR alias mappings for collations (Traditional Chinese and Norwegian have more specific issues):
- pa_IN: pa_Guru_IN
- sr_RS: sr_Cyrl_RS
- ars: ar_SA
- in_ID: id_ID
- iw: he
- in: id
- mo: ro
- sh_YU: sr_Latn_RS
- sh: sr_Latn
- sr_ME: sr_Cyrl_ME
- sh_BA: sr_Latn_BA
- sh_CS: sr_Latn_RS
- sr_BA: sr_Cyrl_BA
- iw_IL: he_IL
CC @sffc.
Norwegian: #1963. Traditional Chinese: #1964.
Looks like these aliases exist as JSON.
We'll do mappings according to likely subtags and parent locales by default:
- https://github.com/unicode-org/icu4x/blob/main/provider/testdata/data/cldr/cldr-core/supplemental/likelySubtags.json
- https://github.com/unicode-org/icu4x/blob/main/provider/testdata/data/cldr/cldr-core/supplemental/parentLocales.json
My proposed policy for legacy subtags (aliases) is that developers should call LocaleCanonicalizer::canonicalize before passing the locale into ICU4X.
- https://github.com/unicode-org/icu4x/blob/main/provider/testdata/data/cldr/cldr-core/supplemental/aliases.json
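To make the "resolve the alias before the data lookup" idea concrete, here is a minimal sketch that treats the collation aliases listed in this issue as a plain lookup table. This is not ICU4X code: `COLLATION_ALIASES` and `resolve_collation_alias` are hypothetical names used only for illustration, the match is an exact, case-sensitive string comparison on the underscore-separated identifiers as written above, and a real implementation would presumably go through LocaleCanonicalizer or the provider's fallback data instead.

```rust
/// Legacy collation locale aliases from this issue, as (alias, canonical) pairs.
/// Hypothetical table for illustration; not part of the ICU4X API.
const COLLATION_ALIASES: &[(&str, &str)] = &[
    ("pa_IN", "pa_Guru_IN"),
    ("sr_RS", "sr_Cyrl_RS"),
    ("ars", "ar_SA"),
    ("in_ID", "id_ID"),
    ("iw", "he"),
    ("in", "id"),
    ("mo", "ro"),
    ("sh_YU", "sr_Latn_RS"),
    ("sh", "sr_Latn"),
    ("sr_ME", "sr_Cyrl_ME"),
    ("sh_BA", "sr_Latn_BA"),
    ("sh_CS", "sr_Latn_RS"),
    ("sr_BA", "sr_Cyrl_BA"),
    ("iw_IL", "he_IL"),
];

/// Returns the canonical locale for a known legacy alias,
/// or the input unchanged when no alias entry matches.
/// Hypothetical helper; exact string match only.
pub fn resolve_collation_alias(locale: &str) -> &str {
    COLLATION_ALIASES
        .iter()
        .find(|(alias, _)| *alias == locale)
        .map(|(_, canonical)| *canonical)
        .unwrap_or(locale)
}
```

Under these assumptions, a caller would run the requested locale through `resolve_collation_alias` first and then do the usual likely-subtags and parent-locale fallback on the result. Locales without an entry in the table pass through untouched.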
The source for these mappings is:
https://github.com/unicode-org/icu/blob/main/tools/cldr/cldr-to-icu/build-icu-data.xml#L263
#2506 lays the groundwork for this to work. I will add tests for this iteratively.
I'm using this as the main tracking issue for collation fallbacks.