icu4x
Ensure that the provider performs correct alias mapping for Traditional Chinese locales
Ensure that if a specific (and existing) collation hasn't been specified with -u-co-, the following map to zh-u-co-stroke:
zh-Hant regardless of region.
zh without Hans but with any of HK, MO, TW.
yue without either Hans or CN.
CC @sffc
For clarity: CLDR maps yue-CN and yue-Hans to zh-Hans, i.e. zh-u-co-pinyin.
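As a rough illustration of the rules above, here is a minimal sketch in plain Rust. `SimpleLocale` and `maps_to_zh_stroke` are hypothetical names made up for this example and are not ICU4X APIs. The sketch also assumes that any explicitly requested collation exists, which the real provider would need to check.

```rust
/// Hypothetical, simplified view of a locale; this is not an ICU4X type.
#[derive(Debug, Clone, Copy)]
pub struct SimpleLocale<'a> {
    pub language: &'a str,
    pub script: Option<&'a str>,
    pub region: Option<&'a str>,
    /// Value of the `-u-co-` keyword, if one was given.
    pub collation: Option<&'a str>,
}

/// Returns true if the locale should resolve to `zh-u-co-stroke`
/// under the rules listed in this issue.
pub fn maps_to_zh_stroke(loc: &SimpleLocale) -> bool {
    // Assumption: an explicit collation is taken to exist, so it always wins.
    if loc.collation.is_some() {
        return false;
    }
    match (loc.language, loc.script, loc.region) {
        // zh-Hant regardless of region.
        ("zh", Some("Hant"), _) => true,
        // zh without Hans but with any of HK, MO, TW.
        ("zh", script, Some("HK" | "MO" | "TW")) if script != Some("Hans") => true,
        // yue-Hans and yue-CN go to zh-Hans instead.
        ("yue", Some("Hans"), _) => false,
        ("yue", _, Some("CN")) => false,
        // yue without either Hans or CN.
        ("yue", _, _) => true,
        _ => false,
    }
}
```

With this sketch, `zh-TW`, `zh-Hant-US`, and `yue` return true, while `zh`, `zh-Hans-HK`, `yue-CN`, `yue-Hans`, and `zh-Hant-u-co-pinyin` return false.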
Ensure that if a specific (and existing) collation hasn't been specified with -u-co-, the following map to zh-u-co-stroke:
zh-Hant regardless of region.
This will be possible so long as zh-Hant contains the correct data. I'll add a test for this.
zh without Hans but with any of HK, MO, TW.
This should be automatic given that these fallbacks are included in parent locales / likely subtags; all of these locales will fall back via zh-Hant.
yue without either Hans or CN.
Looks like the mappings in likely subtags are correct:
"yue": "yue-Hant-HK",
"yue-CN": "yue-Hans-CN",
"yue-Hans": "yue-Hans-CN",
I'll add a test for it.
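To show how those three likely-subtags entries lead to the expected collation, here is a small sketch in plain Rust. The table contains only the entries quoted above, not the full CLDR data. `maximize`, `script_of`, and `collation_target` are hypothetical helpers written for this example, not ICU4X functions, and the Hans-to-pinyin mapping follows the earlier comment that zh-Hans means zh-u-co-pinyin.

```rust
/// Only the three likely-subtags entries quoted above; not the full CLDR table.
const LIKELY_SUBTAGS: &[(&str, &str)] = &[
    ("yue", "yue-Hant-HK"),
    ("yue-CN", "yue-Hans-CN"),
    ("yue-Hans", "yue-Hans-CN"),
];

/// Hypothetical helper: looks up the maximized form of a tag in the table.
pub fn maximize(tag: &str) -> Option<&'static str> {
    LIKELY_SUBTAGS
        .iter()
        .find(|(from, _)| *from == tag)
        .map(|(_, to)| *to)
}

/// Hypothetical helper: returns the script subtag of a maximized
/// `language-script-region` tag.
pub fn script_of(maximized: &str) -> Option<&str> {
    maximized.split('-').nth(1)
}

/// Picks the Chinese collation that a `yue` tag should end up with,
/// based on the script obtained from likely subtags.
pub fn collation_target(tag: &str) -> Option<&'static str> {
    match script_of(maximize(tag)?)? {
        "Hant" => Some("zh-u-co-stroke"),
        "Hans" => Some("zh-u-co-pinyin"),
        _ => None,
    }
}
```

Tags that are missing from the small table, such as `fr`, yield `None` in this sketch; a real implementation would consult the full likely-subtags data.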
There is a list of collation-specific aliases/parents in the LDML-to-ICU converter:
https://github.com/unicode-org/icu/blob/0266970e977b9e2488dfbf788cc280be3a0338ca/tools/cldr/cldr-to-icu/build-icu-data.xml#L263
Obviously, that list isn't making it into ICU4X.
I chatted with @markusicu about this today. He says that it may make sense to introduce a "processing" mode to the locale fallback engine. This mode can be used for both collator and break iterator.
I need to verify whether the set of ICU-specific overrides should apply uniformly to both collator data and segmenter data.
I still need to implement the actual zigzag fallback, but this can be done in the Collation fallback mode.
Upstream issue involving the ICU-specific fallback aliases: https://unicode-org.atlassian.net/browse/CLDR-16253