icu4x

Ensure that the provider performs correct alias mapping for Traditional Chinese locales

Open hsivonen opened this issue 3 years ago • 7 comments

Ensure that if a specific (and existing) collation hasn't been specified with -u-co-, the following map to zh-u-co-stroke:

  • zh-Hant regardless of region.
  • zh without Hans but with any of HK, MO, TW.
  • yue without either Hans or CN.

hsivonen • May 30 '22 07:05

CC @sffc

hsivonen • May 30 '22 07:05

For clarity: CLDR maps yue-CN and yue-Hans to zh-Hans, i.e. zh-u-co-pinyin.
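
The three rules from the issue description can be sketched as a plain predicate. This is only an illustration: `maps_to_zh_stroke` is a hypothetical helper, not an ICU4X API, and it assumes the locale has already been split into language, script, and region subtags and that no existing `-u-co-` collation was requested. It only answers whether the locale falls under the `zh-u-co-stroke` rules; `yue-CN` and `yue-Hans` come out as `false`, consistent with CLDR mapping them to `zh-Hans`.

```rust
/// Hypothetical helper (not part of ICU4X): returns true when a locale
/// should map to `zh-u-co-stroke` under the three rules listed in this issue.
///
/// Assumption: the caller has already checked that no specific (and
/// existing) collation was requested via `-u-co-`.
fn maps_to_zh_stroke(language: &str, script: Option<&str>, region: Option<&str>) -> bool {
    match language {
        "zh" => {
            // Rule 1: zh-Hant regardless of region.
            script == Some("Hant")
                // Rule 2: zh without Hans but with any of HK, MO, TW.
                || (script != Some("Hans") && matches!(region, Some("HK" | "MO" | "TW")))
        }
        // Rule 3: yue without either Hans or CN.
        "yue" => script != Some("Hans") && region != Some("CN"),
        // Other languages are outside the scope of this issue.
        _ => false,
    }
}
```

For example, `maps_to_zh_stroke("zh", None, Some("TW"))` and `maps_to_zh_stroke("yue", None, None)` are true, while `maps_to_zh_stroke("zh", Some("Hans"), Some("TW"))` and `maps_to_zh_stroke("yue", None, Some("CN"))` are false.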

hsivonen • May 30 '22 07:05

> Ensure that if a specific (and existing) collation hasn't been specified with -u-co-, the following map to zh-u-co-stroke:
>
>   • zh-Hant regardless of region.

This will be possible so long as zh-Hant contains the correct data. I'll add a test for this.

>   • zh without Hans but with any of HK, MO, TW.

This should be automatic given that these fallbacks are included in parent locales / likely subtags; all of these locales will fall back via zh-Hant.

>   • yue without either Hans or CN.

Looks like the mappings in likely subtags are correct:

      "yue": "yue-Hant-HK",
      "yue-CN": "yue-Hans-CN",
      "yue-Hans": "yue-Hans-CN",

I'll add a test for it.
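
A minimal sketch of what such a check could look like, using only the three likely-subtags entries quoted above. The names `LIKELY_SUBTAGS`, `maximize`, `script_of`, and `expects_stroke` are hypothetical and not ICU4X APIs. Real likely-subtags maximization also retries with subtags removed; this sketch does an exact-key lookup only.

```rust
/// The three likely-subtags entries quoted in this thread.
const LIKELY_SUBTAGS: &[(&str, &str)] = &[
    ("yue", "yue-Hant-HK"),
    ("yue-CN", "yue-Hans-CN"),
    ("yue-Hans", "yue-Hans-CN"),
];

/// Hypothetical helper: exact-key lookup of the maximized locale.
/// (Assumption: no subtag-removal retries, unlike the full algorithm.)
fn maximize(locale: &str) -> Option<&'static str> {
    LIKELY_SUBTAGS
        .iter()
        .find(|(key, _)| *key == locale)
        .map(|(_, maximized)| *maximized)
}

/// Extracts the script subtag from a `lang-Script-REGION` string.
fn script_of(maximized: &str) -> Option<&str> {
    maximized.split('-').nth(1)
}

/// Returns Some(true) when the maximized locale is Traditional (Hant),
/// i.e. when stroke collation is expected; None if the locale is not
/// in the small table above.
fn expects_stroke(locale: &str) -> Option<bool> {
    maximize(locale).and_then(script_of).map(|script| script == "Hant")
}
```

With this table, `expects_stroke("yue")` is `Some(true)`, while `expects_stroke("yue-CN")` and `expects_stroke("yue-Hans")` are `Some(false)`.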

sffc • Jun 16 '22 16:06

There is a list of collation-specific aliases/parents in the LDML-to-ICU converter:

https://github.com/unicode-org/icu/blob/0266970e977b9e2488dfbf788cc280be3a0338ca/tools/cldr/cldr-to-icu/build-icu-data.xml#L263

Obviously, that list isn't making it into ICU4X.

I chatted with @markusicu about this today. He says that it may make sense to introduce a "processing" mode to the locale fallback engine. This mode can be used for both collator and break iterator.

I need to verify whether the set of ICU-specific overrides should apply uniformly to both collator data and segmenter data.

sffc • Aug 31 '22 02:08

I still need to implement the actual zigzag fallback, but this can be done in the Collation fallback mode.

sffc • Sep 26 '22 18:09

Upstream issue involving the ICU-specific fallback aliases: https://unicode-org.atlassian.net/browse/CLDR-16253

sffc • Dec 20 '22 00:12