
Update locale canonicalization to use bcp47 alias data

Open · dminor opened this issue 3 years ago · 5 comments

In #218, we're adding locale canonicalization based on the CLDR JSON aliases.json data. That data is missing a handful of aliases that are defined in the BCP 47 XML data. Once these aliases are added to the JSON data, as tracked by #562, we'll be able to update the locale_canonicalizer to use them as well.

This is blocked on both #218 and #562.
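To make "use these aliases" concrete: one thing the bcp47 data enables is replacing deprecated -u- keyword values with their preferred forms. The following is a minimal std-only sketch under simplifying assumptions (well-formed input, lowercase -u- subtags, no special handling of -u- attributes or -x- private use). The function names and the single table entry are hypothetical and illustrative; they are not taken from locale_canonicalizer, and the entry is not copied from CLDR.

```rust
use std::collections::HashMap;

/// Hypothetical alias table: key -> (deprecated value -> preferred value).
/// The single entry is illustrative, not copied from CLDR.
fn illustrative_aliases() -> HashMap<&'static str, HashMap<&'static str, &'static str>> {
    let mut ca = HashMap::new();
    ca.insert("islamicc", "islamic-civil");
    let mut table = HashMap::new();
    table.insert("ca", ca);
    table
}

/// Rewrites aliased keyword values inside the `-u-` extension of a tag.
/// Simplified: assumes well-formed input with lowercase `-u-` subtags and
/// does not special-case `-u-` attributes or private-use (`-x-`) subtags.
fn canonicalize_u_extension(tag: &str, aliases: &HashMap<&str, HashMap<&str, &str>>) -> String {
    let mut out: Vec<String> = Vec::new();
    let mut subtags = tag.split('-').peekable();
    let mut in_u = false;
    while let Some(subtag) = subtags.next() {
        if subtag.len() == 1 {
            // A singleton starts an extension; track whether it is `u`.
            in_u = subtag == "u";
            out.push(subtag.to_string());
        } else if in_u && subtag.len() == 2 {
            // A two-character subtag is a key; the longer subtags after it form its value.
            let mut value_parts: Vec<&str> = Vec::new();
            while let Some(part) = subtags.next_if(|s| s.len() > 2) {
                value_parts.push(part);
            }
            let value = value_parts.join("-");
            out.push(subtag.to_string());
            match aliases.get(subtag).and_then(|values| values.get(value.as_str())) {
                Some(preferred) => out.push(preferred.to_string()),
                None if !value.is_empty() => out.push(value),
                None => {} // Key with no value (implicit "true"): nothing to append.
            }
        } else {
            out.push(subtag.to_string());
        }
    }
    out.join("-")
}
```

With the illustrative table, "en-u-ca-islamicc" becomes "en-u-ca-islamic-civil", while tags whose keyword values have no alias pass through unchanged.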

dminor · May 31 '21 15:05

@dminor Do you consider this to be a 1.0 blocker? Is it required for spec compliance?

sffc · Jan 27 '22 18:01

> @dminor Do you consider this to be a 1.0 blocker? Is it required for spec compliance?

Not fixing this is a bug, but a fairly minor one: the handful of missing aliases are very much edge cases. I think we can comfortably fix this after 1.0, so I suggest punting it.

dminor · May 26 '22 20:05

My understanding so far:

  • this repository contains alias data.
  • I have to interact with icu-datagen in some way to include these files in the locid_transform crate.
    • Presumably by installing the icu-datagen binary tool and committing the generated files to the repository.
  • Create an AliasesV3 struct that includes these new sources of alias data.
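If a new struct does turn out to be needed, it might simply carry the existing alias tables plus one more table for the bcp47 keyword value aliases. This is a hypothetical sketch: AliasesSketch and its fields are invented names, the "iw" and "islamicc" entries in the usage are illustrative, and std BTreeMaps stand in for the zero-copy zerovec containers that ICU4X data structs actually use.

```rust
use std::collections::BTreeMap;

/// Hypothetical alias data struct; the lifetime mimics zero-copy borrowing.
#[derive(Debug, Default)]
struct AliasesSketch<'data> {
    /// Stand-in for an existing table built from aliases.json (language aliases).
    language: BTreeMap<&'data str, &'data str>,
    /// New table from the bcp47 files: key -> (deprecated value -> preferred value).
    u_keyword_values: BTreeMap<&'data str, BTreeMap<&'data str, &'data str>>,
}

impl<'data> AliasesSketch<'data> {
    /// Looks up a language alias in the existing-style table.
    fn language(&self, lang: &str) -> Option<&'data str> {
        self.language.get(lang).copied()
    }

    /// Looks up the preferred value for a `-u-` keyword value, if it is aliased.
    fn u_keyword_value(&self, key: &str, value: &str) -> Option<&'data str> {
        self.u_keyword_values.get(key)?.get(value).copied()
    }
}
```

Nesting the keyword table by key mirrors how the bcp47 files group values under each key, and lets lookups borrow plain &str without building tuple keys.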

kartva · Mar 31 '24 04:03

By running the download-repo-sources tool, I've obtained the calendar.json file, which appears to contain the bcp47 calendar data in JSON form. The other bcp47 JSON files can presumably be obtained the same way.

Next steps:

  • write serde-mapping structs in provider/datagen/src/transform/cldr/cldr_serde/bcp47_*.rs (for each relevant bcp47 file).
  • create AliasesV3, parse and store bcp47 alias data in impl From<&cldr_serde::aliases::Resource> for AliasesV3<'_>, then impl DataProvider for AliasesV3
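The conversion step could follow the From pattern below. Since serde isn't pulled in here, Bcp47Type and Bcp47Resource are hypothetical stand-ins for what the serde-mapping structs might deserialize into (for each -u- key, a map from type value to its attributes). The deprecated and preferred fields mirror attributes of the bcp47 XML, but their exact JSON spelling should be checked against the files from download-repo-sources. Unlike the plan above, this sketch converts from the bcp47 resource rather than from cldr_serde::aliases::Resource, handles only the deprecated/preferred pair, and its output type KeywordValueAliases is also hypothetical.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for one deserialized bcp47 type entry.
#[derive(Debug, Default)]
struct Bcp47Type {
    /// Mirrors the XML `deprecated` attribute (assumed already parsed to a bool).
    deprecated: bool,
    /// Mirrors the XML `preferred` attribute, if present.
    preferred: Option<String>,
}

/// Hypothetical stand-in for one bcp47 file: key -> (type value -> entry).
#[derive(Debug, Default)]
struct Bcp47Resource {
    keys: BTreeMap<String, BTreeMap<String, Bcp47Type>>,
}

/// Hypothetical output table: key -> (deprecated value -> preferred value).
#[derive(Debug, Default, PartialEq)]
struct KeywordValueAliases {
    map: BTreeMap<String, BTreeMap<String, String>>,
}

impl From<&Bcp47Resource> for KeywordValueAliases {
    fn from(resource: &Bcp47Resource) -> Self {
        let mut map: BTreeMap<String, BTreeMap<String, String>> = BTreeMap::new();
        for (key, types) in &resource.keys {
            for (name, entry) in types {
                // Only deprecated values that name a replacement become aliases.
                if let (true, Some(preferred)) = (entry.deprecated, &entry.preferred) {
                    map.entry(key.clone())
                        .or_default()
                        .insert(name.clone(), preferred.clone());
                }
            }
        }
        KeywordValueAliases { map }
    }
}
```

A DataProvider implementation would then build this table from each relevant bcp47 file and merge it into whichever alias struct the canonicalizer ends up reading.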

@sffc do you see anything that I might be missing?

kartva · Apr 08 '24 04:04

This sounds right. I'm not sure you'll need a new AliasesV3, but yes, the general idea of pulling the JSON files in with download-repo-sources and then getting them into a canonicalizer data structure is correct. Thanks!

sffc · Apr 08 '24 06:04