boa
Experiment with ICU4X
https://github.com/unicode-org/icu4x will be useful for implementing i18n-sensitive operations and future proposals like Temporal.
OK, here are some details I found while investigating ICU4X:
- It requires a `DataProvider` in order to perform operations, which is not trivial to obtain.
- A `DataProvider` can be obtained using the `icu4x-datagen` crate, but it is not published on crates.io, so we would need to import the repo as a submodule if we want to automate it.
- We can use a `StaticDataProvider` to embed a `DataProvider` in the binary with `include_bytes!`.
- We can also obtain the data from http://unicode.org/Public/cldr/, but we would need to code a parser into a `BlobSchema` for it to be easily embeddable as a `StaticDataProvider`.
- We can use a `build.rs` script to avoid having to do these things by hand.
- The collator we require is a WIP: https://github.com/unicode-org/icu4x/issues/971
CC @sffc
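The `include_bytes!` approach mentioned above amounts to compiling a data file into the binary as a `&'static [u8]` and reading it at runtime, with no file I/O. The sketch below shows only that general pattern using the standard library; it does not use ICU4X's API. The `EmbeddedData` type, the `key=value` line format, and the sample entries are all hypothetical and exist only for illustration. In real code the constant would come from `include_bytes!` over a file generated ahead of time (for example by `icu4x-datagen`), and the blob format would be whatever ICU4X expects.

```rust
// Hypothetical stand-in for embedded locale data. In real code this would be
// something like `include_bytes!("../data/locale_data.txt")` (path assumed),
// pointing at a pre-generated file; a byte-string literal is used here so the
// sketch compiles without that file.
static DATA_BLOB: &[u8] = b"en/decimal=.\nde/decimal=,\nfr/decimal=,\n";

/// Hypothetical read-only view over the embedded blob. It borrows the
/// `'static` bytes directly, so lookups need no allocation or file I/O.
struct EmbeddedData {
    blob: &'static [u8],
}

impl EmbeddedData {
    const fn new(blob: &'static [u8]) -> Self {
        EmbeddedData { blob }
    }

    /// Looks up `key` in the hypothetical `key=value` line format.
    /// Returns `None` if the blob is not valid UTF-8 or the key is absent.
    fn get(&self, key: &str) -> Option<&'static str> {
        let text: &'static str = std::str::from_utf8(self.blob).ok()?;
        text.lines().find_map(|line| {
            let (k, v) = line.split_once('=')?;
            if k == key {
                Some(v)
            } else {
                None
            }
        })
    }
}

fn main() {
    let data = EmbeddedData::new(DATA_BLOB);
    // Prints the value stored for the (hypothetical) German decimal separator key.
    println!("{:?}", data.get("de/decimal"));
}
```

Because the data lives in the binary's read-only section, the returned `&'static str` values can be handed out freely; the trade-off is a larger binary and a rebuild whenever the data changes, which is where a `build.rs` step would help.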
Hi there! I just saw this today.
Here are instructions on how to generate the ICU4X data file:
https://crates.io/crates/icu_datagen
Specific replies inline:
- A `DataProvider` can be obtained using the `icu4x-datagen` crate, but it is not published on crates.io, so we would need to import the repo as a submodule if we want to automate it.
It is on crates.io; see link above.
- We can use a `StaticDataProvider` to embed a `DataProvider` in the binary with `include_bytes!`.
Correct. This is the easiest way to include data.
- We can also obtain the data from http://unicode.org/Public/cldr/, but we would need to code a parser into a `BlobSchema` for it to be easily embeddable as a `StaticDataProvider`.
You should use icu4x-datagen to generate the data. You need the CLDR data available at build time.
- We can use a `build.rs` script to avoid having to do these things by hand.
We have an issue to track this: https://github.com/unicode-org/icu4x/issues/1188
- The collator we require is a WIP: unicode-org/icu4x#971 (Create a Collator component)
@hsivonen has been working on the collator and can share more about the timeline for this feature.
There's now an ICU4X PR that shows the status of the collator.
Nice! I also saw that you're about to merge a PR with a datagen API for build.rs scripts (https://github.com/unicode-org/icu4x/pull/1819). I'll experiment with your branches in the meantime, and hopefully we'll be able to integrate ICU4X into our codebase with your next release!