Transliterator is missing compiled data constructors

Open kornelski opened this issue 1 year ago • 4 comments

The icu_transliterate crate has constructor methods that take:

P: DataProvider<TransliteratorRulesV1Marker> + DataProvider<CanonicalDecompositionDataV1Marker> + DataProvider<CompatibilityDecompositionSupplementV1Marker> + DataProvider<CanonicalDecompositionTablesV1Marker> + DataProvider<CompatibilityDecompositionTablesV1Marker> + DataProvider<CanonicalCompositionsV1Marker> + ?Sized,

This is a large, complex-looking trait bound, and it does not help me figure out what actual type is needed here. Even when I try to browse the mentioned types like TransliteratorRulesV1Marker, it leads to more levels of abstraction, which all end at some HelloWorldProvider.

As a result, this API is so opaque and overly abstract that I can't even tell whether the transliterator works at all or is a defunct stub with no real implementations.

Some methods have shockingly large trait bounds. To me this is very discouraging, because I'm afraid that if I run into even the slightest problem with this crate, I'll be faced with a wall of abstractions upon abstractions, requiring me to understand an abstraction with over 60 degrees of freedom, which seems like absolute overkill when there are probably between 0 and 2 actual implementations of it.

kornelski, May 13 '24 11:05

Hi @kornelski!

Most functions with the large trait bounds have a link to this page which explains what they mean:

[image not preserved]

These bounds are what enable ICU4X to slice the data down to exactly what is needed and avoid the big monolithic data file that ICU4C, which doesn't have these bounds, requires.
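To make the shape of such a bound less intimidating, here is a minimal, self-contained sketch of the pattern. Everything in it is a toy assumption: the traits `Marker` and `DataProvider`, the two marker types, `ToyProvider`, and `summarize` are hypothetical stand-ins defined in the snippet itself, not the real ICU4X items, whose trait is richer. The point is only that each `DataProvider<SomeMarker>` term in the bound names one kind of data, and a single concrete type can implement all of them.

```rust
/// Toy stand-in for a data marker: a type that names one kind of data
/// and says what payload that data has. (Hypothetical, not the ICU4X trait.)
trait Marker {
    type Payload;
}

/// Toy stand-in for a provider of one kind of data.
/// (Hypothetical; simplified compared to a real data provider.)
trait DataProvider<M: Marker> {
    fn load(&self) -> M::Payload;
}

/// Marker for transliteration rules (toy payload: a list of rule strings).
struct RulesMarker;
impl Marker for RulesMarker {
    type Payload = &'static [&'static str];
}

/// Marker for decomposition data (toy payload: char -> decomposed form).
struct DecompositionMarker;
impl Marker for DecompositionMarker {
    type Payload = &'static [(char, &'static str)];
}

/// One concrete type that satisfies every term of the bound below.
struct ToyProvider;

impl DataProvider<RulesMarker> for ToyProvider {
    fn load(&self) -> &'static [&'static str] {
        &["a-umlaut > ae"]
    }
}

impl DataProvider<DecompositionMarker> for ToyProvider {
    fn load(&self) -> &'static [(char, &'static str)] {
        &[('\u{e4}', "a\u{308}")]
    }
}

/// A consumer that lists exactly the kinds of data it needs in its bound.
/// A provider lacking one of them is rejected at compile time.
fn summarize<P>(provider: &P) -> String
where
    P: DataProvider<RulesMarker> + DataProvider<DecompositionMarker> + ?Sized,
{
    // Fully qualified calls pick which of the two `load` impls to use.
    let rules = <P as DataProvider<RulesMarker>>::load(provider);
    let table = <P as DataProvider<DecompositionMarker>>::load(provider);
    format!("{} rule(s), {} decomposition(s)", rules.len(), table.len())
}
```

Calling `summarize(&ToyProvider)` compiles because `ToyProvider` implements both provider traits. Because the bound spells out each kind of data the function needs, any data not named in it never has to be shipped, which is the slicing benefit described above.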

In general you should be using functions like RuleCollection::as_provider instead of RuleCollection::as_provider_unstable. Once you get things working with RuleCollection::as_provider, if you want to start building your own data, you can switch over to RuleCollection::as_provider_unstable, and the argument should be a reference to your custom data provider created as explained in this tutorial.

sffc, May 13 '24 18:05

Actually I'll re-open this issue specifically for Transliterator because it looks like it doesn't currently have a compiled_data constructor.

sffc, May 13 '24 18:05

Partial fix for the Transliterator constructor in https://github.com/unicode-org/icu4x/pull/4898

In the short term, you can get the type you need, the one that implements all the trait bounds, like this:

include!("tests/transliterate/data/baked/mod.rs");

where tests is the icu_experimental crate root, which is components/experimental/tests in this repository. That file exposes a type named BakedDataProvider, which is what you need.

It's not ideal that this is in a test directory, which is why I put up #4898. I think this is just an oversight from when we moved the code into the new icu_experimental crate.

sffc, May 13 '24 19:05

tests/transliterate/data/baked contains some hand-written tests; it does not contain CLDR data.

robertbastian, May 14 '24 09:05