icu4x
Split `icu_datagen` into a crate for the driver and a crate for the provider
These two components are fairly independent, and there is now a use case for the datagen driver that doesn't use the CLDR/ICU backed provider (icu_datagen_dart). It would be nice to be able to build it without pulling in all the logic and dependencies for parsing CLDR.
Idea:
- `icu_datagen` - `DatagenDriver` and options
- `icu_provider_source` - `DatagenProvider`
  - type could be renamed to `SourceDataProvider`
  - this will have most dependencies (`zip`, `wasm`, `ureq` etc.)
- `icu4x-datagen` - The CLI, which depends on both crates
  - I think the current state where `cargo install icu_datagen` installs a binary called `icu4x-datagen` is confusing
  - This is currently pretty much a separate crate anyway, it's a binary in a library crate, and has its own dependencies (`bin` feature)
- `icu_datagen_dart` depends on `icu_datagen` and `icu_provider_blob`, making it a lot more lightweight
  - In complicated build systems, it will be preferable to use the `icu_datagen_dart` approach, i.e. have one universal blob generated from sources (long build time and long gen time, but shared), and then filter that down for use cases (short build time and short gen time).
Actually, data "generation" happens in the provider crate, so maybe the driver crate should be `icu_export` (with `ExportDriver`), and the provider crate `icu_provider_datagen` (or `icu_datagen` if we don't want to retire the crate name; however, all our crates that define providers start with `icu_provider_`).
The most modular setup would probably be
- `icu_datagen_transform` defines `DatagenProvider`
- `icu_datagen_driver` defines `DatagenDriver`
- `icu_datagen` defines the `icu4x-datagen` binary
That's my proposal, just with different names.
We could also move `icu_provider::datagen` into the driver crate.
Latest proposal:
- A crate that contains `DatagenDriver`, `icu_provider::datagen::{ExportMarker, ExportableDataProvider, ...}`, the registry (`make_exportable_provider!`), `BakedExporter`
  - Desired name: `icu_export`
  - Also would like to rename `DatagenDriver` -> `ExportDriver`
- A crate that contains `DatagenProvider`
  - Desired name: `icu_provider_datagen`
  - Alternative names: `icu_datagen`, `icu_provider_source`
- A crate that contains `icu4x-datagen`
  - Desired name: `icu4x-datagen`
- A crate that contains a CLI that uses a blob instead of `DatagenProvider`
  - Possible name: `icu4x-reexport`, `icu4x-data-transform`
  - We might be able to model this with Cargo features in `icu4x-datagen` instead
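To make the driver/provider split concrete, here is a minimal sketch of a driver that depends only on a provider trait, so it can be fed from a pre-generated blob without any CLDR parsing logic. The trait and types below (`ExportableProvider`, `BlobProvider`, `ExportDriver`, `sample_blob`) are simplified stand-ins invented for illustration; they borrow names from this thread but are not the real ICU4X APIs or signatures.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for an exportable provider; NOT the real ICU4X trait.
trait ExportableProvider {
    /// Locales for which this provider has data under `key`.
    fn supported_locales(&self, key: &str) -> Vec<String>;
    /// Serialized payload for `key` and `locale`, if present.
    fn load(&self, key: &str, locale: &str) -> Option<String>;
}

/// Hypothetical provider backed by an already-generated "universal" blob,
/// modeled here as an in-memory map from (key, locale) to payload.
struct BlobProvider {
    data: BTreeMap<(String, String), String>,
}

impl ExportableProvider for BlobProvider {
    fn supported_locales(&self, key: &str) -> Vec<String> {
        self.data
            .keys()
            .filter(|(k, _)| k.as_str() == key)
            .map(|(_, locale)| locale.clone())
            .collect()
    }

    fn load(&self, key: &str, locale: &str) -> Option<String> {
        self.data
            .get(&(key.to_string(), locale.to_string()))
            .cloned()
    }
}

/// Stand-in for the driver: it knows which keys and locales to export,
/// but depends only on the trait, never on how the data was produced.
struct ExportDriver {
    keys: Vec<String>,
    locales: Vec<String>,
}

impl ExportDriver {
    /// Returns (key, locale, payload) for every requested key/locale pair
    /// that the provider can supply.
    fn export(&self, provider: &dyn ExportableProvider) -> Vec<(String, String, String)> {
        let mut out = Vec::new();
        for key in &self.keys {
            for locale in provider.supported_locales(key) {
                if !self.locales.contains(&locale) {
                    continue; // locale filtered out by the driver options
                }
                if let Some(payload) = provider.load(key, &locale) {
                    out.push((key.clone(), locale, payload));
                }
            }
        }
        out
    }
}

/// Builds a tiny blob with three locales for one made-up key.
fn sample_blob() -> BlobProvider {
    let mut data = BTreeMap::new();
    for (locale, payload) in [("de", "Hallo"), ("en", "Hello"), ("fr", "Bonjour")] {
        data.insert(
            ("hello@1".to_string(), locale.to_string()),
            payload.to_string(),
        );
    }
    BlobProvider { data }
}

fn main() {
    let driver = ExportDriver {
        keys: vec!["hello@1".to_string()],
        locales: vec!["en".to_string(), "fr".to_string()],
    };
    for (key, locale, payload) in driver.export(&sample_blob()) {
        println!("{}/{}: {}", key, locale, payload);
    }
}
```

A source-backed provider would implement the same trait in its own crate, which is what keeps its heavy dependencies out of the driver crate in this sketch.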
ICU4X-WG discussion:
- @sffc - Main advantage, besides modularity, is reducing dependencies. The registry needs to depend on all of icu4x though. Can we put the registry in the metacrate?
- @robertbastian - Maybe, but the registry helps implement `ExportMarker` and things that are datagen-specific.
- @zbraniecki - If you find a crate `icu_export`, it's not clear that it is a provider crate and not a component crate. It looks like a component called "export".
- @robertbastian - Good point, we should name it `icu_provider_export`.
- @sffc - Do we care at all about the binary name and crate name matching?
  - everyone except for @robertbastian - no opinion
- @sffc - You can pass callbacks to other macros. This should allow us to have the registry in the metacrate. Playground
Macro structure brainstorm:

    macro_rules! make_exportable {
        ([$($marker:path,)*], [$($experimental_marker:path,)*]) => {
            #[cfg(feature = "experimental_components")]
            icu_provider::make_exportable_provider!([$($marker,)* $($experimental_marker,)*]);
            #[cfg(not(feature = "experimental_components"))]
            icu_provider::make_exportable_provider!([$($marker,)*]);
        }
    }
    // uses call-site Cargo features
    registry!(make_exportable);

    macro_rules! cb {
        ($($marker:path),*) => {
            fn all_keys() -> ... {
                HashSet::from_iter([
                    $(<$marker>::KEY.path()),*
                ])
            }
        }
    }

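The callback idea from the brainstorm can be tried in isolation. The following is a minimal, self-contained sketch: a `registry!` macro owns the marker list and forwards it to whatever macro name the caller passes in. The marker types, the `KeyedMarker` trait, and the key strings are hypothetical placeholders rather than real ICU4X items, and the return type is an arbitrary choice made for the example.

```rust
use std::collections::BTreeSet;

// Hypothetical marker types standing in for real data markers.
struct HelloWorldMarker;
struct DecimalSymbolsMarker;

// Hypothetical trait exposing one key path per marker.
trait KeyedMarker {
    const KEY_PATH: &'static str;
}

impl KeyedMarker for HelloWorldMarker {
    const KEY_PATH: &'static str = "core/helloworld@1";
}

impl KeyedMarker for DecimalSymbolsMarker {
    const KEY_PATH: &'static str = "decimal/symbols@1";
}

// The registry owns the list of markers and hands it to a caller-supplied
// macro. Whatever the callback expands to is compiled at the call site.
macro_rules! registry {
    ($cb:ident) => {
        $cb!(HelloWorldMarker, DecimalSymbolsMarker);
    };
}

// One possible callback: generate a function listing every key path.
macro_rules! cb {
    ($($marker:path),*) => {
        fn all_keys() -> BTreeSet<&'static str> {
            vec![
                $(<$marker as KeyedMarker>::KEY_PATH),*
            ]
            .into_iter()
            .collect()
        }
    };
}

// Expands to the `all_keys` function defined by `cb!`.
registry!(cb);

fn main() {
    for key in all_keys() {
        println!("{}", key);
    }
}
```

Because the expansion happens where `registry!` is invoked, any `#[cfg(feature = ...)]` attributes emitted by a callback are evaluated against the invoking crate's features, which is the property the brainstorm relies on.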
- `icu` crate
  - registry
    - `all_stable_keys()`
    - `#[cfg(feature = "experimental")] all_experimental_keys()`
    - `#[cfg(feature = "experimental")] all_keys()` (maybe)
    - `key(str)`
- `icu_provider`
  - `icu_provider::datagen::*`
- `icu_datagen`
  - `ExportDriver` (needs rayon, fallback)
  - `baked_exporter` module (feature-gated)
  - `icu_provider_blob::export` (feature-gated)
  - `icu_provider_fs::export` (feature-gated)
- `icu_provider_source`
  - `SourceDataProvider`
  - depends on `icu` (registry) and `icu_provider` (for `_::datagen::*`) to implement `icu_provider::datagen::ExportableProvider`
  - depends on cpt builder (wasmer), zip, ureq, etc.
  - lots of cargo features
- `icu4x-datagen`
  - Pure binary crate (never appears in a Cargo.toml file)
  - depends on `icu_datagen`, `icu_provider_source`, `icu` (`all_stable_keys`, `all_experimental_keys`, `key`)
  - duplicates all of `icu_provider_source`'s cargo features
    - Cargo.toml: `use_wasm = ["icu_provider_source?/use_wasm"]`
    - maybe refuse to install if feature combination doesn't make sense
  - without `icu_provider_source` feature, allows blob inputs (current `icu_datagen_dart`)
Shane's version:
- `icu` metacrate as above
- `icu_provider_transform`
  - As above, no dependency on driver
- `icu_datagen`
  - Driver
  - Optional dep on `icu_provider_transform`
  - `icu4x-datagen` binary
Conclusion:
- The crate called `icu_datagen` will have stuff pulled out from it:
  - In 1.5: `DatagenProvider` goes behind a feature. The feature impacts binary behavior. We could fail to install the binary if the feature combinations are incompatible via a feature-gated compile error.
  - In 1.5: Implement the `registry!` macro as designed above; keep in `icu_datagen`
  - In 2.0: `DatagenProvider` gets pulled out into its own crate (names to be bikeshed) and `icu_datagen` does not depend on it
  - In 2.0: Move the `registry!` macro to `icu` metacrate
  - In 2.0: `icu4x-datagen` gets pulled out into a binary-only crate called `icu4x-datagen`
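The "feature-gated compile error" mentioned in the conclusion could look roughly like the sketch below. The feature names `bin`, `provider`, and `blob_input` are assumptions made for this example; the thread does not settle on names. When built without any features, as here, the guard is inactive and the program compiles.

```rust
// Assumed feature names; the real crate may use different ones.
// If the binary is requested but no input source is enabled, fail the build
// with a readable message instead of producing a binary that cannot run.
#[cfg(all(
    feature = "bin",
    not(any(feature = "provider", feature = "blob_input"))
))]
compile_error!("feature `bin` requires at least one of `provider` or `blob_input`");

/// Lists the input kinds this build supports, based on enabled features.
fn input_kinds() -> Vec<&'static str> {
    let mut kinds = Vec::new();
    if cfg!(feature = "provider") {
        kinds.push("source");
    }
    if cfg!(feature = "blob_input") {
        kinds.push("blob");
    }
    kinds
}

fn main() {
    println!("supported inputs: {:?}", input_kinds());
}
```

Since `cargo install` compiles the crate, a `compile_error!` that fires under an unsupported feature combination makes the install fail with the given message.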
LGTM: @robertbastian @sffc
Discussion:
- Bikeshed
  - All provider infrastructure should live in crates prefixed with `icu_provider`, to distinguish them from components
icu_provider_export = { version = "~1.5.0", path = "provider/export" }
# DatagenDriver -> ExportDriver
icu_provider = { version = "~1.5.0", path = "provider/core" }
icu_provider_macros = { version = "~1.5.0", path = "provider/core/macros" }
icu_provider_adapters = { version = "~1.5.0", path = "provider/adapters" }
icu_provider_baked = { version = "~1.5.0", path = "provider/baked" }
icu_provider_blob = { version = "~1.5.0", path = "provider/blob" }
icu_provider_fs = { version = "~1.5.0", path = "provider/fs" }
icu_provider_source = { version = "~1.5.0", path = "provider/source" }
# DatagenProvider -> SourceDataProvider
icu_provider_registry = { version = "~1.5.0", path = "provider/registry" }
- registry location
  - keep it in a separate crate, for datagen we don't care about crate count, but give it the `icu_provider` prefix
  - ideally remove from `icu_provider_baked/export` if easy with new paths
LGTM: @sffc @robertbastian