icu4x
icu4x copied to clipboard
Split icu_locid_transform into two crates
There is too much going on in icu_locid_transform for it to be a core crate. It should only contain the things essential for fallback.
We need more discussion time for this issue. There's not time for 1.3 to have that discussion. It's not a bad enough problem to block 1.3. So this should block 2.0.
Discuss with:
- @robertbastian
- @sffc
- (@Manishearth)
- (@younies)
- @zbraniecki
Decision from 2023-11-09:
- icu_locid stays as is
- icu_locid_transform becomes the lower level crate
- Revive icu_canonicalizer to become the higher level crate
See full notes in the 2023-Q3 Summit notes doc.
The consensus from 2023-11-09 was to use the name icu_canonicalizer, but the revivable crate is named icu_locale_canonicalizer: https://docs.rs/icu_locale_canonicalizer/latest/icu_locale_canonicalizer/
I remember that I did not really like this name from the start because it uses "locale" instead of "locid" as the rest of the crates use. I also don't really like making people write icu::locale_canonicalizer::LocaleDirectionality.
So I would like to make a new proposal for the names of these crates:
- Low-level crate:
icu_fallback - Higher-level crate:
icu_locid_adapters
Note that we already have icu_provider_adapters so this builds on the same type of naming scheme.
The contents of the crates will be as discussed on 2023-11-09. I am not proposing any changes except for the names of the crates. The consensus there was:
- locid_transform crate split: make it icu_locid_transform (for LocaleFallbacker, and future data-driven functionality relating to locales that all components need as a low-level dependency) and icu_canonicalizer (for LocaleCanonicalizer, LocaleExpander, and LocaleDirectionality, and future functionality that is not a universal component dependency)
OK?
Wants approval from:
- [x] @Manishearth
- [ ] @robertbastian
- [ ] @zbraniecki
- [ ] @echeran
icu_provider_adapters contains adapters that are providers and wrap providers. This is absolutely not the case in the high-level locale crate, it contains logic that works with locales, so I think icu_locid_adapters is a very bad name. We might actually need it in the future to provide adapters between icu_locid::Locale and other locale types in the Rust ecosystem.
We explicitly picked one of the names to be icu_locid_transform in order to not have another abandoned crate name. I still stand behind the original decision, even if icu_canonicalizer won't be a revive of the abandoned icu_locale_canonicalizer crate but a new name.
My agreement to the previous consensus was based on the understanding that icu_canonicalizer was the name of the previous crate. Given that it is actually icu_locale_canonicalizer, the consensus is not well-defined and therefore I withdraw my consensus vote for that naming scheme.
New proposal:
icu_fallbackfor the low-level crateicu_locid_transformfor the higher-level crate
Wants approval from:
- [x] @Manishearth
- [ ] @robertbastian
- [ ] @zbraniecki
- [ ] @echeran
I'd rather use a shiny new name for the higher-level crate, but won't die on this hill.
I would love to hear a suggestion for a better name for the higher-level crate. Here are the only ones I've seen:
icu_locid_utilsicu_canonicalizer
icu_locid_infoicu_locid_meta[data]
A whole new can of worms would be suggesting to rename icu_locid to icu_locale in 2.0. The crate's main types are Locale and LanguageIdentifier, there's no LocaleIdentifier that would shorten to locid. Then icu_locale_canonicalizer would be a good fit.
It's named icu_locid because in ICU4C there is locid.h, and because the name icu-locale was claimed in kebab case and we can't get it back in snake case.
I could live with icu_locid_info although I don't know if that's better than icu_locid_transform
I don't think ICU4C file names should have any bearing on our naming.
Note that the library inside the icu-locale crate could still be called icu_locale. At some point crates.io might also figure out that it can resolve icu_locale to icu-locale.
and because the name icu-locale was claimed in kebab case and we can't get it back in snake case.
I don't think this is a big problem. Most users will use the module through the meta crate. For the ones that don't, cargo add icu_locale will work. The library inside the icu-locale crate ~can~ will be called icu_locale anyway.
Another suggestion: Most users of a locale type crate will expect there to be canonicalization, minimization, and other locale information. There's a very small use case (basically icu_provider) that only needs a locale type, with nothing else. So:
icu_locale_core: currenticu_locid.icu_providerdepends on this. Not a module inicuicu_locale: reexports types fromicu_locale_core, and adds everything that's currently inicu_locid_transform. Is a module inicu.
Hmm. What do you think about icu_core? Everything can depend on it and it can include Locale and anything else we need to truly share across all components such as logging and documentation macros (#4467).
So the crates could be:
icu_core= contains Locale, LanguageIdentifier, and internal macros for things such as logging and documentation. We will try to keep it as small as possible.icu_locidORicu_locale= re-exports Locale/LanguageIdentifier and includes LocaleExpander, LocaleCanonicalizer, LocaleDirectionality, etc.icu_fallbackORicu_locid_transform= fallback machinery.
Dependency matrix:
| Crate | Dependencies |
|---|---|
| icu_core | utils |
| icu_provider | utils, icu_core |
| icu_fallback | utils, icu_core, icu_provider |
| icu_locid, no default features | utils, icu_core, icu_provider |
icu_locid, compiled_data |
utils, icu_core, icu_provider, icu_fallback |
| icu_decimal, no default features | utils, icu_core, icu_provider |
icu_decimal, compiled_data |
utils, icu_core, icu_provider, icu_fallback |
Does that look about right?
I guess icu_core is fine, although there might be a non-ICU use case for icu_locale_core that doesn't require ICU's logging macros. All ICU4X crates (except for locid) already depend on icu_provider, so that's kind of our core crate.
- @sffc - We have stuff that we might want to share across icu4x beyond just locales. So maybe we can have
icu_core. - @robertbastian - Locale, LanguageIdentifier are well designed and shouldn't be lumped in with the rest of the utils.
- @Manishearth - What are the other utilities?
- @sffc:
- Logging macros, which currently are in
icu_provider::_internal::logging - Size test-and-doc macro, currently in
icu_datetime - Internal documentation macro
- Other macros to deduplicate docs like the experimental alert
- Logging macros, which currently are in
- @Manishearth - I'm warming up to
icu_utilsand having it not be on the 1.0 track - @robertbastian - 1, 3, and 4 seem like they are coupled to ICU4X
- @sffc - The log dependency could be generalizable. It is mainly is about preserving messages in debug mode with
eprintln! - @robertbastian - It seems the logging could be its own crate, useful outside of icu4x.
- @robertbastian - I'd rather have multiple small crates than one grab bag, it simplifies version and makes vendoring/change management easier
- @robertbastian - I think for this issue we should discuss whether icu_core should be a locid + utils grab bag, or have icu_locid_core and some other solution for utils
- @echeran - I think we should avoid
icu_coreunless it is really core to internationalization (universally used across all crates).icu_locale_coresounds good - @sffc - All the things in my list are macros. So they don't even need to be in a crate. They can be included via the filesystem.
include!? - @Manishearth - Maybe but
include!is bad for publishing. We can copy the files around.
Proposed conclusion: icu_<something with locale>_core with Locale and friends, and icu_<something with locale> with data and funcionality. Do not necessarily need the same infix, see
LGTM: @robertbastian @manishearth @sffc @echeran
Possible names:
icu_locale_core+icu_locid+icu::locidicu_locale_core+icu_locid+icu::localeicu_locale_core+icu-locale+icu::localeicu_locale_core+icu_locale+icu::locale(if possible)icu_locid_core+icu_locid+icu::locid
Voting:
- @manishearth: 4 > (2 ~> 1) > 5 >> 3
- @robertbastian: 4 > 3 > 5 >> 2 = 1
- @sffc: 1 ~> 5 > 4 ~>> 3 >> 2
- @echeran: 4 > 3 > 5 > 1 >> 2
Note: 4 > 5 >> everything else
Decision: if possible we do icu_locale provided crates.io lets us rename. If not, we add icu icu_locid_core and repurpose icu_locid.
Discussion on locid_transform/locid_fallback
- @sffc - I see a few options
- Keep
icu_locid_transform. Re-export the stuff fromicu_loc[ale/id]that it currently re-exports - Keep
icu_locid_transformand remove everything except the limited bit of data-using code that is needed for thecompiled_datafeature (right now, the langid fallbacker) - Keep
icu_locid_transformand remove everything except the language identifier fallbacker (from @robertbastian). This is 4 but it reduces churn, this meanslocid_transformis really "justicu_fallbackwith a terrible name" - Make a new crate
icu_fallbackwith the contract being locale fallback only - Make a new crate
icu_compiled_data_utils(or similar name) that includes fallback, and re-exportLocaleFallbackerfromicu_locid
- Keep
- @robertbastian - I don't want to reexport
icu_locidinicu_locid_transform, eventually we'll have to rip off the bandage so let's just do it. There will be a bit of migration work in 2.0 anyway - @robertbastian - Do we really need to separate these crates?
voting:
- Manish: 4 > 5 > (2 ~> 3) > 1
- @sffc: 5 > 2 > 1 > 3 > 4
- @robertbastian 4 > 3 > 2 > 1 > 5
- @echeran: 3 > 4 > 1 ~>> 5=2
No decision:
General points of contention:
- Some people are prioritizing keeping compiled data from pulling in "extra dependencies" (which can include code in a dep crate that is unused). This includes being worried about new things that may be added to locid_transform/etc.
- Some people don't think is is an issue because it hasn't been since 1.0
- Some people are not happy with the
locid_transformname and would like to see it go away. - A weakly held position is avoiding churn in crate names.
We got icu_locale: https://crates.io/crates/icu_locale
(crates.io rules prevent renaming underscores to dashes, however crates.io rules sometimes allow empty squatted crates to be taken over, and icu-locale counts since it's empty and not used by anyone)
Note: 4 > 5 >> everything else
Decision: if possible we do icu_locale provided crates.io lets us rename. If not, we add icu icu_locid_core and repurpose icu_locid.
As a result, the decision on this is that we do option 4: icu_locale_core + icu_locale + icu::locale (if possible)
- @sffc - The new
icu_localecrate will contain, as discussed, all locale-related, data-driven code. This includes:- Locale canonicalizer
- Locale directionality
- Maybe LocaleInfo (week info, etc... see the ECMAScript proposal); requires some investigvsation about exactly what goes where. Maybe limit to just the defaults for the Unicode extension keywords.
- Locale fallback
- Locale display names
- @robertbastian - So the one restriction I'm seeing is that compiled data should not pull in display names. So we can have display names live in its own crate, or fallback live in its own crate.
options are:
- icu_locale (displaynames, canonicalizer+) with another small crate A. icu_fallback with re-export. This is a "private", internal-facing crate. contains only fallback. B. icu_fallback without re-export. This is a "public" crate and the canonical location of the fallback machinery. C. Like A but could be named icu_compiled_data_utils pending bikeshed. could contain other things that compiled_data needs.
- icu_locale (fallbacker, canonicalizer+) + icu_displaynames (displaynames)
- icu_locale (displaynames, fallbacker, canonicalizer+) + nothing
- icu_locale (fallbacker, canonicalizer+, displaynames reexport) + icu_displaynames (displaynames, internal crate) -- eliminated because worse than 2
(canonicalizer+ = canonicalizer, directionality, "other things that don't need much data")
Manish draws some venn diagrams to help focus the discussion.
Discussion: The actual things people seem to care about are these:
- There should be a crate that is minimal for what is needed by compiled_data
- That crate will be reexported by icu_locale so it shouldn't contain anything we won't do from icu_locale
- There are likely to be things needed by compiled_data that are "not fallback", i.e. "default prefs data".
There was also discussion on instead a model where icu_displaynames is a separate crate, and icu_locale doesn't contain it, keeping it small (this helps maintain 1 since icu_locale stays mostly minimal). Robert and Manish don't like this as much since it's not fully minimal and it would be nice to have displaynames in icu_locale.
This leads us to:
icu_bikeshed (containing things needed by compiled data that also makes sense to reexport from icu_locale), icu_locale (reexport bikeshed, canonicalizer+, reexport locale_core, displaynames)
icu_bikeshed will start with fallbacker, and will likely contain default locale pref stuff if we add that.
If we end up with anything "needed by compiled data that should not be reexported from icu_locale", then we should make a new crate or something. This is a bridge we cross when we need to, we're not yet sure if there's anything that will belong here.
Agreed: @manishearth @robertbastian @sffc @echeran
We bikeshed icu_bikeshed later
Bikeshed suggestions for icu_bikeshed:
icu_compiled_data_utilsicu_locale_mantle("not core, but still somewhat core")icu_locale_core2- ...
icu_fallbackicu_locale_fallbackicu_locale_opsicu_locale_data_utilsicu_locale_support
I like locale_ops, locale_data_utils, locale_support, and locale_mantle
Discusion with Zibi present
3 crates:
icu_locale_core: structures (currentlyicu_locid)icu_bikeshed: fallbacking and other locale-related machinery for compiled dataicu_locale: re-exportsicu_locale_core,icu_bikeshed, and adds data-driven functionality (similar toicu_locid_transform)
Discussion:
- @zbraniecki - I think the vast majority of users of
Localedon't need the data. We shouldn't make them either carry all the data or hunt around foricu_locale_core. My slight preference would be to keepicu_localewith the structs only. - @robertbastian - The argument for this structure is that users of
Localeprobably want access to canonicalization, display names, etc. - @zbraniecki - Long term I think most users won't want or need canonicalization.
- @Manishearth - Having a type and the things to use it with in the same crate seem good from a user point of view. I think we can add a feature flag for some of the more data-using things if people need.
Proposals for icu_bikeshed:
- icu_locale_mantle
- icu_fallback
- icu_locale_fallback
- icu_locale_ops
- icu_locale_data_utils
- icu_locale_support
Discussion:
- @echeran - I think
icu_locale_mantleis too "cute" - @zbraniecki - I agree that
icu_locale_mantlehinders the ability of users to grok the crate. But it is nice once people figure out what it means. - @zbraniecki -
icu_locale_fallbackseems good because it won't expand beyond fallback. The contract is about retrieving a value that we need from the locale but the locale doesn't contain that value. It also covers my expectation that fallback itself may grow in ways that contain weights, etc. - @Manishearth - We don't want people going to this crate for fallback; they should be going to the re-export crate.
- @robertbastian - If we make a new crate for every feature we split out of
icu_locale, it gets very messy.
Open for voting
The first round of voting didn't reach a clear consensus. I'll send out another ballot with these options:
icu_locale_fallback icu_locale_ops icu_locale_support icu_locale_util (added late)
I think we should put display names in a crate separate from the others, because it's unlike the other data: It's for UI and large-ish.
ICU4X WG discussion from 2024-04-18:
- @leftmostcat - What immediately strikes me is that you could have option 5 (four small crates) with feature-driven re-exports
- @sffc - The only hard requirement is that there must be a crate with just the core structs because otherwise we have circular dependencies
- @robertbastian - Features are messy because then you need to worry about forwarding features
- @leftmostcat -
_utildoesn't describe what anything does; it means we don't know what it does and couldn't come up with a better name - @robertbastian - We talked the locale crate splitting briefly with Zibi but didn't have enough time to wrap it up. For the ill-formed strings, Henri seems to be the expert on those types.
Decision on how to move forward: convene a small group to finalize decision on the controversial topics and get alignment. This small group discussion will produce the final recommendation of the WG.
Locale crate splitting:
- @zbraniecki
- @sffc
- @robertbastian
- @Manishearth
- @echeran
- (@younies)
- (@nekevss)
Note: The names of the crates resulting from the above discussion will be brought back to the main group.
2024-04-23 Special Meeting
https://github.com/unicode-org/icu4x/issues/3931
Four categories of functionality:
- Core Structs, including comparison, normalization
- Parent Locales, Partial Likely Subtags
- Aliases, Full Likely Subtags, Locale Directionality
- Display Names
Options
Option 1
One-stop shop:
icu::locale::Locale
icu::locale::LocaleExpander
icu::locale::LocaleCanonicalizer
icu::locale::LocaleDirectionality
icu::locale::LocaleDisplayNames
Requires child crates:
icu_locale_corefor core structs: required because of circular dependency withicu_providericu_locale_fallbackused for compiled data- Optional:
icu_locale_transform,icu_locale_names
Note: this creates multiple ways to include the same types:
icu_locale_fallback::LocaleFallbacker
icu_locale::LocaleFallbacker
icu::locale::LocaleFallbacker
Option 2+
Different crates for different functionality
icu_localefor core structsicu_locale_fallback,icu_locale_transform,icu_locale_names, ...
Included in metacrate as:
icu::locale::Locale
icu::locale_fallback::LocaleFallbacker
icu::locale_transform::LocaleCanonicalizer
...
Notes
- @zbraniecki - I'm shifting from thinking of
icu_localeas lightweight/small, 90% of users want core structs, to this is a metacrate, if you don't know what you're doing, come here and you can tailor it. - @echeran - When I think about artifacts in the Java world... I think the metacrate makes sense here because, among other reasons, it is consistent with the ICU metacrate itself. It gives them the ability to summon any subcomponent of functionality. If we're already doing it at the top level, we should be consistent throughout.
Discussion on icu_locale_core naming:
- @zbraniecki - Instead of
icu_locale_core, should we considericu_locid? I still think there's a disproportionally high value for a crate that just gives them structs.icu_locale_coresounds to me like an internal crate introduced predominently to resolve the circular dependency. But I think it is a standalone crate, popular relative to the whole of ICU. Every HTTP, locale mangement, language handling crate will want to use this. We give the nicer name to the metacrate, andicu_locidis for the structs. - @robertbastian -
icu_locidis not a real name; there is nothing called "locid". Maybeicu_locale_types? - @zbraniecki -
icu_locale_typesseems not much better thanicu_locale_core - @Manishearth - I think we can teach people about
_core. It can mean other things, similar to_traitscrates. I agree thatlocidis just confusing. It seems that if we expect this to be a big popular crate, we shouldn't restrict ourselves to the ICU4C name. - @sffc - I would be supportive of
icu_locid. The transfer of knowledge from ICU4C and the 4 years of precedent in Rust should not be discounted. - @echeran - I prefer
icu_locale_corebecause, (1), it's a common convention when having a multi-module artifact to have a "core", including in Java. (2) Visual consistency oficu_localeandicu_locale_somethingthat reflects parent/meta crate to child crate relationship. (3) "locid" is short, and is the ICU4C identifier, but without background knowledge, I just want to see "locale". - @Manishearth - I disagree with the concept of knowledge transfer. I understand that from the perspective of a "bidi level", which is actual knowledge being transfered. But there's no reason we shouldn't be just using the word "locale". With
locid, we are accessible to ICU4C developers, but not new developers. Withlocale, we're accessible to developers from any background. - @zbraniecki - Thanks for the discussion. I would still weakly prefer
icu_locid. I see_coreas being similar to_raw, an internal-facing thing. Should we considericu_locale_id? - @echeran - It seems to me that
_coreimplies what we're going for here. The signal for what it means is pretty strongly correlated to what we're trying to do here. - @Manishearth - I think
_coregets used for many things in the Rust world. - @sffc - I remember seeing it in the context of
randin Rust. - @zbraniecki - I want to emphasize "light" or "minimal". When I see "core" I think of "raw".
- @Manishearth - We can document that in the first line of the crate docs. It's good to have that answer up front.
Discussion on old crates:
- @robertbastian - Should we keep the old crates with re-exports? Maybe deprecated?
- @sffc - I prefer to have a clean break in 2.0
- @Manishearth - We have a graveyard and a system in place, let's use it
Discussion on icu_locale_fallback crate name:
- @echeran - If it's strictly about fallback, fine, but it seems like we do a lot of refactoring and we may want to include other functionality in this crate. If the crate contains more than fallback, we'd need to refactor again.
- @sffc - "fallback" is the term that, I think, most accurately and precisely describes what's in this crate.
- @echeran - This seems fine so long as we have good documentation that, e.g., likely subtags is part of fallback.
Proposal:
Introduce 3 crates:
icu_locale_core- Contains core structs: very similar to current
icu_locid - Contains clear top-level documentation that this is a non-internal crate that is intended to be used when fewer dependencies are desired
- Contains core structs: very similar to current
icu_locale_fallback- Contains parent locales and basic likely subtags: subset of current
icu_locid_transform
- Contains parent locales and basic likely subtags: subset of current
icu_locale- Contains re-exports
icu_locale_coreandicu_locale_fallback, and also includes other locale-related functionality, which could be split to additional child crates if the need arises
- Contains re-exports
- Names of crates are as stated here.
- The two old crates will be retired as graveyard crates at their version 2.
LGTM: @echeran @robertbastian @sffc @zbraniecki @Manishearth
This can be done ahead of time by creating crates and re-exports.
After starting this I don't think it's worth doing for 1.5, verifying that everything stays semver compatible and introducing extra crates is a lot more work than just the few renames and moves that this needs.