icu4x icon indicating copy to clipboard operation
icu4x copied to clipboard

Split icu_locid_transform into two crates

Open sffc opened this issue 2 years ago • 35 comments
trafficstars

There is too much going on in icu_locid_transform for it to be a core crate. It should only contain the things essential for fallback.

sffc avatar Aug 24 '23 17:08 sffc

We need more discussion time for this issue. There's not time for 1.3 to have that discussion. It's not a bad enough problem to block 1.3. So this should block 2.0.

sffc avatar Aug 24 '23 18:08 sffc

Discuss with:

  • @robertbastian
  • @sffc
  • (@Manishearth)
  • (@younies)
  • @zbraniecki

sffc avatar Aug 24 '23 18:08 sffc

Decision from 2023-11-09:

  • icu_locid stays as is
  • icu_locid_transform becomes the lower level crate
  • Revive icu_canonicalizer to become the higher level crate

See full notes in the 2023-Q3 Summit notes doc.

sffc avatar Nov 09 '23 23:11 sffc

The consensus from 2023-11-09 was to use the name icu_canonicalizer, but the revivable crate is named icu_locale_canonicalizer: https://docs.rs/icu_locale_canonicalizer/latest/icu_locale_canonicalizer/

I remember that I did not really like this name from the start because it uses "locale" instead of "locid" as the rest of the crates use. I also don't really like making people write icu::locale_canonicalizer::LocaleDirectionality.

So I would like to make a new proposal for the names of these crates:

  1. Low-level crate: icu_fallback
  2. Higher-level crate: icu_locid_adapters

Note that we already have icu_provider_adapters so this builds on the same type of naming scheme.

The contents of the crates will be as discussed on 2023-11-09. I am not proposing any changes except for the names of the crates. The consensus there was:

  • locid_transform crate split: make it icu_locid_transform (for LocaleFallbacker, and future data-driven functionality relating to locales that all components need as a low-level dependency) and icu_canonicalizer (for LocaleCanonicalizer, LocaleExpander, and LocaleDirectionality, and future functionality that is not a universal component dependency)

OK?

Wants approval from:

  • [x] @Manishearth
  • [ ] @robertbastian
  • [ ] @zbraniecki
  • [ ] @echeran

sffc avatar Nov 30 '23 18:11 sffc

icu_provider_adapters contains adapters that are providers and wrap providers. This is absolutely not the case in the high-level locale crate, it contains logic that works with locales, so I think icu_locid_adapters is a very bad name. We might actually need it in the future to provide adapters between icu_locid::Locale and other locale types in the Rust ecosystem.

We explicitly picked one of the names to be icu_locid_transform in order to not have another abandoned crate name. I still stand behind the original decision, even if icu_canonicalizer won't be a revive of the abandoned icu_locale_canonicalizer crate but a new name.

robertbastian avatar Dec 01 '23 10:12 robertbastian

My agreement to the previous consensus was based on the understanding that icu_canonicalizer was the name of the previous crate. Given that it is actually icu_locale_canonicalizer, the consensus is not well-defined and therefore I withdraw my consensus vote for that naming scheme.

New proposal:

  • icu_fallback for the low-level crate
  • icu_locid_transform for the higher-level crate

Wants approval from:

  • [x] @Manishearth
  • [ ] @robertbastian
  • [ ] @zbraniecki
  • [ ] @echeran

sffc avatar Dec 01 '23 22:12 sffc

I'd rather use a shiny new name for the higher-level crate, but won't die on this hill.

robertbastian avatar Dec 07 '23 14:12 robertbastian

I would love to hear a suggestion for a better name for the higher-level crate. Here are the only ones I've seen:

  • icu_locid_utils
  • icu_canonicalizer

sffc avatar Dec 08 '23 01:12 sffc

  • icu_locid_info
  • icu_locid_meta[data]

A whole new can of worms would be suggesting to rename icu_locid to icu_locale in 2.0. The crate's main types are Locale and LanguageIdentifier, there's no LocaleIdentifier that would shorten to locid. Then icu_locale_canonicalizer would be a good fit.

robertbastian avatar Dec 08 '23 07:12 robertbastian

It's named icu_locid because in ICU4C there is locid.h, and because the name icu-locale was claimed in kebab case and we can't get it back in snake case.

I could live with icu_locid_info although I don't know if that's better than icu_locid_transform

sffc avatar Dec 28 '23 10:12 sffc

I don't think ICU4C file names should have any bearing on our naming.

Note that the library inside the icu-locale crate could still be called icu_locale. At some point crates.io might also figure out that it can resolve icu_locale to icu-locale.

robertbastian avatar Jan 04 '24 09:01 robertbastian

and because the name icu-locale was claimed in kebab case and we can't get it back in snake case.

I don't think this is a big problem. Most users will use the module through the meta crate. For the ones that don't, cargo add icu_locale will work. The library inside the icu-locale crate ~can~ will be called icu_locale anyway.

robertbastian avatar Feb 08 '24 15:02 robertbastian

Another suggestion: Most users of a locale type crate will expect there to be canonicalization, minimization, and other locale information. There's a very small use case (basically icu_provider) that only needs a locale type, with nothing else. So:

  • icu_locale_core: current icu_locid. icu_provider depends on this. Not a module in icu
  • icu_locale: reexports types from icu_locale_core, and adds everything that's currently in icu_locid_transform. Is a module in icu.

robertbastian avatar Feb 08 '24 15:02 robertbastian

Hmm. What do you think about icu_core? Everything can depend on it and it can include Locale and anything else we need to truly share across all components such as logging and documentation macros (#4467).

sffc avatar Feb 09 '24 01:02 sffc

So the crates could be:

  • icu_core = contains Locale, LanguageIdentifier, and internal macros for things such as logging and documentation. We will try to keep it as small as possible.
  • icu_locid OR icu_locale = re-exports Locale/LanguageIdentifier and includes LocaleExpander, LocaleCanonicalizer, LocaleDirectionality, etc.
  • icu_fallback OR icu_locid_transform = fallback machinery.

Dependency matrix:

Crate Dependencies
icu_core utils
icu_provider utils, icu_core
icu_fallback utils, icu_core, icu_provider
icu_locid, no default features utils, icu_core, icu_provider
icu_locid, compiled_data utils, icu_core, icu_provider, icu_fallback
icu_decimal, no default features utils, icu_core, icu_provider
icu_decimal, compiled_data utils, icu_core, icu_provider, icu_fallback

Does that look about right?

sffc avatar Feb 09 '24 02:02 sffc

I guess icu_core is fine, although there might be a non-ICU use case for icu_locale_core that doesn't require ICU's logging macros. All ICU4X crates (except for locid) already depend on icu_provider, so that's kind of our core crate.

robertbastian avatar Feb 09 '24 08:02 robertbastian

  • @sffc - We have stuff that we might want to share across icu4x beyond just locales. So maybe we can have icu_core.
  • @robertbastian - Locale, LanguageIdentifier are well designed and shouldn't be lumped in with the rest of the utils.
  • @Manishearth - What are the other utilities?
  • @sffc:
    1. Logging macros, which currently are in icu_provider::_internal::logging
    2. Size test-and-doc macro, currently in icu_datetime
    3. Internal documentation macro
    4. Other macros to deduplicate docs like the experimental alert
  • @Manishearth - I'm warming up to icu_utils and having it not be on the 1.0 track
  • @robertbastian - 1, 3, and 4 seem like they are coupled to ICU4X
  • @sffc - The log dependency could be generalizable. It is mainly is about preserving messages in debug mode with eprintln!
  • @robertbastian - It seems the logging could be its own crate, useful outside of icu4x.
  • @robertbastian - I'd rather have multiple small crates than one grab bag, it simplifies version and makes vendoring/change management easier
  • @robertbastian - I think for this issue we should discuss whether icu_core should be a locid + utils grab bag, or have icu_locid_core and some other solution for utils
  • @echeran - I think we should avoid icu_core unless it is really core to internationalization (universally used across all crates). icu_locale_core sounds good
  • @sffc - All the things in my list are macros. So they don't even need to be in a crate. They can be included via the filesystem. include!?
  • @Manishearth - Maybe but include! is bad for publishing. We can copy the files around.

Proposed conclusion: icu_<something with locale>_core with Locale and friends, and icu_<something with locale> with data and funcionality. Do not necessarily need the same infix, see

LGTM: @robertbastian @manishearth @sffc @echeran

Possible names:

  1. icu_locale_core + icu_locid + icu::locid
  2. icu_locale_core + icu_locid + icu::locale
  3. icu_locale_core + icu-locale + icu::locale
  4. icu_locale_core + icu_locale + icu::locale (if possible)
  5. icu_locid_core + icu_locid + icu::locid

Voting:

  • @manishearth: 4 > (2 ~> 1) > 5 >> 3
  • @robertbastian: 4 > 3 > 5 >> 2 = 1
  • @sffc: 1 ~> 5 > 4 ~>> 3 >> 2
  • @echeran: 4 > 3 > 5 > 1 >> 2

Note: 4 > 5 >> everything else

Decision: if possible we do icu_locale provided crates.io lets us rename. If not, we add icu icu_locid_core and repurpose icu_locid.

robertbastian avatar Mar 07 '24 18:03 robertbastian

Discussion on locid_transform/locid_fallback

  • @sffc - I see a few options
    1. Keep icu_locid_transform. Re-export the stuff from icu_loc[ale/id] that it currently re-exports
    2. Keep icu_locid_transform and remove everything except the limited bit of data-using code that is needed for the compiled_data feature (right now, the langid fallbacker)
    3. Keep icu_locid_transform and remove everything except the language identifier fallbacker (from @robertbastian). This is 4 but it reduces churn, this means locid_transform is really "just icu_fallback with a terrible name"
    4. Make a new crate icu_fallback with the contract being locale fallback only
    5. Make a new crate icu_compiled_data_utils (or similar name) that includes fallback, and re-export LocaleFallbacker from icu_locid
  • @robertbastian - I don't want to reexport icu_locid in icu_locid_transform, eventually we'll have to rip off the bandage so let's just do it. There will be a bit of migration work in 2.0 anyway
  • @robertbastian - Do we really need to separate these crates?

voting:

  • Manish: 4 > 5 > (2 ~> 3) > 1
  • @sffc: 5 > 2 > 1 > 3 > 4
  • @robertbastian 4 > 3 > 2 > 1 > 5
  • @echeran: 3 > 4 > 1 ~>> 5=2

No decision:

General points of contention:

  • Some people are prioritizing keeping compiled data from pulling in "extra dependencies" (which can include code in a dep crate that is unused). This includes being worried about new things that may be added to locid_transform/etc.
  • Some people don't think is is an issue because it hasn't been since 1.0
  • Some people are not happy with the locid_transform name and would like to see it go away.
  • A weakly held position is avoiding churn in crate names.

Manishearth avatar Mar 07 '24 19:03 Manishearth

We got icu_locale: https://crates.io/crates/icu_locale

(crates.io rules prevent renaming underscores to dashes, however crates.io rules sometimes allow empty squatted crates to be taken over, and icu-locale counts since it's empty and not used by anyone)

Note: 4 > 5 >> everything else

Decision: if possible we do icu_locale provided crates.io lets us rename. If not, we add icu icu_locid_core and repurpose icu_locid.

As a result, the decision on this is that we do option 4: icu_locale_core + icu_locale + icu::locale (if possible)

Manishearth avatar Mar 08 '24 07:03 Manishearth

  • @sffc - The new icu_locale crate will contain, as discussed, all locale-related, data-driven code. This includes:
    • Locale canonicalizer
    • Locale directionality
    • Maybe LocaleInfo (week info, etc... see the ECMAScript proposal); requires some investigvsation about exactly what goes where. Maybe limit to just the defaults for the Unicode extension keywords.
    • Locale fallback
    • Locale display names
  • @robertbastian - So the one restriction I'm seeing is that compiled data should not pull in display names. So we can have display names live in its own crate, or fallback live in its own crate.

options are:

  1. icu_locale (displaynames, canonicalizer+) with another small crate A. icu_fallback with re-export. This is a "private", internal-facing crate. contains only fallback. B. icu_fallback without re-export. This is a "public" crate and the canonical location of the fallback machinery. C. Like A but could be named icu_compiled_data_utils pending bikeshed. could contain other things that compiled_data needs.
  2. icu_locale (fallbacker, canonicalizer+) + icu_displaynames (displaynames)
  3. icu_locale (displaynames, fallbacker, canonicalizer+) + nothing
  4. icu_locale (fallbacker, canonicalizer+, displaynames reexport) + icu_displaynames (displaynames, internal crate) -- eliminated because worse than 2

(canonicalizer+ = canonicalizer, directionality, "other things that don't need much data")


Manish draws some venn diagrams to help focus the discussion.


Discussion: The actual things people seem to care about are these:

  1. There should be a crate that is minimal for what is needed by compiled_data
  2. That crate will be reexported by icu_locale so it shouldn't contain anything we won't do from icu_locale
  3. There are likely to be things needed by compiled_data that are "not fallback", i.e. "default prefs data".

There was also discussion on instead a model where icu_displaynames is a separate crate, and icu_locale doesn't contain it, keeping it small (this helps maintain 1 since icu_locale stays mostly minimal). Robert and Manish don't like this as much since it's not fully minimal and it would be nice to have displaynames in icu_locale.

This leads us to:

icu_bikeshed (containing things needed by compiled data that also makes sense to reexport from icu_locale), icu_locale (reexport bikeshed, canonicalizer+, reexport locale_core, displaynames)

icu_bikeshed will start with fallbacker, and will likely contain default locale pref stuff if we add that.

If we end up with anything "needed by compiled data that should not be reexported from icu_locale", then we should make a new crate or something. This is a bridge we cross when we need to, we're not yet sure if there's anything that will belong here.

Agreed: @manishearth @robertbastian @sffc @echeran

We bikeshed icu_bikeshed later

Manishearth avatar Mar 11 '24 14:03 Manishearth

Bikeshed suggestions for icu_bikeshed:

  • icu_compiled_data_utils
  • icu_locale_mantle ("not core, but still somewhat core")
  • icu_locale_core2
  • ...

Manishearth avatar Mar 11 '24 14:03 Manishearth

  • icu_fallback
  • icu_locale_fallback
  • icu_locale_ops
  • icu_locale_data_utils
  • icu_locale_support

sffc avatar Mar 11 '24 18:03 sffc

I like locale_ops, locale_data_utils, locale_support, and locale_mantle

Manishearth avatar Mar 12 '24 10:03 Manishearth

Discusion with Zibi present

3 crates:

  • icu_locale_core: structures (currently icu_locid)
  • icu_bikeshed: fallbacking and other locale-related machinery for compiled data
  • icu_locale: re-exports icu_locale_core, icu_bikeshed, and adds data-driven functionality (similar to icu_locid_transform)

Discussion:

  • @zbraniecki - I think the vast majority of users of Locale don't need the data. We shouldn't make them either carry all the data or hunt around for icu_locale_core. My slight preference would be to keep icu_locale with the structs only.
  • @robertbastian - The argument for this structure is that users of Locale probably want access to canonicalization, display names, etc.
  • @zbraniecki - Long term I think most users won't want or need canonicalization.
  • @Manishearth - Having a type and the things to use it with in the same crate seem good from a user point of view. I think we can add a feature flag for some of the more data-using things if people need.

Proposals for icu_bikeshed:

  1. icu_locale_mantle
  2. icu_fallback
  3. icu_locale_fallback
  4. icu_locale_ops
  5. icu_locale_data_utils
  6. icu_locale_support

Discussion:

  • @echeran - I think icu_locale_mantle is too "cute"
  • @zbraniecki - I agree that icu_locale_mantle hinders the ability of users to grok the crate. But it is nice once people figure out what it means.
  • @zbraniecki - icu_locale_fallback seems good because it won't expand beyond fallback. The contract is about retrieving a value that we need from the locale but the locale doesn't contain that value. It also covers my expectation that fallback itself may grow in ways that contain weights, etc.
  • @Manishearth - We don't want people going to this crate for fallback; they should be going to the re-export crate.
  • @robertbastian - If we make a new crate for every feature we split out of icu_locale, it gets very messy.

Open for voting

Manishearth avatar Mar 14 '24 18:03 Manishearth

The first round of voting didn't reach a clear consensus. I'll send out another ballot with these options:

icu_locale_fallback icu_locale_ops icu_locale_support icu_locale_util (added late)

sffc avatar Apr 11 '24 17:04 sffc

I think we should put display names in a crate separate from the others, because it's unlike the other data: It's for UI and large-ish.

hsivonen avatar Apr 17 '24 09:04 hsivonen

ICU4X WG discussion from 2024-04-18:

  • @leftmostcat - What immediately strikes me is that you could have option 5 (four small crates) with feature-driven re-exports
  • @sffc - The only hard requirement is that there must be a crate with just the core structs because otherwise we have circular dependencies
  • @robertbastian - Features are messy because then you need to worry about forwarding features
  • @leftmostcat - _util doesn't describe what anything does; it means we don't know what it does and couldn't come up with a better name
  • @robertbastian - We talked the locale crate splitting briefly with Zibi but didn't have enough time to wrap it up. For the ill-formed strings, Henri seems to be the expert on those types.

Decision on how to move forward: convene a small group to finalize decision on the controversial topics and get alignment. This small group discussion will produce the final recommendation of the WG.

Locale crate splitting:

  • @zbraniecki
  • @sffc
  • @robertbastian
  • @Manishearth
  • @echeran
  • (@younies)
  • (@nekevss)

Note: The names of the crates resulting from the above discussion will be brought back to the main group.

sffc avatar Apr 23 '24 18:04 sffc

2024-04-23 Special Meeting

https://github.com/unicode-org/icu4x/issues/3931

Four categories of functionality:

  1. Core Structs, including comparison, normalization
  2. Parent Locales, Partial Likely Subtags
  3. Aliases, Full Likely Subtags, Locale Directionality
  4. Display Names

Options

Option 1

One-stop shop:

icu::locale::Locale icu::locale::LocaleExpander icu::locale::LocaleCanonicalizer icu::locale::LocaleDirectionality icu::locale::LocaleDisplayNames

Requires child crates:

  • icu_locale_core for core structs: required because of circular dependency with icu_provider
  • icu_locale_fallback used for compiled data
  • Optional: icu_locale_transform, icu_locale_names

Note: this creates multiple ways to include the same types:

icu_locale_fallback::LocaleFallbacker icu_locale::LocaleFallbacker icu::locale::LocaleFallbacker

Option 2+

Different crates for different functionality

  • icu_locale for core structs
  • icu_locale_fallback, icu_locale_transform, icu_locale_names, ...

Included in metacrate as:

icu::locale::Locale icu::locale_fallback::LocaleFallbacker icu::locale_transform::LocaleCanonicalizer ...

Notes

  • @zbraniecki - I'm shifting from thinking of icu_locale as lightweight/small, 90% of users want core structs, to this is a metacrate, if you don't know what you're doing, come here and you can tailor it.
  • @echeran - When I think about artifacts in the Java world... I think the metacrate makes sense here because, among other reasons, it is consistent with the ICU metacrate itself. It gives them the ability to summon any subcomponent of functionality. If we're already doing it at the top level, we should be consistent throughout.

Discussion on icu_locale_core naming:

  • @zbraniecki - Instead of icu_locale_core, should we consider icu_locid? I still think there's a disproportionally high value for a crate that just gives them structs. icu_locale_core sounds to me like an internal crate introduced predominently to resolve the circular dependency. But I think it is a standalone crate, popular relative to the whole of ICU. Every HTTP, locale mangement, language handling crate will want to use this. We give the nicer name to the metacrate, and icu_locid is for the structs.
  • @robertbastian - icu_locid is not a real name; there is nothing called "locid". Maybe icu_locale_types?
  • @zbraniecki - icu_locale_types seems not much better than icu_locale_core
  • @Manishearth - I think we can teach people about _core. It can mean other things, similar to _traits crates. I agree that locid is just confusing. It seems that if we expect this to be a big popular crate, we shouldn't restrict ourselves to the ICU4C name.
  • @sffc - I would be supportive of icu_locid. The transfer of knowledge from ICU4C and the 4 years of precedent in Rust should not be discounted.
  • @echeran - I prefer icu_locale_core because, (1), it's a common convention when having a multi-module artifact to have a "core", including in Java. (2) Visual consistency of icu_locale and icu_locale_something that reflects parent/meta crate to child crate relationship. (3) "locid" is short, and is the ICU4C identifier, but without background knowledge, I just want to see "locale".
  • @Manishearth - I disagree with the concept of knowledge transfer. I understand that from the perspective of a "bidi level", which is actual knowledge being transfered. But there's no reason we shouldn't be just using the word "locale". With locid, we are accessible to ICU4C developers, but not new developers. With locale, we're accessible to developers from any background.
  • @zbraniecki - Thanks for the discussion. I would still weakly prefer icu_locid. I see _core as being similar to _raw, an internal-facing thing. Should we consider icu_locale_id?
  • @echeran - It seems to me that _core implies what we're going for here. The signal for what it means is pretty strongly correlated to what we're trying to do here.
  • @Manishearth - I think _core gets used for many things in the Rust world.
  • @sffc - I remember seeing it in the context of rand in Rust.
  • @zbraniecki - I want to emphasize "light" or "minimal". When I see "core" I think of "raw".
  • @Manishearth - We can document that in the first line of the crate docs. It's good to have that answer up front.

Discussion on old crates:

  • @robertbastian - Should we keep the old crates with re-exports? Maybe deprecated?
  • @sffc - I prefer to have a clean break in 2.0
  • @Manishearth - We have a graveyard and a system in place, let's use it

Discussion on icu_locale_fallback crate name:

  • @echeran - If it's strictly about fallback, fine, but it seems like we do a lot of refactoring and we may want to include other functionality in this crate. If the crate contains more than fallback, we'd need to refactor again.
  • @sffc - "fallback" is the term that, I think, most accurately and precisely describes what's in this crate.
  • @echeran - This seems fine so long as we have good documentation that, e.g., likely subtags is part of fallback.

Proposal:

Introduce 3 crates:

  • icu_locale_core
    • Contains core structs: very similar to current icu_locid
    • Contains clear top-level documentation that this is a non-internal crate that is intended to be used when fewer dependencies are desired
  • icu_locale_fallback
    • Contains parent locales and basic likely subtags: subset of current icu_locid_transform
  • icu_locale
    • Contains re-exports icu_locale_core and icu_locale_fallback, and also includes other locale-related functionality, which could be split to additional child crates if the need arises
  • Names of crates are as stated here.
  • The two old crates will be retired as graveyard crates at their version 2.

LGTM: @echeran @robertbastian @sffc @zbraniecki @Manishearth

sffc avatar Apr 23 '24 18:04 sffc

This can be done ahead of time by creating crates and re-exports.

sffc avatar May 09 '24 18:05 sffc

After starting this I don't think it's worth doing for 1.5, verifying that everything stays semver compatible and introducing extra crates is a lot more work than just the few renames and moves that this needs.

robertbastian avatar May 16 '24 12:05 robertbastian