icu4x icon indicating copy to clipboard operation
icu4x copied to clipboard

Are `icu_pattern` and `icu_preferences` components or utils?

Open robertbastian opened this issue 1 year ago • 21 comments

These two crates currently live in utils and are not reexported in the meta crate, yet they use the icu_ prefix.

icu_pattern does not seem ICU-specific, it's just an encoding model for a pattern string. Maybe a name like zeropattern, which is in line with our other data representation crates zerovec and zerotrie, and calls out the zero-copy nature of the implementation, would be more appropriate.

Regarding icu_preferences, I think it should actually be a component, or live in the icu_locale component. If icu_locale is a component, this seems appropriate as a component as well.

robertbastian avatar Jul 04 '24 10:07 robertbastian

Discuss with:

  • @sffc
  • @zbraniecki
  • @robertbastian

robertbastian avatar Jul 04 '24 10:07 robertbastian

I like zeropattern and icu::preferences or maybe icu::prefs

sffc avatar Jul 08 '24 08:07 sffc

  • @zbraniecki - I had originally envisioned preferences as a util, but now it contains semantic data for Unicode extension keywords, so it seems fine as a component. For pattern, I'm less sure; it is a storage and functionality crate. What it uses underneath is less important than it operating on patterns with placeholders. For example, what does {0} mean? That comes from LDML. I think it is icu_pattern.
    • For "preferences", do you want it in its own component, or do you want it in the locale namespace? It could just go into icu_locale_core. For icu_pattern, what exactly is ICU-specific about it? Just curly braces that interpolate things? That seems really common.
  • @zbraniecki - Yes, but there's no RFC XYZ that defines that. Many people replicate the same concept, but the one we use is the one encoded in LDML.
    • I'm not sure. The parser reads LDML, but the runtime representation has nothing to do with LDML.
  • @zbraniecki - Since the parser reads LDML, it encodes LDML assumptions.
  • @sffc - It's fine to have a crate zeropattern that refers to LDML for its string representation, but I don't think the crate is sufficiently tightly coupled to CLDR that it needs to go into the crate name.
  • @zbraniecki - I think the crate was invented for icu4x and will follow that evolution. Looking at search results, I think it's useful to know that it is associated with ICU/CLDR.
    • I would be fine naming it cldr_pattern or ldml_pattern. Just not icu_* because that is for components.
    • What is the scope of the pattern?
    • The next feature I plan to add is built-in support for plural patterns.
    • That would push me more into the cldr or ldml names.

Options for icu_pattern:

  1. zeropattern
  2. cldr_pattern
  3. icu_pattern
  4. ldml_pattern

More discussion:

    • How about icu_preferences?
  • @zbraniecki - I think I like it being its own thing.
    • I had suggested icu::prefs on the issue. icu::prefs::HourCycle::H23, icu::prefs::CollatorStyle::...
    • We're not in the business of saving keystrokes. In general ICU4X uses full words.
  • @younies - "preferences" is confusing because we also have unit preferences coming.
  • @zbraniecki - The main value proposition is it does merging and fallback of preferences. But it can have non-Unicode preferences like date style.
  • @sffc - Should we just call it icu_locale_preferences. icu_locale_types?
  • @robertbastian icu::locale::preferences?
  • @sffc - It would re-export as icu::locale::HourCycle, I think, like icu_locale_core hoists type names
  • Decision: move to ballot, and continue to bikeshed options

Some bikeshedding from Robert:

  • icu
    • locale
      • Locale
      • LanguageIdentifier
      • LocaleFallbacker
      • LocaleCanonicalizer
      • subtags
        • Language
        • ...
      • extensions
        • unicode
          • ...
        • ...
      • fallback
        • LocaleFallbackConfig
        • ... (new) - preferences
        • HourCycle
        • ...

robertbastian avatar Jul 11 '24 17:07 robertbastian

Additional notes from 2024-07-18:

Options for icu_preferences crate structure?

  • icu_preferences re-exported as icu::preferences
  • icu_locale_core::preferences re-exported as icu::locale::preferences
  • icu_locale_preferences re-exported as icu::locale::preferences

Options for preferences naming?

  • preferences
  • prefs

sffc avatar Jul 19 '24 21:07 sffc

Additional thought from https://github.com/unicode-org/icu4x/pull/5260#discussion_r1684978963:

There are sort-of three classes of crates we keep hitting up against:

  1. Things that are truly components (internationalization value proposition)
  2. Things that are truly utilities (things useful outside of an internationalization context)
  3. Things that are in the gray area (things that are useful both in and out of an internationalization context?)

Items in the gray area include:

  • Patterns
  • Measurement Units
  • Date, Time, and Duration types, which currently live in icu_datetime
  • Time zone types, which currently live in icu_timezone

Should we consider introducing a new crate prefix for these cases? cldr_ and ldml_ were proposed above, but we could use anything else like icu_util_.

sffc avatar Jul 19 '24 21:07 sffc

Should we consider introducing a new crate prefix for these cases? cldr_ and ldml_ were proposed above, but we could use anything else like icu_util_.

I think this should be case-by-case naming. Ideally we'd use the standard Rust date, time, duration, and tz types. The problem is that there is no standard, and no 3P crate fulfils our requirements (i.e. chrono, time, jiff, etc.). If we want to split out the types into a separate crate, it will be competing with these crates, and I don't see cldr_ or ldml_ prefixes do that.

robertbastian avatar Jul 24 '24 10:07 robertbastian

I don't think we need to go as far as to have an icu_util prefix, it seems very inside baseball, but i'm not opposed to it. I like unique names like zeropattern, but I don't really mind cldr_ ldml_ etc. (Slight preference for CLDR over LDML: LDML is more inside jargony than CLDR)

Manishearth avatar Jul 26 '24 03:07 Manishearth

I still don't think that our Pattern type is sufficiently coupled with CLDR/LDML that it should go into the crate name, unless we establish a common prefix like cldr_ or ldml_ for these utils crates including the calendar and time zone crates, which are also roughly based on CLDR-specific things but aren't components in and of themselves.

I'm fine with re-exporting icu_pattern. It is roughly equivalent in stature to icu_collections, which we re-export.

sffc avatar Aug 07 '24 18:08 sffc

On preferences: I added types and typed to the ballot since the things we're exporting aren't actually preferences, they are just the typed enums/structs that preferences include.

sffc avatar Aug 07 '24 18:08 sffc

Ballot is open: https://docs.google.com/document/d/1Ol0aKsT252WKRavjb38iYZoh8rrFTw6mvKgf-5HB-Gs/edit

sffc avatar Aug 07 '24 18:08 sffc

I lean toward putting the preferences stuff into icu_locale_core simply because it's fairly closely related and I want us to stop adding more crates without any restraint, especially lower-level crates.

sffc avatar Aug 07 '24 18:08 sffc

For icu_pattern

I still don't think that our Pattern type is sufficiently coupled with CLDR/LDML that it should go into the crate name, unless we establish a common prefix like cldr_ or ldml_ for these utils

I agree. I'm a little forgiving on this point about ldml than cldr just because LDML is a lot more specific to a specification whereas CLDR represents specs + data + tooling. I can be convinced to more forcefully reject inclusion of ldml in the name.

I don't like zeropattern. When we've had zeroX utilities, it's usually because there is an existing X, and what zeroX brings to the table is zero allocation deserialization. When it comes to ICU4X crates for the various components of i18n functionality, we name them after the functionality because that is the salient point -- that's the main attraction, even if they take advantage of zeroX utils to minimize allocation.

I don't like icu_util_pattern because the util is redundant and unnecessary. Plus, sometimes, it's not easy to know whether something is a util or a component or in between. So why bother including that in the name (unless there is a "non-util" version or something)?

For icu_preferences

I don't like any of the names with types or typed in them because I've never seen a package in other statically typed languages exist solely to store types. The package names reflect functionality -- is it math? matrix? security-/ cryptography-related? network? graphics? -- but I don't recall often seeing packages named like matrix.types, matrix.traits, matrix.functions, matrix.macros. APIs are presented to users in higher-level groupings of related functionality, and the code's programming language features are secondary at best.

echeran avatar Aug 08 '24 23:08 echeran

I don't like zeropattern. When we've had zeroX utilities, it's usually because there is an existing X, and what zeroX brings to the table is zero allocation deserialization. When it comes to ICU4X crates for the various components of i18n functionality, we name them after the functionality because that is the salient point -- that's the main attraction, even if they take advantage of zeroX utils to minimize allocation.

A counterexample is zerotrie, a zero-copy trie data structure. There is no crate or type in Rust named trie. Note: in that case, icu_trie would not make sense because it isn't based on anything in ICU/CLDR.

The way I see it, we sort-of have two conventions going on, icu_* for components and ICU/CLDR-based functionality (like icu_calendar and icu_collections), and zero* for zero-copy data structures. icu_pattern could reasonably live in either world.

sffc avatar Aug 09 '24 17:08 sffc

Based on the ballot, the winners are:

  • icu_pattern re-exported as icu::pattern
  • icu_locale_core::preferences re-exported as icu::locale::preferences

sffc avatar Aug 14 '24 21:08 sffc

I had a mistake in my vote counting code, and for the question of icu_preferences, there is currently a tie between these two options:

  • icu_locale_core::preferences re-exported as icu::locale::preferences
  • icu_locale_preferences re-exported as icu::locale::preferences

sffc avatar Aug 14 '24 21:08 sffc

https://github.com/unicode-org/icu4x/issues/5179#issuecomment-2289977370

  • @robertbastian The preferences are small enough that they shouldn't impact compile time. New crates ar somewhat expensive.
  • @sffc We historically fragmented our code into large number of crates, but we heard feedback from customers about it. I think we should try to avoid creating many more crates especially when there's not enough separation.
  • @robertbastian Preferences depends on icu_locale_core
  • @zbraniecki On a regular hot path of a customer who wants to use DateTimeFormat or NumberFormat, would they use icu_locale_core or icu_locale? Do you see a scenario when they pull icu_locale_core and icu_datetime? And if it is icu_locale_core, then it can go into icu_locale_core.
  • @sffc icu_locale_core is the only essential crate. Some users might want icu_locale, such as for canonicalization.
  • @zbraniecki I'm aligned then.

Consensus: put the types in icu_locale_core::preferences.

LGTM: @sffc @robertbastian @zbraniecki @Manishearth

sffc avatar Aug 15 '24 18:08 sffc

I just realized a fatal flaw with re-exporting icu_pattern as icu::pattern, at least in the short term. If it gets re-exported, then it means icu_pattern needs to be graduated, but right now it is very much still in flux, like most utility crates.

@robertbastian you were the one most opposed to icu_pattern as a standalone crate. Would you be okay with an intermediate solution where we do not re-export it for now, but plan to re-export it eventually once it is stable?

sffc avatar Aug 16 '24 14:08 sffc

It could be in the meta crate behind the experimental flag

robertbastian avatar Aug 16 '24 15:08 robertbastian

That said, if icu_pattern is stabilising in the timeframe of utilities 1.0, then maybe it's a utility.

robertbastian avatar Aug 16 '24 15:08 robertbastian

I have definitely been thinking of icu_pattern more along with the timeframe of utilities 1.0, but perhaps we should upgrade it to be more along the timeframe of icu_collections or icu_unicodeset_parser.

sffc avatar Aug 20 '24 07:08 sffc

The second choice from the ballot was icu_pattern not re-exported, which is the status quo. zeropattern was polarizing, getting 3 top votes and 2 bottom votes.

sffc avatar Aug 20 '24 07:08 sffc

  • @sffc - icu_pattern is not stable yet so we can't re-export it as icu::pattern. It should eventually become stable, but maybe more on the timeline of other util crates.
  • @zbraniecki - I don't think we should restrict icu_ to component crates.
  • @sffc - This is the only example left where icu_ is not a component crate. I guess I'm okay considering icu_pattern to be on a more aggressive stabilization timeline.
  • @Manishearth - Do we consider it on 2.0 stabilization track?
  • @sffc - No

Conclusion: keep icu_pattern, re-export as icu::pattern but behind an experimental feature. (don't add to icu_experimental)

LGTM: @sffc @robertbastian @zbraniecki @Manishearth

sffc avatar Sep 05 '24 16:09 sffc

@robertbastian can you do the crate renaming and reexporting agreed in this issue? I think what needs to be done is:

  • Move utils/preferences to be a sub-module of icu_locale_core according to https://github.com/unicode-org/icu4x/issues/5179#issuecomment-2291865571
  • Re-export icu_pattern as icu::pattern but experimental

sffc avatar Sep 28 '24 01:09 sffc