icu4x icon indicating copy to clipboard operation
icu4x copied to clipboard

Per-component features for datagen

Open Manishearth opened this issue 2 years ago • 9 comments
trafficstars

Currently datagen pulls in all components. We should do the same thing we do for the metacrate and add per-component features, with a default_components and experimental_components feature so that people using --no-default-features are not burdened unduly.

This is going to be useful for people using the registry API as well: As we grow the registry functionality, it'll become more important to not pull in additional deps.

Personally I really want this for experimental features (I want to be able to avoid having deps on experimental stuff where possible, since experimental stuff still has breaking changes, but if we're doing this anyway might as well do it for everything)

Related: https://github.com/unicode-org/icu4x/issues/2602

Manishearth avatar Mar 03 '23 02:03 Manishearth

This is going to be useful for people using the registry API as well: As we grow the registry functionality, it'll become more important to not pull in additional deps.

I really don't want to get into the practice of relying on component-level features to determine the set of keys used for anything, including the registry. Components are simply too coarse for key slicing to be effective.

sffc avatar Mar 03 '23 16:03 sffc

I really don't want to get into the practice of relying on component-level features to determine the set of keys used for anything, including the registry. Components are simply too coarse for key slicing to be effective.

No, I'm not suggesting that at all: I'm suggesting that people may wish to use the registry without pulling in every last dependency.

That said if we're making the registry more of a thing we probably want it to live in a separate crate anyway.

Manishearth avatar Mar 03 '23 17:03 Manishearth

So to make it clear, my personal motivation is that vendoring of pre-1.0 experimental crates is annoying and I'd rather not have to deal with that. They're explicitly unstable and tend to force updates to be more lockstep (or need more steps), and I would like a future where I do not have to deal with those unless we explicitly have clients who care.

This is a rather weak motivation, especially since it's datagen we're talking about here. I'd still like to see something happen here if possible.

I do think additional motivation is that people using a subset of ICU4X should be able to use the registry API without pulling in all of ICU4X, but that's also weak motivation given that the registry API is currently inside the mother of all dependencies (datagen). We should move it elsewhere and give it features but that's not what this issue is about.

I do recognize that datagen features lead to more potential footguns for users, I still maintain that we should be able to do a lot provided we keep the default reasonable. But I can accept there's disagreement on that.

cc @robertbastian since he has opinions on datagen having features, and previously removed experimental feature tagging

Manishearth avatar Mar 03 '23 17:03 Manishearth

  • @robertbastian - If you disable experimental keys, then datagen should complain. It shouldn't just silently ignore it. And there could be a space of keys that datagen doesn't know about, which we can't distinguish from junk in the key file.

Resolutions:

  1. Add key strings to datagen for all keys, even keys whose components are disabled. Print warnings for unknown strings, and errors for strings whose components are disabled.
  2. Aspire to have component-specific datagen features. OK to start with experimental-only.

sffc avatar Mar 23 '23 18:03 sffc

Outside experimental it gets a lot messier, there are many interactions between e.g. calendar/datetime/timezone, decimal/compactdecimal, a lot of things/properties. This will require extensive restructuring of the datagen crate, which I don't think is worth the effort at this point. Let's backlog this.

robertbastian avatar Jul 11 '23 12:07 robertbastian

Yeah that seems fair. Experimental's the annoying one anyway since upgrading those can take effort

Manishearth avatar Jul 11 '23 13:07 Manishearth

@Manishearth Now that we have a single icu_experimental crate, do you think datagen still needs component-specific features, or would a single experimental feature suffice?

As @robertbastian noted, it's not very easy to add component features to datagen due to interaction between components at datagen time.

sffc avatar Apr 14 '24 03:04 sffc

I think that interaction between datagen components is about as complicated as the interaction between their base crates, yes?

It's still nice to not have to deal with the cyclic import dependency of needing to pull in new crates but being unable to pull in new crates because you need to update all the deps first. The decisions we made about icu_provider and icu_locid compatability help, here.

I'm weakly against removing the features. It seems like people have stronger reasons to remove it.

Manishearth avatar Apr 16 '24 15:04 Manishearth

I'm weakly against removing the features. It seems like people have stronger reasons to remove it.

We don't currently have these. We have a single experimental feature.

robertbastian avatar Apr 16 '24 16:04 robertbastian