icu4x
icu4x copied to clipboard
Revisit policy around component crate interdependent versions
ICU4X 1.3 started using ~ versions between the metacrate, all component crates, and the data crates, as discussed in https://github.com/unicode-org/icu4x/issues/3537. This solves several problems:
- Helps prevent CLDR version skew: you generally don't want some components on a different CLDR version than other components.
- Keeps databake building. We don't guarantee provider struct stability across minor versions so the
~deps make sure that the databake is built for the corresponding library version. - Allows more flexibility when it comes to internal-facing APIs that cross component crate boundaries. For example, we can change the signature of an internal API in
icu_locidand update call sites in component crates without having to worry about breaking older versions of those crates.
@Manishearth said he talked with the Cargo team which may have swayed his opinion in a different direction. Can you post your updated position below?
Personally, I'd like to see some user feedback and evidence before making a change away from what we agreed. From an ICU4X maintainer and integrator point of view, I'm happy with where we landed. Version skew was a real problem in ICU4X 1.0 through 1.2 and this largely solves it. Self-restraint (more restrictions on what we allow ourselves to do as ICU4X developers) can help solve problems 2 and 3 but it doesn't solve problem 1. If we can demonstrate that the version pinning causes harm to our users and if we come up with some other solution to the CLDR version skew question, then I can be swayed.
CC @robertbastian
Also see: https://github.com/unicode-org/icu4x/issues/2715
The basic issue was that the Cargo resolver does not do well with ~ dependencies, and that is extremely unlikely to change. What ends up happening is that Cargo can't always see that a stale ~1.3 transitive dep and a 1.4 dep can be unified by upgrading the actual crates being depended on. In other words, Cargo will accept valid lockfiles for mixes of deps like this, but will not be good at producing them, instead electing to say it can't resolve dependencies.
Now, a thing that may save us here is that these ~ deps are purely internal. I cannot currently trigger any problem cases, most of the problem cases the Cargo team was talking about was when a client of a crate uses a ~ dependency. I would like us to be careful to not suggest this to external clients, it's trivially easy to cause unresolvable problems when deptrees mix.
As a client from the google3 side, having to do upgrades lockstep is still a pain, especially with the common deps like icu_provider (If icu or icu_datagen depends on new icu4x components, they need to be imported, but they can't be imported until icu_provider is upgraded, but that can't be upgraded without upgrading icu and icu_datagen)
My general ideal world is the one where we have better self restraint to prevent problems with 2 and 3. I agree that 1 is harder to fix, but I actually think it's fine to keep using ~ deps as long as we don't have real complaints (but we should keep a look out for them).
Basically, I care less about the actual way we specify deps (until there are concrete issues we have experienced). But I would like us to try and be much more careful about internal breakages. Internal breakages are easier to police than external ones since we have a finite set of "client" code to think about. And I'm not sure we need a hard policy on this, but I think it would be nice to have it be a consideration.
Also, I'm primarily thinking about icu_provider and other common deps here. Potentially having a set of crates where we informally, internally, maintain a stability policy that can be mostly relied on when you're forced to mix versions temporarily.
I think it's a good idea to adopt a policy where by default we don't break internal cross-crate APIs but we plan to make exceptions on a case-by-case basis, similar to how we default to MSRV N-6 but allow N-4 with a strong enough motivation.
- @sffc - Where do we draw the line between what we consider a core component?
- @Manishearth - Probably best effort.
- @Manishearth - note fthat for
icu_propertiesif a crate depends on it for property info then there can be correctness issues with mixing icu_properties 1.3 with icu_whatever 1.4 - @sffc - At the current time I'm slightly concerned about adding
icu_propertiesandicu_collectionsbecause those seem like crates are more likely to have internal APIs and I don't necesarilly like proliferating this requirement since it is hard to consistently enforce and puts restrictions on what we can do as ICU4X developers. - @manishearth - the requirement can just be a soft guarantee about internal breakages, this way we don't have to worry about dataful crates
- @manishearth - for non dataful crates (icu_provider, icu_locid, potential future icu_utils, that's it!), we could use a non-tilde dep, version skew is unimportant for these.
Consensus:
- Initially, add the following crates to a list that is a soft guarantee that will never break other ICU4X crates when individually upgraded. This is enforced by policy.
icu_providericu_locid_transform- Anything that those two crates depend on, such as
icu_locid
- This list may change in the future, but needs discussion/consensus within the group. The list is the property of a release.
LGTM: @Manishearth, @sffc
Tentative consensus:
- Move
icu_providerandicu_locidto non-tilde deps (like utils crates, but still versioned alongside icu4x). This smaller list is only really for things that don't have data to prevent version skew issues.
LGTM: @Manishearth, @sffc
@robertbastian Please review the above conclusions.
Compiled data requires that the fallback compiled data in locid transform is at the same version, otherwise deduplication breaks
That crate only gets the soft guarantee (and no semver relaxation): for me the line there is "still compiles and mostly works" (as in, it's an okay intermediate, temporary state to have). I think it would be good to explicitly call that out. This is one of the "version skew issues" that prompted me to propose having two separate lists, one a superset of the other.
provider/locid get the harder guarantee that we actually back by relaxing semver
"still compiles and mostly works" is horrible. We introduced the ~ deps because clients were having issues with mixed versions. If we allow non-~ locid_transform, there will be very silent behaviour changes that will be hard to debug.
I agree that locid and icu_provider can float more freely.
That's not the proposal.
The proposal is, two parts:
- icu_locid and icu_provider drop ~. Nobody else, except potentially some icu_util crate if we ever make that
- icu_locid, icu_provider, and icu_locid_transform (really, "whatever crate contains fallback") maintain a soft guarantee that we'll try and mitigate internal breakagaes. If the previous bullet point is part of our consensus (which seems to be the case) in practice this just means icu_locid_transform gets the soft guarantee since "no ~ dep" is a stronger guarantee anyway
I split it into two to better facilitate asynchronous decisionmaking.
maintain a soft guarantee that we'll try and mitigate internal breakagaes
I'm unclear what this refers to
With the soft guarantee, whenever we change internal doc-hidden/unstable APIs (say, in, icu_locid_transform) we ensure that downstream crates on older versions still compile with the new API, perhaps by keeping compat shims around. This is best-effort, so it's fine to violate this with a discussion. This guarantee is not exposed via our semver bounds, it is a soft internal guarantee that makes the process of vendoring ICU4X easier.
Updated recommendation:
icu_locale_coreandicu_providerstops being depended on with tilde deps and guarantee that internal APIs used by ICU4X obey semver guarantees or are updated in ways that do not break ICU4X usage. (We are okay with non-ICU4X users of internal APIs breaking)icu_bikeshed(#3931) continues to be depended on via~in ICU4X, however we provide a soft guarantee that ignoring the~temporarily may work.- Everyone else continues with
~deps
LGTM: @sffc @Manishearth
@robertbastian Do you approve the latest conclusion above?
Action on this ticket should be to document this somewhere and then close the issue.