Feature selection in workspace depends on the set of packages compiled
Maintainers notes
- The recompilation was fixed, but this issue is still open regarding having features change based on what is being built simultaneously.
- The
cargo hackplugin will automatically expandcargo check --workspace(etc) tocargo check -p fail_test && cargo check -p lang_rust && ..., - The unstable config
resolver.feature-unification = "package"can control this behavior (https://doc.rust-lang.org/cargo/reference/unstable.html#feature-unification)
Reproduction:
-
Check out this commit: https://github.com/matklad/fall/commit/3022be495a8523ca43c3d78ab29966fbb5ee69fd
-
Build some test with
cargo test -p fall_test -p fall_test -p lang_rust -p lang_rust -p lang_json --verbose --no-run -
Build other tests with
cargo test --all --verbose --no-run -
Run
cargo test -p fall_test -p fall_test -p lang_rust -p lang_rust -p lang_json --verbose --no-runagain and observe thatmemchrand some other dependencies are recompiled. -
Run
cargo test --all --verbose --no-runand observememchrrecompiled again.
The verbose flag gives the following commands for memchr:
Running `rustc --crate-name memchr /home/matklad/trash/registry/src/github.com-1ecc6299db9ec823/memchr-1.0.1/src/lib.rs --crate-type lib --emit=dep-info,link -C debuginfo=2 --cfg 'feature="default"' --cfg 'feature="libc"' --cfg 'feature="use_std"' -C metadata=be49c4722e8b48bf -C extra-filename=-be49c4722e8b48bf --out-dir /home/matklad/trash/fall/target/debug/deps -L dependency=/home/matklad/trash/fall/target/debug/deps --extern libc=/home/matklad/trash/fall/target/debug/deps/liblibc-90ba32719d46f457.rlib --cap-lints allow -C target-cpu=native`
Running `rustc --crate-name memchr /home/matklad/trash/registry/src/github.com-1ecc6299db9ec823/memchr-1.0.1/src/lib.rs --crate-type lib --emit=dep-info,link -C debuginfo=2 --cfg 'feature="default"' --cfg 'feature="libc"' --cfg 'feature="use_std"' -C metadata=be49c4722e8b48bf -C extra-filename=-be49c4722e8b48bf --out-dir /home/matklad/trash/fall/target/debug/deps -L dependency=/home/matklad/trash/fall/target/debug/deps --extern libc=/home/matklad/trash/fall/target/debug/deps/liblibc-335251832eb2b7ec.rlib --cap-lints allow -C target-cpu=native`
Here's the single difference:
--extern libc=/home/matklad/trash/fall/target/debug/deps/liblibc-90ba32719d46f457.rlib
--extern libc=/home/matklad/trash/fall/target/debug/deps/liblibc-335251832eb2b7ec.rlib
Versions (whyyyyy cargo is 0.21 and rustc is 1.20??? This is soo confusing)
λ cargo --version --verbose
cargo 0.21.0 (5b4b8b2ae 2017-08-12)
release: 0.21.0
commit-hash: 5b4b8b2ae3f6a884099544ce66dbb41626110ece
commit-date: 2017-08-12
~/trash/fall master
λ rustc --version
rustc 1.20.0 (f3d6973f4 2017-08-27)
So, it has to do with features. Namely, two cargo invocations produce two different libcs:
Running `rustc --crate-name libc /home/matklad/trash/registry/src/github.com-1ecc6299db9ec823/libc-0.2.30/src/lib.rs --crate-type lib --emit=dep-info,link -C debuginfo=2 --cfg 'feature="use_std"' -C metadata=335251832eb2b7ec -C extra-filename=-335251832eb2b7ec --out-dir /home/matklad/trash/fall/target/debug/deps -L dependency=/home/matklad/trash/fall/target/debug/deps --cap-lints allow -C target-cpu=native`
Running `rustc --crate-name libc /home/matklad/trash/registry/src/github.com-1ecc6299db9ec823/libc-0.2.30/src/lib.rs --crate-type lib --emit=dep-info,link -C debuginfo=2 --cfg 'feature="default"' --cfg 'feature="use_std"' -C metadata=90ba32719d46f457 -C extra-filename=-90ba32719d46f457 --out-dir /home/matklad/trash/fall/target/debug/deps -L dependency=/home/matklad/trash/fall/target/debug/deps --cap-lints allow -C target-cpu=native`
The only difference is --cfg 'feature="default"'.
So, I get two different libcs in target:
λ ls target/debug/deps | grep liblibc
.rw-r--r-- 982k matklad 3 Sep 14:06 liblibc-90ba32719d46f457.rlib
.rw-r--r-- 982k matklad 3 Sep 14:03 liblibc-335251832eb2b7ec.rlib
But I get a single memchr:
λ ls target/debug/deps | grep libmemchr
.rw-r--r-- 186k matklad 3 Sep 14:09 libmemchr-be49c4722e8b48bf.rlib
The file name is the same for both cargo commands, but the actual contents differs.
Hm, so this looks like more serious then spurious rebuild!
Depending on what -p options you pass, you might end up with different final artifacts for the same package. This should not happen, right?
Minimized example here: https://github.com/matklad/workspace-vs-feaures
@alexcrichton continuing discussion here, instead of #4469 which is somewhat orthogonal, as you've rightly pointed out!
I don't think this'd be too hard to implement, but I'm not sure if this is what we'd want implemented per se. If one target of a workspace doesn't want a particular feature activated, wouldn't it be surprising if some other target present in a workspace far away activated the feature?
Yeah, it looks like what we ideally want here is that each final artifact gets the minimal set of features. And this should work even withing a single package: currently, activating feature in dev-dependecy will activate it for usual dependency as well. This is also something to keep in mind if we go the route of binary-only (or per-target) dependencies.
Though such fine-grained feature activation will cause more compilation work overall, so using union of featues might be a pragmatic choice, as long as we keep features additive, and it sort of makes sense, because crates in workspace share dependencies anyway. And seems better then definitely some random unrelated target activating features for you depending on the command line flags.
I think one of the main problems right now is that we're doing feature resolution far too soon, during the crate graph resolution. Instead what we should be doing is assuming all features are activated until we actually start compiling crates. That way if you have multiple targets all requesting different sets of features they'll all get separately compiled copies with the correct set of features.
Does that make sense? Or perhaps solving a different problem?
Does that make sense? Or perhaps solving a different problem?
Yeah, totally, "they'll all get separately compiled copies with the correct set of features" is the perfect solution here, and it could be implemented by moving feature selection after the dependency resolution.
But I am really worried about additional work to get separately compiled copies, because it is multiplicative. Let's say you have a workspace with the following layout:
- leaf crates A and B, which transitively depend on external crate libc with different features
- A large number of intermediate crates, on which A and B also depend
- An ubiquitous utils crate, that depends on libc and is a dependency of any other crate.
Because A and B require different features from libc, and because libc happens to be at the bottom of the dependency graph, that means that for cargo build --all we will compile every crate twice. Moreover, editing utils and then doing cargo build --all again recompiles everything two times.
So it's not that only libc will get duplicated, the whole graph may be duplicated in the worst case.
If we assume that features are additive (as intended), then the innermost crate could be compiled once with the union of all features.
Additive features are a bit of a subtle point though (see #3620). Recompiling is the safest way, though expensive.
@matklad yeah you're definitely right that the more aggressively we cache the more we end up caching :). @nipunn1313 you're also right that it should be safe for features to be unioned, but they often come with runtime or linkage implications. For example if a workspace has a no_std project and an executable, compiling both you wouldn't want to enable the standard library in the dependencies of the no_std project by accident!
I basically see this as there's a specification of what Cargo should be doing here. We've got, for example, two crates in a workspace, each which activates various sets of features in shared dependencies. Today Cargo does the "thing that caches too much" if you compile each separately (and also suffers a bug when you switch between projects it recompiles too much). Cargo also does the "union all the features" if you build both crates simultaneously (e.g. cargo build --all). Basically Cargo's not consistent!
I'd advocate that Cargo should try to stick to the "caches too much" solution as it's following the letter of the law of what you wrote down for a workspace. It also means that crates in a workspace don't need to worry too much about interfering with other crates in a workspace. Projects that run into problems of the "too much is cached" nature I'd imagine could then do the investigation to figure out what features are turned on where, and try to get each workspace member to share more dependencies by unifying the features.
Projects that run into problems of the "too much is cached" nature I'd imagine could then do the investigation to figure out what features are turned on where, and try to get each workspace member to share more dependencies by unifying the features.
This somewhat resolves my concern about build times, but not entirely. I am worried that it might not be easy to unify features manually, if they are turned on by private transitive dependencies. It would be possible to do by adding this private transitive dependency as an explicit and unused dependency, but this looks accidental.
But now I too lean towards fine-grained features solution.
For what it's worth, we've done that exact trick with the parallel feature of the gcc crate. It does happen, but the workaround is ok.
On Wed, Sep 6, 2017 at 12:45 AM Aleksey Kladov [email protected] wrote:
Projects that run into problems of the "too much is cached" nature I'd imagine could then do the investigation to figure out what features are turned on where, and try to get each workspace member to share more dependencies by unifying the features.
This somewhat resolves my concern about build times, but not entirely. I am worried that it might not be easy to unify features manually, if they are turned on by private transitive dependencies. It would be possible to do by adding this private transitive dependency as an explicit and unused dependency, but this looks accidental.
But now I too lean towards fine-grained features solution.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/rust-lang/cargo/issues/4463#issuecomment-327403064, or mute the thread https://github.com/notifications/unsubscribe-auth/ABPXoxPIsKCCcH5DEgqtKzPt9ek34uLeks5sfk2EgaJpZM4PLGrK .
Servo relies on the current behavior to some extent: two "top-level" crates (one executable and one C-compatible static library) depend on a shared library crate but enable different Cargo features. These features are mutually exclusive, enabling the union would not work.
Maybe the "right" thing to do here is to have separate workspaces for the different top-level things? Does it make sense for shared path dependencies to be members of two separate workspaces?
(Servo’s build system sets $CARGO_TARGET_DIR to different directories for the two top-level things so that they don’t overwrite each other. They also happen to be built with different compiler versions (some nightly v.s. some stable release).)
I would be in support of cargo build --all building the same dependency multiple times rather than resolving a feature union. This would be equivalent of running cargo build in a loop over each crate in the workspace. This would prevent multiple crates within a workspace from interfering with each other (or multiple dependencies in the dep-chain interfering). I believe it would cover Servo's case as well.
What makes this problem so insidious is that there's no way to enforce or even encourage the union property of features. If a project pulls in even one dependency that doesn't obey this property, it could potentially create an incorrect binary.
In @SimonSapin's case with Servo, I think Servo is lucky that the feature'd crate (style) is only one-level in from the top level crate. If you had a dep chain like
evenbiggerproject -> servo -> style[featA]
-> geckolib -> style[featB]
then I believe that compiling evenbiggerproject with cargo would select the union of features for style and use it for both geckolib and servo. This would be an incorrect binary w.r.t. the intent of the servo/geckolib Cargo.tomls
Our project at Dropbox ran into a similar issue with itertools -> libeither, where libeither was compiled with two different features. Lucky for us, libeither's features are union-safe, so the code was correct, but it did create spurious recompiles depending on which sub-crate we were compiling.
I agree with @nipunn1313 -- I think cargo build --all should build all crates exactly as they would be if you had run cargo build for each crate separately. If that requires us to recompile some crates, so be it.
This all sounds like agreement on what should happen. @alexcrichton, what code changes need to happen (on a high level) to get there?
That's what I was discussing with @alexcrichton at the RustFest impl days, and I have a bunch of refactoring done that I'm still tweaking. Will post a PR ASAP. Do you have a particular dependency/urgency relating to Gecko or Servo on this?
Nothing urgent. I thought this bug could cause spurious rebuilds after selectively building a crate with -p, but I couldn’t reproduce. Anyway, thanks for working on this!
We've had to fork some deps to unify feature selection to work around this issue. It's definitely not sustainable for us, but not urgent yet.
--Nipunn
On Sat, Oct 14, 2017 at 5:27 AM Simon Sapin [email protected] wrote:
Nothing urgent. I thought this bug could cause spurious rebuilds after selectively building a crate with -p, but I couldn’t reproduce. Anyway, thanks for working on this!
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/rust-lang/cargo/issues/4463#issuecomment-336631887, or mute the thread https://github.com/notifications/unsubscribe-auth/ABPXo7HL7OuZhSaMgZMtY9Y5IxnY7dHFks5ssKjIgaJpZM4PLGrK .
@nipunn1313 for my understanding, can you point me at a commit or otherwise elaborate on what problems you've had due to this issue?
Here's an example of a problem we had to work around https://github.com/rust-lang/cargo/issues/3620#issuecomment-326462000
In that particular case, either and itertools were both present in our workspace. We ended up internally forking itertools to ask for a wider set of features from libeither, so there was consistency across the workspace.
@SimonSapin taking on this issue will require a relatively significant refactoring of Cargo's backend. Right now feature resolution happens during crate graph resolution, but we need to defer it all the way until the very end when we're actually compiling crates.
Today, I encountered a problem with the overall compilation of the workspace and the inconsistent compilation of each crate.
We have a public logger crate, the default is output to a file, open console features will be output to the terminal.
Some crates in the workspace have this feature turned on, while others do not need to be turned on.
But cargo bulid --all will cause the entire workspace's crates to be output to the terminal.
This behavior makes me very confused.
Regarding:
Projects that run into problems of the "too much is cached" nature I'd imagine could then do the investigation to figure out what features are turned on where, and try to get each workspace member to share more dependencies by unifying the features.
- What would be the best way to figure out which features needs to be unified?
I hacked together a crude tool that compares output of multiple cargo build --build-plan -Z unstable-options --package XYZ runs to see if compiler would be invoked with a different set of features for the same version of dependency for different packages. Is there a better way to do it?
- Would it be feasible to ask for a feature to "lock" features on the workspace level, similar how versions are locked via Cargo.lock? Something along the lines (in the workspace
Cargo.toml):
[enable-features] # some better name
# for all the dependencies of `syn` in the workspace matching version `0.15`,
# forcefully enable "visit-mut" and "extra-traits" features.
syn = { version = "0.15", features = ["visit-mut", "extra-traits"] }
?
This is pretty much to avoid crude solution I outlined here: https://users.rust-lang.org/t/forcing-set-of-features-for-shared-dependencies-across-targets-in-workspace/25899 (I don't really want to have "fake" dependencies or tune features across multiple Cargo.toml files).
- So far I found two use-cases for feature unification:
First, it improves compilation times (as explained in my post on users.rust-lang.org).
Second, I found that inconsistent features break the following use-case I have (which might be a not fully valid one?):
sharedcrate is compiled asdylibplugincrate is compiled withsharedas a dependency;pluginitself isdylibappcrate is compiled as a binary withsharedas a dependency; NO direct dependency onplugin
app tries to load plugin at runtime via libloading library.
So, when features are consistent on shared between two targets (plugin and app), everything works fine. I can build plugin and app independently, start app and it will load plugin. Code is shared between these two in the form of libshared.dylib dynamic library.
However, when features are not consistent (say, plugin=>shared=>another_dep enables some feature A in another_dep and app=>shared=>another_dep does not), this causes plugin to fail loading because some symbols would be missing in libshared.dylib. Also, it seems like a single version of libshared.dylib is created for both targets, so building app & plugin independently would constantly override that libshared.dylib.
It seems like dynamic libraries is still an unfinished story in Rust, so I don't know what really I could be asking here (it might be that it "works" only accidentally?).
Because winapi is usually such a deep transitive dependency and has so many unique ways to select which features to enable, this problem is a particular nuisance for us Windows users because winapi gets rebuilt every time we switch which package we're building. This is most infuriating with cargo run because we need -p to select which package to run and it causes deep rebuilds all the time.
^ we have the same issue with winapi in particular. We worked around it by autogenerating our Cargo.toml to always specify the exact union of winapi features which we require (including all transitive dependencies) as a direct dependency of every Cargo.toml. There are a few other cases in which it's an issue.
It's a painful workaround for an even more painful issue.
I find myself in need of this as well.
There was a recent discussion on Zulip between @ehuss, myself, and several others, which came to the following rough conclusion:
- It should be possible to ask cargo to build all (or a subset of) the crates of a workspace with independent feature unification, such that each of the workspace's crates build their dependencies with only the minimal set of features required to satisfy that crate and its dependencies, not the whole workspace.
- There should be a top-level option for a workspace's
Cargo.tomlto opt into that behavior, such thatcargo buildorcargo build -p ...will always have that behavior.
@joshtriplett it sounds like you want feature selection to be dependent on the set of packages compiled? That's the opposite of what this bug is about.
If what this bug is about has changed over the years, maybe a moderator could update the title & summary?
I'm not a mod - but I've been here following with this task for 3+ years now. Here's a writeup of my summary - from reading through this.
- It should be possible to ask cargo to build all (or a subset of) the crates of a workspace with independent feature unification, such that each of the workspace's crates build their dependencies with only the minimal set of features required to satisfy that crate and its dependencies, not the whole workspace.
- There should be a top-level option for a workspace's
Cargo.tomlto opt into that behavior, such thatcargo buildorcargo build -p ...will always have that behavior.
Sounds like there are two modes of operating on the table
Crate Specific Feature Selection (CSF):
Each crate in a workspace is built with its deps (incl vendored) compiled with the minimal features needed for just that crate
Pros
- Each individual crate's output has minimal bloat
Cons
- Switching between crates in a workspace may incur long recompiles of dependency chains - and a bloated cargo dependency cache
cargo buildof each crate independently may be costly - recompiling indvidiual dependencies (eg vendored crates) many times - w/ different feature sets each time
Workspace Aware Feature Selection (WAF)
Each crate in a workspace is built with its deps having the union of features needed for the whole workspace
Pros
- Switching between crates within a workspace preserves good build caching in developer environment
Cons
- Each crate's output may be bloated - w/ dependencies it doesn't need
- Relies on additive feature correctness (each crate w/ features A and B, must compile/run correctly for feature A, regardless of whether feature B is present). Additive feature correctness is hard to enforce.
Today, we're awkwardly in between, causing a lot of recompiles and some confusion:
cargo build inside a crate within a workspace uses "CSF"
cargo build --all or cargo build from workspace root uses "WAF"
Ideas/Proposals
A) Default CSF
- Implement CSF by default even if building from workspace root or using
cargo build --all - Provide option in workspace Cargo.toml to switch to WAF
B) Default WAF
- Implement WAF by default even if building from a crate within a workspace
- Provide option in workspace Cargo.toml to switch to CSF
C) Keep today's behavior - CSF in crates, WAF at root
- Document the behavior dichotomy better
My personal thoughts. I think (C) is a bad option - as it's leading to confusion.
W.r.t developer environment within workspace, WAF seems better for developer build times when switching crates often. CSF seems better when focusing on one crate.
W.r.t. output bloat CSF seems better if the crates are independent. WAF seems better if the crates are meant to be used together (eg one primary crate - w/ several subcrates within the workspace).
Hopefully this is helpful!
I would agree that the workspace-aware unification is probably the right default, personally. When this is combined with -Zfeatures=all from today it may actually mean that basically no one wants crate-specific-features perhaps? We could always perform a transition like this gradually and switch the defaults during an edition or something like that.
For your option (A), though, I'm not sure if we could actually implement that or if it would have the desired effect. We need to unify shared dependencies somehow, and if we ended up doing separate feature resolution for crates that would cause just as many rebuilds as if you did cargo build in each workspace member. It would also be unfortunately pretty difficult to implement with Cargo's current architecture without some deeper refactorings I think.
I haven't read all comments so maybe this has already been noted: If we have dependencies a -> b -> c and a -> c. And a and b depend on feature c/x, then b will not compile if its Cargo.toml does not activate c/x. But if a activates c/x then a will compile because it implicitly activates c/x for its dependency b. This can be somewhat confusing.
I'm currently implementing the "workspace-aware unification" model described above using guppy, see the documentation for hakari for more details. We're about to ship it in Diem Core. Seems to work well for our use case though there are definitely issues around dependency bloat that we're still working through and iterating on. (We may have to eventually go with a more fine-grained approach where we unify features for certain subsets of the workspace differently, but we expect to do all this through the guppy toolset without involving Cargo.)