`cargo vendor`'ing a `git` crate that includes a `workspace` misses important files
Problem
I created a Cargo.toml via cargo new hello, and tried to put a relibc dependency in the Cargo.toml, as below:
...
[dependencies.relibc]
git = "https://gitlab.redox-os.org/redox-os/relibc"
rev = "07ec3b6591878f23f3c4be80c26cbfc584abfe43"
When I run cargo build, the repository gets fetched and built as I hoped, but when I run cargo vendor, the generated vendor/relibc directory is missing important files and directories, such as https://gitlab.redox-os.org/redox-os/relibc/-/blob/master/src/ld_so/access.rs, and cargo build from the vendor/relibc directory fails.
Steps
New steps:
$ git clone https://github.com/aidancully/cargo-8885-requirer.git
$ cd cargo-8885-requirer
$ cargo check
$ mkdir .cargo
$ cargo vendor > .cargo/config.toml
$ cargo check
Old steps:
-
cargo new hello && cd hello - add the above dependency to
hello'sCargo.toml -
cargo vendor
At this point, I expect cd vendor/relibc && cargo build to work, but it does not. On the other hand, cargo build will successfully build the binary.
Possible Solution(s) I dug into this, the issue appears due to https://github.com/rust-lang/cargo/blob/64a468261aa05f2cd7a0d65dc1a6b4a576ed1938/src/cargo/sources/path.rs#L286-L297
The workspace module directories contain Cargo.toml files, causing these directories to get added to subpackages_found, and their files to be filtered out of the destination. The issue seems to be that this code wasn't updated to understand the fact that Workspaces allow several Cargo.toml's to participate in building a single package.
Notes
Output of cargo version:
cargo 1.49.0
I think that there may be a misunderstanding perhaps? I don't believe that the relibc is intended to be used as a Cargo dependency since I get
warning: The package `relibc` provides no linkable target. The compiler might raise an error while compiling `foo`. Consider adding 'dylib' or 'rlib' to key `crate-type` in `relibc`'s Cargo.toml. This warning might turn into a hard error in the future.
Additionally cargo build inside of the vendor directory builds the outer project, not the vendor directory (Cargo goes up a directory to find Cargo.toml).
Could you describe a bit more what you're trying to do here?
Sorry, I made some mistakes transcribing what I did from my work environment to here, I've edited the issue text to be more accurate.
relibc provides a staticlib crate type, see https://gitlab.redox-os.org/redox-os/relibc/-/blob/master/Cargo.toml#L7-9
cargo build works for relibc, but cargo vendor a project that uses relibc, and you can't cargo build the vendored directory.
As for what I'm trying to do, I'm basically trying to build a cdylib that doesn't contain any external library references (no libc.so, no libpthread.so) using Bazel, via cargo raze. After getting a proof-of-concept that interacted with Linux via the sc crate, I tried to raise the level of abstraction in my library by embedding relibc, and encountered this issue. I'll try using the musl target, but this still seems like an issue that should be fixed in cargo.
I think this may be a mismatch with Cargo and vendoring in that case? Vendoring is intended for Rust (rlib) dependencies, not for final end-products like executables/dylibs/staticlibs. Cargo doesn't have the concept of building a cdylib or staticlib dependency.
I'm pretty sure I can construct a failing example that generates an rlib, if that helps. Cargo isn't copying all the files it needs to, because it mistakenly believes that files in git repositories that live in the same directory as a (non-top-level) Cargo.toml are sub-packages, but in this case they are members of the top-level workspace.
Indeed yeah that would be helpful!
The problem is slightly different than I thought it was. Here's an example of a tree that causes the issue:
.
|-- Cargo.lock
|-- Cargo.toml
`-- src
|-- lib.rs
`-- member
|-- Cargo.toml
|-- mod.rs
`-- src
`-- lib.rs
src/member includes a Cargo.toml for building src/member/src, and a mod.rs that's a module for src/lib.rs.
https://github.com/aidancully/cargo-8885-requirer includes a Cargo.toml that uses a project structured as above. When you cargo build that, you'll see that it builds correctly. But cargo vendor, and follow the directions for building via the vendored repository, and the build will fail: the src/member/mod.rs that's technically part of src/lib.rs does not get copied into the vendor directory.
This matches what I observed with relibc.
cargo package fails for https://github.com/aidancully/cargo-8885-provider, though cargo build succeeds.
I think this is generally working as expected. Having overlapping packages like that isn't something Cargo really supports. I think the solution here is to ensure each package exists independently in its own subtree.
I can believe that it's working as more-or-less intended, but it's still surprising that cargo build can succeed when cargo package or cargo vendor fails, and the failure investigation was fairly painful. Ideally, cargo build, cargo package, cargo vendor etc. would all have the same idea of the files needed... One way might be to interact with rustc, and ask it what files it's compiling... Another might be to take a page from Bazel, and build in a sandbox that only refers to the same rust files that cargo package et al know about.. Or we might make the packaging less conservative, and allow it to include files from sub-packages, even if they don't get used downstream.
Or we might make the packaging less conservative, and allow it to include files from sub-packages, even if they don't get used downstream.
For real packages as the workspace root, this could cause significant bloat which is a concern to a good number of users.
Examples
- https://github.com/clap-rs/clap/
- https://github.com/rust-lang/cargo
One way might be to interact with rustc, and ask it what files it's compiling...
cargo vendor doesn't do any compilation. cargo package does but it is optional.
The best I can think of is if cargo check gets a Cargo lint (using #12235) that checks each file reported by rustc's dep info file to see if its directory has a Cargo.toml besides the root Cargo.toml (build.rs would report this, some others decide to put source in the root of their package). However, this likely wouldn't meet the bar for being an on-by-default lint and likely the people who could benefit from this the most wouldn't know to turn on this lint. If cargo publish was opt-in (#6153), then we could gate the lint on being a publishable crate. However, that would miss people doing cargo vendor and cargo publish and iiuc we generally don't have a lint's default level conditioned on something like publish = false. Another downside to this lint is this is usually a one-time setup issue while every run after will be paying for the overhead for checking it.
Correctness is not always binary. In this case it is a spectrum of tradeoff. Initially, Cargo made a strong assumption that every directory containing a Cargot.toml is a package. Then people figured out it didn't handle symlinks, so cargo package additionally copies symlink targets, at the cost of code complexity increase and packaging performance decrease.
For the project layout in https://github.com/rust-lang/cargo/issues/8885#issuecomment-735239611, Cargo can indeed handle that with eht dep-info file approach to maximize correct, with large performance degradation. Though, that doesn't achieve 100% correctness, because there might be edge cases like conditional #[path = "../outside/package/root/lib.rs"]. So Cargo would need to build every combination of features and target to figure out. This doesn't sound practical to me personally, as the layout is rare I believe. We should weigh the benefits, the costs, as well as where we want to lead the community towards.
I would propose to the team that we make clear that how cargo vendor works with git dependencies, which IMHO https://github.com/rust-lang/cargo/issues/7051#issuecomment-2874795755 is one of the most possible solutions — vendoring git deps as if they were cargo packaged. That should at least let users know what works what not.
@rustbot label +S-propose-close -S-triage
To clarify, the idea I was floating was to effectively create a lint that is run during development to report when a build unit references a file that is out of scope from being packaged. This could help a lot in catching packaging issues early. Yes, it would only apply for the platform and feature combinations being tested but that is true of so many other parts of Cargo.
To do the whole gitignore process could be slow. Depending on how aggressively we cache, checking parent directories for Cargo.toml may be slow. Checking against package.include would likely be "fast enough" but only covers a fraction of use cases.
There are enough caveats and risks here, I can see either letting this sit to collect feedback or proactively closing and re-opening if a situation warrants it.
Thanks. Marked it as S-needs-info` to collect more inputs.