Include cargo_vcs_info.json even for dirty working directories
Problem
Currently, Cargo includes a .cargo_vcs_info.json file when publishing, but only if the working copy is clean and --allow-dirty is not used.
However, this metadata is needed regardless whether the directory was modified or not. It's unlikely the path in repo would change, and it is still needed for resolving relative paths in README, creating links to the crate, and still helpful for quickly finding the crate in its repository.
Even having the commit hash is still useful, because it provides a base commit to diff from.
There are situations where it's still necessary to edit the Cargo.toml file, e.g. to publish crates with circular dev dependencies (#4242), or explicitly specified tests/benchmarks that are excluded from the tarball (#13456). In this case the commit hash is basically still accurate.
Note that Cargo omitting the commit hash when the directory is dirty is not a guarantee that packages with a commit hash are matching their commit. Dishonest users can always modify Cargo to make it lie or upload a manipulated tarball themselves, so this file is informational, not an integrity guarantee.
Proposed Solution
Cargo could always include the commit hash when available. Perhaps add "dirty":true if reporting that state is deemed important. Alternatively, provide an option of --allow-dirty-but-still-include-the-commit-hash.
Notes
No response
Is there a reason to be mutating Cargo.toml for dev-dependencies these days with the stripping of non-registry dev-dependencies?
For #13456, I am actively working on a fix. I posted the last major refactor today (#13693) before I can make the relevant change.
That said, I agree with the idea and it makes me wonder why this wasn't considered before. #5886 and its predecessor don't really say much on motivation. Same with #9866.
The biggest risk with this is if someone assumes the presence of cargo_vcs_info.json means the .crate is clean.
I talked to @joshtriplett to get some historical perspective and they don't see an issue and don't think that above use case should be large enough for us to work around (e.g. adding a cargo_vcs_info-2.json file)
https://github.com/tokio-rs/async-stream/blob/v0.3.5/async-stream/Cargo.toml#L23 https://docs.rs/crate/async-stream/0.3.5/source/Cargo.toml.orig
is an example of a crate that removed non-workspace crates-io dev-dep to avoid a cycle (tokio-test)
I'm not seeing tokio-test in the source Cargo.toml.
I'm also not too sure the point you are trying to get across. In theory, there isn't a reason to do that manually anymore. However, that is independent of whether we go forward with this and I've marked it as Accepted though I'm assuming we'd hold an FCP for it but I expect that to be relatively smooth.
The biggest risk with this is if someone assumes the presence of cargo_vcs_info.json means the .crate is clean.
Adding a "dirty" field would help here, unless they are just looking for the file (not the contents of the file. In which case there really isn't much you can do about it.
Another use case here is that of https://github.com/briansmith/ring/issues/1460#issuecomment-1830305112. TLDR: ring has a perl script(!) to generate assembly files, and also pre-compiles these for Windows so users don't need nasm to build ring. I can only assume that this predates stable support for inline assembly / global assembly in Rust. But I can imagine other similar use cases, where you want to pre-generate some data that requires exotic tools / takes an excessive amount of time to build. It's not ideal of course, but the real world rarely is.
Adding a "dirty" field would help here, unless they are just looking for the file (not the contents of the file. In which case there really isn't much you can do about it.
To clarify, I brought that up to talk about the impact on compatibility and not so much about people writing correct code in the future. All they have to go off of right now is the presence of the file.
To clarify, I brought that up to talk about the impact on compatibility and not so much about people writing correct code in the future. All they have to go off of right now is the presence of the file.
Makes sense, but:
- You already can't trust it today, I could patch my cargo to fake this (as @kornelski already pointed out in the original post). So it is not usable for security.
- If you want to use it, you should look at the revision mentioned in the file and use that (presumably in an advisory role, or as a starting point for verification to check that it actually matches the repo).
@page I'm not seeing tokio-test in the source Cargo.toml
Sorry, I've pasted a wrong link. It was meant to be async-stream 0.3.5 -> tokio-test -> tokio-stream -> async-stream https://github.com/rust-lang/cargo/issues/13695#issuecomment-2071319331
Thanks! That is a bit different because they aren't in the same repo and so not using a path dependency between them to allow breaking the cycle.
I wonder if that is something we could improve since dev-dependencies don't matter for the Index and so no one should be affected by this virtual cycle.
@taiki-e can you provide any insight into the async-stream publishing process? What is the direct problem cargo or crates.io is giving you that is leading to tokio-test to be stripped? As noted in my previous message, I'm wondering if there is a change that can be made to make this work.
https://github.com/tokio-rs/async-stream/issues/102#issuecomment-1946871459
I don't remember exactly what happened, but I suspect I could not publish without removing it due to circular dependencies.
https://github.com/tokio-rs/tokio/blob/b32826bc937a34e4d871c89bb2c3711ed3e20cdc/tokio-test/Cargo.toml#L22
(My understanding has not changed since my previous answer.)
So I think the issue with async-stream can be solved if https://github.com/rust-lang/cargo/issues/4242 is solved or a workaround that works for dev-dependencies that have a version is added (a known workaround doesn't work with such a case).
Adding a "dirty" field would help here, unless they are just looking for the file (not the contents of the file. In which case there really isn't much you can do about it.
Would love to see this. .cargo_vcs_info.json is not even useful for package provenance, but having dirty field can still help when browsing sources. At least people won't assume it is clean.
https://github.com/rust-lang/cargo/issues/8434 is also related I believe.