Vendored crate with git submodule may have something wrong with gitignore
Problem
When one crate (lib) uses jemalloc as a submodule and another crate (binary) uses that library, cargo vendor for the binary works, but some files with checksums are ignored by git.
If I run git clean -dfqx or push the repository and clone it to another location, the build fails.
I tried to reproduce it with a handwritten .gitignore but could not. Only jemalloc (with its own .gitignore) triggers the problem.
Steps
I've created repositories for reproducing it:
- (lib): https://github.com/wfly1998/vendor-issue-reproduce-crate
- (bin): https://github.com/wfly1998/vendor-issue-reproduce
The lib includes jemalloc as a git submodule.
To reproduce the problem, you can use the above binary crate:
git clone https://github.com/wfly1998/vendor-issue-reproduce.git
cd vendor-issue-reproduce
cargo vendor > .cargo/config
git add .
git commit -m "test"
git clean -dfqx # or try `git clean -dfnx` to show the ignored files
cargo build --offline # fails
My outputs:
$ git clean -dfnx
Would remove target/
Would remove vendor/is_odd/jemalloc/test/stress/cpp/
$ git clean -dfqx
$ cargo build --offline
error: failed to calculate checksum of: vendor-issue-reproduce/vendor/is_odd/jemalloc/test/stress/cpp/microbench.cpp
Caused by:
failed to open file `vendor-issue-reproduce/vendor/is_odd/jemalloc/test/stress/cpp/microbench.cpp`
Caused by:
No such file or directory (os error 2)
Version
$ cargo version --verbose
cargo 1.79.0-nightly (2fe739fcf 2024-03-15)
release: 1.79.0-nightly
commit-hash: 2fe739fcf16c5bf8c2064ab9d357f4a0e6c8539b
commit-date: 2024-03-15
host: x86_64-unknown-linux-gnu
libgit2: 1.7.2 (sys:0.18.2 vendored)
libcurl: 8.6.0-DEV (sys:0.4.72+curl-8.6.0 vendored ssl:OpenSSL/1.1.1w)
ssl: OpenSSL 1.1.1w 11 Sep 2023
os: Debian 10 (buster) [64-bit]
Sorry for the late response.
This is an interesting case! Let me explain what happens.
In jemalloc, the .gitignore has a rule excluding everything under /test/stress/[A-Za-z]*. I believe this also excludes directories, and according to the gitignore specification, a negation pattern cannot re-include a file once its parent directory is excluded:
/test/stress/[A-Za-z]*
!/test/stress/[A-Za-z]*.*
From my observation, cargo vendor successfully copied vendor-issue-reproduce/vendor/is_odd/jemalloc/test/stress/cpp/microbench.cpp into the vendor directory. However, by the time git clean -dfqx was run, vendor-issue-reproduce/vendor/is_odd/jemalloc was no longer a git submodule and Cargo had discarded the git index information. The file was therefore cleaned, because the .gitignore from jemalloc could not re-include it.
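The exclusion behaviour described above can be checked outside Cargo. Below is a minimal sketch, assuming `git` is installed. The scratch repository, the `report` helper, and the file `microbench.c` are made up for illustration and only mirror the jemalloc layout.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch repository mirroring jemalloc's test/stress layout.
demo=$(mktemp -d)
cd "$demo"
git init -q .
mkdir -p test/stress/cpp
: > test/stress/cpp/microbench.cpp
: > test/stress/microbench.c

# The two rules quoted from jemalloc's .gitignore.
printf '%s\n' '/test/stress/[A-Za-z]*' '!/test/stress/[A-Za-z]*.*' > .gitignore

# `git check-ignore -q` exits 0 when the path is ignored and 1 when it is not.
report() {
    if git check-ignore -q "$1"; then
        echo "ignored: $1"
    else
        echo "kept: $1"
    fi
}

report test/stress/cpp/microbench.cpp
report test/stress/microbench.c
```

The directory `cpp` has no dot in its name, so the negation rule never matches it and everything inside it stays ignored. A file with an extension directly under test/stress is re-included.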
I don't feel like this is a bug in Cargo, as we intentionally flatten the repo and remove git information when vendoring, so Cargo doesn't need to clobber the user's gitconfig. However, since this situation, where a file is tracked even though it is gitignored, is not uncommon, I am open to ideas for making it more reasonable.
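One possible user-side workaround, not something proposed in this thread, is to force-add the vendor directory so that ignore rules carried inside vendored sources do not drop files from the commit. Below is a minimal sketch, assuming `git` is installed and that force-adding is acceptable for the repository. The scratch repository, the placeholder file contents, and the demo identity are hypothetical.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch repository standing in for the binary crate.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Hypothetical vendored tree carrying jemalloc's ignore rules.
crate=vendor/is_odd/jemalloc
mkdir -p "$crate/test/stress/cpp"
printf '%s\n' '/test/stress/[A-Za-z]*' '!/test/stress/[A-Za-z]*.*' > "$crate/.gitignore"
echo '// placeholder' > "$crate/test/stress/cpp/microbench.cpp"

# --force stages paths that ignore rules would otherwise skip.
git add --force vendor
git -c user.name=demo -c user.email=demo@example.com commit -q -m "vendor"

# git clean only removes untracked files, so the force-added file remains.
git clean -dfqx
if [ -f "$crate/test/stress/cpp/microbench.cpp" ]; then
    echo "microbench.cpp survived git clean"
fi
```

In the reproduction steps this would mean replacing `git add .` with `git add --force vendor` followed by a normal `git add .` for the rest of the tree.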