
Vendored crate with git submodule may have something wrong with gitignore

Open · yukiiiteru opened this issue 1 year ago · 1 comment

Problem

When one crate (a lib) includes jemalloc as a git submodule and another crate (a binary) depends on that library, cargo vendor for the binary succeeds, but some vendored files whose checksums Cargo records (in the crate's .cargo-checksum.json) are ignored by git.

If I run git clean -dfqx or push the repository and clone it to another location, the build fails.

I tried to reproduce it with a hand-written .gitignore but could not. So far only jemalloc (with its own .gitignore) reproduces the problem.

Steps

I've created two repositories that reproduce it:

  • (lib): https://github.com/wfly1998/vendor-issue-reproduce-crate
  • (bin): https://github.com/wfly1998/vendor-issue-reproduce

The lib crate includes jemalloc as a git submodule.

To reproduce the problem, use the binary crate above:

git clone https://github.com/wfly1998/vendor-issue-reproduce.git
cd vendor-issue-reproduce
cargo vendor > .cargo/config
git add .
git commit -m "test"
git clean -dfqx # or try `git clean -dfnx` to show the ignored files
cargo build --offline # fails

My outputs:

$ git clean -dfnx
Would remove target/
Would remove vendor/is_odd/jemalloc/test/stress/cpp/

$ git clean -dfqx

$ cargo build --offline
error: failed to calculate checksum of: vendor-issue-reproduce/vendor/is_odd/jemalloc/test/stress/cpp/microbench.cpp

Caused by:
  failed to open file `vendor-issue-reproduce/vendor/is_odd/jemalloc/test/stress/cpp/microbench.cpp`

Caused by:
  No such file or directory (os error 2)

Possible Solution(s)

No response

Notes

No response

Version

$ cargo version --verbose
cargo 1.79.0-nightly (2fe739fcf 2024-03-15)
release: 1.79.0-nightly
commit-hash: 2fe739fcf16c5bf8c2064ab9d357f4a0e6c8539b
commit-date: 2024-03-15
host: x86_64-unknown-linux-gnu
libgit2: 1.7.2 (sys:0.18.2 vendored)
libcurl: 8.6.0-DEV (sys:0.4.72+curl-8.6.0 vendored ssl:OpenSSL/1.1.1w)
ssl: OpenSSL 1.1.1w  11 Sep 2023
os: Debian 10 (buster) [64-bit]

yukiiiteru, Mar 19 '24 14:03

Sorry for the late response.

This is an interesting case! Let me explain what happened.

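jemalloc's .gitignore has a rule, /test/stress/[A-Za-z]*, that excludes every entry under test/stress/ whose name starts with a letter. I believe this also excludes directories such as test/stress/cpp/, and according to the gitignore documentation, a negated pattern cannot re-include a file if one of its parent directories is excluded. So the second rule below does not bring test/stress/cpp/microbench.cpp back: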

/test/stress/[A-Za-z]*
!/test/stress/[A-Za-z]*.*
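To see the rule interaction in isolation, here is a minimal shell sketch. It assumes git is installed; the temporary repository and the extra file test/stress/microbench.c are illustrative only. It recreates the two rules and asks git check-ignore which paths are ignored:

```shell
#!/usr/bin/env bash
set -e

# Throwaway repository carrying only the two jemalloc rules quoted above.
repo=$(mktemp -d)
git -C "$repo" init -q
printf '%s\n' '/test/stress/[A-Za-z]*' '!/test/stress/[A-Za-z]*.*' > "$repo/.gitignore"

# Illustrative files: one directly under test/stress/, one inside cpp/.
mkdir -p "$repo/test/stress/cpp"
touch "$repo/test/stress/microbench.c" "$repo/test/stress/cpp/microbench.cpp"

# `git check-ignore -q` exits 0 when the path is ignored, non-zero otherwise.
if git -C "$repo" check-ignore -q test/stress/microbench.c; then c_status=ignored; else c_status=kept; fi
if git -C "$repo" check-ignore -q test/stress/cpp/microbench.cpp; then cpp_status=ignored; else cpp_status=kept; fi

echo "test/stress/microbench.c: $c_status"
echo "test/stress/cpp/microbench.cpp: $cpp_status"
```

The top-level file matches both rules and the later negation re-includes it. The file under cpp/ stays ignored because its parent directory cpp/ matches the first rule, and the negated pattern (which needs a dot in the name) does not match cpp.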

From my observation, cargo vendor successfully copied vendor-issue-reproduce/vendor/is_odd/jemalloc/test/stress/cpp/microbench.cpp into the vendor directory. In the jemalloc repository that file is tracked even though its .gitignore matches it, but the vendored copy at vendor-issue-reproduce/vendor/is_odd/jemalloc is no longer a git submodule, so that git index information is lost. In your repository, jemalloc's .gitignore still applies and its negated pattern cannot re-include the file, so git treats it as ignored: git add . skips it, and git clean -dfqx then deletes it.
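The same mechanism can be simulated with plain git, without Cargo. This is a rough sketch under stated assumptions: the directory names are hypothetical stand-ins, and "vendoring" is approximated by copying the files without the .git directory, which is what matters here rather than an exact model of cargo vendor:

```shell
#!/usr/bin/env bash
set -e

base=$(mktemp -d)
upstream="$base/jemalloc"   # hypothetical stand-in for the jemalloc submodule checkout
outer="$base/bin-crate"     # hypothetical stand-in for the binary crate's repository

# 1. Upstream: the rules ignore test/stress/cpp/, but the file is force-added,
#    so upstream's index keeps tracking it.
git init -q "$upstream"
printf '%s\n' '/test/stress/[A-Za-z]*' '!/test/stress/[A-Za-z]*.*' > "$upstream/.gitignore"
mkdir -p "$upstream/test/stress/cpp"
echo '// benchmark' > "$upstream/test/stress/cpp/microbench.cpp"
git -C "$upstream" add .gitignore
git -C "$upstream" add -f test/stress/cpp/microbench.cpp
upstream_tracked=$(git -C "$upstream" ls-files test/stress/cpp/microbench.cpp)

# 2. "Vendor" it: copy the files and the .gitignore, but not .git,
#    so upstream's index does not come along.
git init -q "$outer"
mkdir -p "$outer/vendor/jemalloc"
cp -R "$upstream/.gitignore" "$upstream/test" "$outer/vendor/jemalloc/"

# 3. A plain `git add .` in the outer repo honours the nested .gitignore
#    and silently skips the ignored directory.
git -C "$outer" add .
outer_tracked=$(git -C "$outer" ls-files vendor/jemalloc/test/stress/cpp/microbench.cpp)

echo "tracked upstream: ${upstream_tracked:-<none>}"
echo "tracked after vendoring: ${outer_tracked:-<none>}"
```

The file exists on disk in both places, but only the upstream index tracks it. That is why git clean -x later removes the vendored copy.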

I don't think this is a bug in Cargo: we intentionally flatten the repository and drop git information when vendoring, so that Cargo doesn't need to clobber the user's git configuration. However, since this situation (a file that is tracked even though a .gitignore rule matches it) is not uncommon, I am open to ideas for handling it more reasonably.
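One user-side workaround, not proposed in this thread and offered only as a sketch, is to force-add the vendor directory after running cargo vendor, so that nested .gitignore files inside vendored crates are bypassed. The layout below is a hypothetical mirror of the one in this issue. Note that --force adds everything under vendor/ that any .gitignore matches, not just this one file:

```shell
#!/usr/bin/env bash
set -e

# Hypothetical repository with a vendored crate laid out as in this issue.
outer=$(mktemp -d)
git init -q "$outer"
crate="$outer/vendor/is_odd/jemalloc"
mkdir -p "$crate/test/stress/cpp"
printf '%s\n' '/test/stress/[A-Za-z]*' '!/test/stress/[A-Za-z]*.*' > "$crate/.gitignore"
echo '// benchmark' > "$crate/test/stress/cpp/microbench.cpp"

# --force adds files even when a .gitignore rule (including one nested
# inside the vendored crate) matches them.
git -C "$outer" add --force vendor
forced=$(git -C "$outer" ls-files vendor/is_odd/jemalloc/test/stress/cpp/microbench.cpp)

# Dry run of the clean step from the report: nothing under vendor/ should be listed.
remaining=$(git -C "$outer" clean -dnx)

echo "tracked: ${forced:-<none>}"
echo "git clean would remove: ${remaining:-<nothing>}"
```

Once the file is tracked, git clean -dfqx leaves it alone, and cargo build --offline can verify its checksum.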

weihanglo, May 08 '24 22:05