
Moving CARGO_HOME invalidates target caches

Open overdrivenpotato opened this issue 3 years ago • 5 comments

Problem

When the CARGO_HOME folder is moved to a new location, subsequent builds invalidate the cached artifacts in the target folder because the source file paths of dependencies have changed. This matters in CI, where a cache folder may be placed in a new location for every build. If the path changes, cached crates cannot be reused, so they must be rebuilt every time.

Steps

$ export CARGO_HOME=$(pwd)/home1
$ cargo new foo
     Created binary (application) `foo` package
$ cd foo
$ echo 'serde = "*"' >> Cargo.toml
$ cargo build
    Updating crates.io index
  Downloaded serde v1.0.140
  Downloaded 1 crate (76.4 KB) in 0.28s
   Compiling serde v1.0.140
   Compiling foo v0.1.0 (/private/tmp/repro/foo)
    Finished dev [unoptimized + debuginfo] target(s) in 2m 33s
$ cd ..
$ mv home1 home2
$ export CARGO_HOME=$(pwd)/home2
$ cd foo
$ cargo build
   Compiling serde v1.0.140
   Compiling foo v0.1.0 (/private/tmp/repro/foo)
    Finished dev [unoptimized + debuginfo] target(s) in 4.74s

Note that serde is built a second time after the CARGO_HOME folder is moved, even though its source has not changed.

Possible Solution(s)

Perhaps it is possible to have the CARGO_HOME portion of crate build paths replaced with something that does not change?

Notes

No response

Version

No response

overdrivenpotato • Aug 01 '22 01:08

That's because the fingerprint takes source file paths into account ^1. Currently, fingerprint calculation mostly relies on filesystem mtime and paths, not content hashes. Cargo cannot know the intent of a user changing CARGO_HOME, so it chooses to rebuild everything. There were some discussions about switching to content hash detection^2, but they never reached a conclusion. I feel that a content-hashing approach could fix this issue, though I don't know where it will go 😞

Out of curiosity, could you share more about why you are swapping CARGO_HOME? Which CI does that?

weihanglo • Aug 14 '22 09:08

That's because the fingerprint takes source file paths into account ^1. Currently, fingerprint calculation mostly relies on filesystem mtime and paths, not content hashes. Cargo cannot know the intent of a user changing CARGO_HOME, so it chooses to rebuild everything.

What if we instead hashed relative to CARGO_HOME? So long as everything else has stayed the same, I would think we shouldn't need to worry about whether CARGO_HOME has changed.
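
A minimal sketch of the difference, written in Rust for illustration. It is not Cargo's actual fingerprint code: both functions, the `CARGO_HOME` marker string, and the example registry directory name are hypothetical, and a real fingerprint covers much more than the path. It only shows that hashing an absolute source path changes when CARGO_HOME moves, while hashing the path relative to CARGO_HOME does not.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::path::Path;

/// Hypothetical stand-in for a path-sensitive fingerprint:
/// hashes the absolute source path exactly as given.
fn fingerprint_absolute(source: &Path) -> u64 {
    let mut hasher = DefaultHasher::new();
    source.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical variant: if `source` lives under `cargo_home`, hash only
/// the part of the path below it, so moving CARGO_HOME leaves the result
/// unchanged. Paths outside `cargo_home` fall back to the absolute path.
fn fingerprint_relative(source: &Path, cargo_home: &Path) -> u64 {
    let mut hasher = DefaultHasher::new();
    match source.strip_prefix(cargo_home) {
        Ok(relative) => {
            // Assumed marker so a relative path is hashed differently
            // from an absolute path with the same components.
            "CARGO_HOME".hash(&mut hasher);
            relative.hash(&mut hasher);
        }
        Err(_) => source.hash(&mut hasher),
    }
    hasher.finish()
}
```

With the `home1` and `home2` directories from the reproduction steps, `fingerprint_absolute` would differ for the same serde source file, whereas `fingerprint_relative` would match as long as the layout below CARGO_HOME is unchanged.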

epage • Aug 15 '22 14:08

Which CI does that?

Concourse creates a new work directory with a random ID during every build on some setups, e.g. macOS workers. For example, one build may create /opt/concourse/work_dir/volumes/live/56383523-066e-4a62-77e2-c4c70c3fa52a/volume, only for the next build to use /opt/concourse/work_dir/volumes/live/258f073b-0051-4ae7-68f3-22b902a6e478/volume. Because the volume ID changes, caching CARGO_HOME inside the volume directory does not work for subsequent builds. Moving the directory to a location outside the volume is not a great solution either: macOS-based workers don't use containers, so doing so would tamper with the rest of the system. Currently, however, it seems to be the only solution.

overdrivenpotato • Aug 15 '22 14:08

What if we instead hashed relative to CARGO_HOME?

Personally, I am happy with this. My one small concern is that someone may already rely on switching CARGO_HOME to select a different registry index or other configuration in order to achieve a level of reproducibility. However, if that really is the case, introducing more granular cache keys might be better than hashing CARGO_HOME.

weihanglo • Aug 27 '22 10:08

This is definitely a duplicate of https://github.com/rust-lang/cargo/issues/10179. Since that one was closed, I'll keep this open.


From https://github.com/rust-lang/cargo/issues/10179#issuecomment-992591557:

I believe this is correct behavior on behalf of Cargo right now because the full source path is used for debug information so it affects the final artifact. "Fixing" this issue would mean somehow doing something along the lines of remapping the paths to the same value.

https://github.com/rust-lang/cargo/issues/12137 (-Ztrim-paths) introduces a built-in remap mechanism in Cargo. The exact remap rules are under discussion in https://github.com/rust-lang/cargo/issues/13171. Debuginfo seems likely to be addressed soon, so IMO this should be reconsidered.

My one small concern is that someone may already rely on switching CARGO_HOME to select a different registry index or other configuration in order to achieve a level of reproducibility

I would tell my past self that whoever depends on an absolute CARGO_HOME is https://xkcd.com/1172/

weihanglo • Dec 15 '23 19:12

In #13171 we have an idea that a new subcommand, something like cargo debug, would generate remap rules for debuggers, so that debug info never contains a fixed absolute path and can always use a placeholder instead.
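
As a rough illustration of the placeholder idea only (this is not the design from #13171, and `remap_prefix` and the `/cargo-home` placeholder are hypothetical names), a prefix remap and its inverse could be sketched in Rust like this: the build would record paths under a fixed placeholder, and a debugger-side rule would map the placeholder back to wherever CARGO_HOME currently lives.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: rewrites `path` so that a leading `from` prefix
/// becomes `to`. Paths that do not start with `from` are returned unchanged.
fn remap_prefix(path: &Path, from: &Path, to: &Path) -> PathBuf {
    match path.strip_prefix(from) {
        // `rest` is the part of the path below `from`.
        Ok(rest) => to.join(rest),
        Err(_) => path.to_path_buf(),
    }
}
```

Applying `remap_prefix(source, cargo_home, placeholder)` at build time and `remap_prefix(recorded, placeholder, cargo_home)` on the debugger side round-trips the path, so the recorded value would stay the same no matter where CARGO_HOME is located.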

weihanglo • Jan 07 '24 03:01