
Build cache gets stuck in permanent dirty state

Open xldenis opened this issue 2 months ago • 11 comments

Problem

At work, our main workspace gets stuck in a permanently dirty state that only a full `cargo clean` fixes.

It appears that cargo is of two minds about whether certain dependencies are stale: when evaluating depA it considers it Fresh, but when looking at crate1, which uses depA, it reports depA as Dirty, causing a whole chain of crates to be perpetually rechecked / rebuilt.

The problem is that the cache never gets updated (the affected crates' fingerprints are never refreshed), so follow-up runs reach the same conclusion and always rebuild the affected crates.

Steps

  1. Run `cargo build` or `cargo check` on the affected workspace
  2. Repeat
  3. Run `cargo clean`
  4. `cargo build` / `cargo check` now work as expected.

I haven't been able to reproduce this outside of our work repository yet, so I'm unable to share an MVCE at this time.

Possible Solution(s)

No response

Notes

I've included filtered output of `cargo check --verbose`.

       Fresh hyper-util v0.1.14
       Fresh rkyv v0.7.44
       Fresh crc32fast v1.4.2
       Fresh tokio v1.45.0
       Dirty <workspace>-a v0.1.0 (PATH): the dependency <workspace>_proc was rebuilt (1761733737.968668037s, 286h 26m 15s after last build at 1760702562.265344617s)
    Checking <workspace>-a v0.1.0 (PATH)
       Fresh <workspace>-c v0.0.1 (PATH)
       Dirty <workspace>-d v0.1.0 (PATH): the dependency crc32fast was rebuilt (1760620769.157896236s, 23h 43m 9s after last build at 1760535380.652542855s)
    Checking <workspace>-d v0.1.0 (PATH)
       Fresh serde-value v0.7.0
       Dirty <workspace>-e v0.0.1 (PATH): the dependency rkyv was rebuilt (1760702561.999607786s, 45h 49m 16s after last build at 1760537605.018571092s)
    Checking <workspace>-e v0.0.1 (PATH)
       Dirty <workspace>-f v0.0.1 (PATH): the dependency tokio was rebuilt (1760620770.329166748s, 23h 43m 1s after last build at 1760535389.035129015s)
    Checking <workspace>-f v0.0.1 (PATH)
       Dirty <workspace>-g v0.1.0 (PATH): the dependency hyper_util was rebuilt (1760620773.396682246s, 23h 43m 4s after last build at 1760535389.804449249s)
    Checking <workspace>-g v0.1.0 (PATH)
       Dirty ptree v0.1.0 (PATH): the dependency serde_value was rebuilt (1760620770.810881011s, 23h 43m 1s after last build at 1760535389.857334782s)
    Checking ptree v0.1.0 (PATH)
       Fresh <workspace>-proc v0.0.1 (PATH)
       Dirty <workspace>-h v0.1.0 (PATH): the dependency <workspace>_a was rebuilt
    Checking <workspace>-h v0.1.0 (PATH)
       Dirty <workspace>-i v0.1.0 (PATH): the dependency <workspace>_d was rebuilt
    Checking <workspace>-i v0.1.0 (PATH)
       Dirty <workspace>-j v0.1.0 (PATH): the dependency <workspace>_d was rebuilt
    Checking <workspace>-j v0.1.0 (PATH)
       Dirty <workspace>-k v0.1.0 (PATH): the dependency <workspace>_a was rebuilt
    Checking <workspace>-k v0.1.0 (PATH)
       Dirty <workspace>-l v0.1.0 (PATH): the dependency <workspace>_i was rebuilt
    Checking <workspace>-l v0.1.0 (PATH)
    ....
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 5.77s

One interesting tidbit is that a proc-macro crate seems to be involved (at least in this specific instance of the problem). Follow-up check runs produce the same output, so the cache doesn't get fixed. Note: even running `cargo build` does nothing to fix the cache.

Another note: there might be an interaction with git? This seems to manifest only when switching between git branches.

Version

1.90.0-nightly (840b83a10 2025-07-30)
release: 1.90.0-nightly
commit-hash: 840b83a10fb0e039a83f4d70ad032892c287570a
commit-date: 2025-07-30
host: aarch64-apple-darwin
libgit2: 1.9.1 (sys:0.20.2 vendored)
libcurl: 8.7.1 (sys:0.4.82+curl-8.14.1 system ssl:(SecureTransport) LibreSSL/3.3.6)
ssl: OpenSSL 3.5.0 8 Apr 2025
os: Mac OS 15.6.1 [64-bit]

xldenis avatar Oct 29 '25 10:10 xldenis

In trying to create a reproduction case, check what build scripts are involved and what they are doing. #16104 is one example of a known issue. You can see more at https://github.com/rust-lang/cargo/issues?q=state%3Aopen%20label%3A%22A-rebuild-detection%22

epage avatar Oct 29 '25 11:10 epage

That issue does look quite similar!

We only have one build script, in our final leaf / binary crate. Would that be likely to cause any issues? It also doesn't do anything on macOS:

fn main() {
    #[cfg(target_os = "linux")]
    custom_labels::build::emit_build_instructions();

    println!("cargo:rerun-if-changed=build.rs");
}

I'll try to see if I can reproduce in a smaller / shareable workspace.

xldenis avatar Oct 29 '25 11:10 xldenis

That shouldn't affect dependencies

epage avatar Oct 29 '25 11:10 epage

Edit: I think I have a grasp on what is happening in my case; see the xsf issue section at the end.

I'm hitting the same issue and mistakenly posted in https://github.com/rust-lang/cargo/issues/15716#issuecomment-3472809953

It does survive a `cargo clean`, however.

I hit the issue in a large repo but not in a small repro project with only the xsf dependency. I tried to monitor what's happening to the file, but it seems only the build script is accessing / writing to it...

Reproducing the comment here:

Reproduced Comment

I'm hitting an issue on the xsf-rust project https://github.com/jorenham/xsf-rust

where the build script generates a file in the OUT_DIR, and cargo then detects this file as being newer than the last build invocation, so it re-runs the build script every time. It should be fine to write to OUT_DIR, I believe: https://doc.rust-lang.org/cargo/reference/build-scripts.html#outputs-of-the-build-script

Example log:

       Dirty xsf v0.3.2+0.1.3 (/home/redacted/Documents/redacted/code/xsf-rust): the file `target/debug/build/xsf-83bc54503dc402a6/out/xsf_wrapper.hpp` has changed (1761911758.276305173s, 20000000ns after last build at 1761911758.256305173s)

I would think that, given the build script runs and then generates the file, the OUT_DIR content would always be dirty?

Edit: Tried a smaller repro on a very simple project, and it's not triggering the issue at the moment.

Edit2: I can consistently repro on a bigger project. Could it be a sort of granularity issue on last-modified time, coupled with a "time of check / time of use" kind of issue?

Something like: run the build and record the build start; the build takes time, so the mtime of the OUT_DIR content ends up ever so slightly later than the build start; compare the OUT_DIR modified time vs the last build start, and the OUT_DIR appears to have been modified after. I'm guessing recording the "build end time" might make more sense?

A repro might look like "have a build.rs script with an arbitrarily slow build process == time sleep, and output a random file in OUT_DIR"; will try that (sketch below).
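
A minimal sketch of what that attempted repro build script might look like (file name and sleep duration are arbitrary):

// build.rs: simulate a slow build, then write a file into OUT_DIR after
// the delay, so its mtime lands well past the build start cargo recorded.
use std::{env, fs, path::PathBuf, thread, time::Duration};

fn main() {
    // Arbitrary delay standing in for a slow native build step.
    thread::sleep(Duration::from_secs(5));

    // Write the generated file into OUT_DIR *after* the delay.
    let out_dir = PathBuf::from(env::var("OUT_DIR").unwrap());
    fs::write(out_dir.join("generated.hpp"), "// generated\n").unwrap();
}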

Edit3: no luck with a slow build script, so something is changing the last modified time for whatever reason...

End of reproduced comment

Edit: seems I got a very weird mtime difference on one run:

Dirty xsf v0.3.1+0.1.3: the file `target/debug/build/xsf-4c6d9c9995e76790/out/xsf_wrapper.hpp` has changed (1761917263.404304699s, 1m 25s after last build at 1761917178.208304859s)

I closed VS Code (no rust-analyzer running) and did a `cargo clean` first.

Full log:

https://github.com/zama-ai/tfhe-rs/commit/739da8046b57a5134a14a9a567c4705014dd209d

Command to re-run that causes the issue:

CARGO_LOG=cargo::core::compiler::fingerprint=trace,cargo_util::paths=trace RUSTFLAGS="-C target-cpu=native" cargo test --tests -p tfhe --verbose -- test_bound_validity --nocapture

xsf_build.log

cargo -Vv
cargo 1.91.0 (ea2d97820 2025-10-10)
release: 1.91.0
commit-hash: ea2d97820c16195b0ca3fadb4319fe512c199a43
commit-date: 2025-10-10
host: x86_64-unknown-linux-gnu
libgit2: 1.9.1 (sys:0.20.2 vendored)
libcurl: 8.15.0-DEV (sys:0.4.83+curl-8.15.0 vendored ssl:OpenSSL/3.5.2)
ssl: OpenSSL 3.5.2 5 Aug 2025
os: Ubuntu 22.4.0 (jammy) [64-bit]

Edit again: it seems cargo sees the build script itself as dirty, but the build script does not appear modified when checking its mtime from the shell:

   0.909105592s  INFO prepare_target{force=false package_id=xsf v0.3.1+0.1.3 target="xsf"}: cargo::core::compiler::fingerprint:     dirty: FsStatusOutdated(StaleDepFingerprint { name: "build_script_build" })

Edit2: suspiciously, I see this in the output file:

cargo:rerun-if-changed=/home/redacted/Documents/redacted/code/tfhe-rs/target/debug/build/xsf-4c6d9c9995e76790/out/xsf_wrapper.hpp

and I can't find the code generating this rerun-if-changed clause.

Edit3: OK, it seems to be bindgen that actually emits those, via a CargoCallbacks default used by people providing Rust bindings to C/C++ libraries: https://github.com/rust-lang/rust-bindgen/blob/main/CHANGELOG.md#0690-2023-11-01

xsf issue

xsf generates a header file in the OUT_DIR, and bindgen marks this header as rerun-if-changed through a callback. I'm guessing the granularity of mtime on my machine means that on a small project we can't see the mtime change between the build start and the moment the header file is generated, so they look "in sync", while on a large project the machine is busy enough that the file is written at least "one unit of mtime" later, making them look out of sync.
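
For illustration, a sketch of that pattern (paths and header contents are made up, not xsf's actual build script):

// build.rs: generate a header into OUT_DIR, then run bindgen over it.
use std::{env, fs, path::PathBuf};

fn main() {
    // Generate a header into OUT_DIR, as xsf's build script does.
    let out_dir = PathBuf::from(env::var("OUT_DIR").unwrap());
    let wrapper = out_dir.join("xsf_wrapper.hpp");
    fs::write(&wrapper, "#include <stdint.h>\n").unwrap();

    // CargoCallbacks emits cargo:rerun-if-changed for every header bindgen
    // reads, including the one just generated in OUT_DIR, which is the
    // suspicious interaction described above.
    let bindings = bindgen::Builder::default()
        .header(wrapper.to_string_lossy())
        .parse_callbacks(Box::new(bindgen::CargoCallbacks::new()))
        .generate()
        .expect("bindgen failed");
    bindings
        .write_to_file(out_dir.join("bindings.rs"))
        .expect("failed to write bindings");
}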

IceTDrinker avatar Oct 31 '25 13:10 IceTDrinker

Just throwing my 2 cents in here: I had been experiencing this issue while building inside WSL, with exactly the same symptoms and weird mtimes. Running wsl --update made it go away. I went from version 2.3.26.0 with Linux kernel 5.15.167.4-1 to version 2.6.1.0 with kernel 6.6.87.2-1. The issue initially appeared for me with Rust 1.90 and persisted with 1.91, until the WSL update made it go away. Maybe some change in 1.90 triggered a latent kernel issue? Unclear, but if anyone else is seeing this inside WSL (or other Linuxes), maybe try updating things?

smalis-msft avatar Nov 07 '25 18:11 smalis-msft

FYI I've been struggling with an issue like this as well, but I notice that if I keep separate target directories for manual cargo commands and rust-analyzer, it goes away.

In VS Code, I added:

"rust-analyzer.cargo.targetDir": "${workspaceFolder}/target_rust_analyzer",

to my .vscode/settings.json

I then did a single `cargo clean`, and now it doesn't seem to happen anymore.

dspyz-matician avatar Nov 13 '25 19:11 dspyz-matician

I've been running with "rust-analyzer.cargo.targetDir": true since that setting was released, and it didn't stop this issue from happening to me.

smalis-msft avatar Nov 13 '25 21:11 smalis-msft

Note that there are likely several rebuild-detection issues, and the experience of one person does not necessarily weigh in on what is happening with another. This is a reason we've tended to encourage people to use separate issues. However, that also has downsides. I'm exploring alternatives at #t-cargo > Issues with common symptoms with many root causes

epage avatar Nov 13 '25 21:11 epage

@xldenis Try turning off incremental compilation for a while (env CARGO_INCREMENTAL=0 or build.incremental = false in config). If it never happens with that, it may suggest this is the same instance of #16104
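
For reference, the config-file form would look something like this (a sketch, assuming a workspace-level .cargo/config.toml):

# .cargo/config.toml
[build]
incremental = false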

weihanglo avatar Nov 14 '25 10:11 weihanglo

If it never happens with that, it may suggest this is the same instance of https://github.com/rust-lang/cargo/issues/16104

Setting incremental = false immediately stopped the issue. I'll keep running and see if it reoccurs.

What seems different from #16104 is that the crates that get marked as dirty are often random dependencies far up the tree, not crates from the workspace.

xldenis avatar Nov 16 '25 12:11 xldenis

Setting incremental = false immediately stopped the issue. I'll keep running and see if it reoccurs.

What seems different from #16104 is that the crates that get marked as dirty are often random dependencies far up the tree, not crates from the workspace.

That is strange, as non-local packages never have incremental compilation enabled: https://github.com/rust-lang/cargo/blob/2a7c4960677971f88458b0f8b461a866836dff59/src/cargo/core/profiles.rs#L309-L317

epage avatar Nov 25 '25 22:11 epage