Advice on how to debug errors that aren't reproducible in devshell

Open cameron1024 opened this issue 1 year ago • 8 comments

This is a bit of a vague question, so apologies in advance if it's considered off-topic.

I'm trying to use crane to build a project at $JOB, which is quite large and closed source, and I'm getting a strange issue where nix build reports the following error:

       > ++ command cargo check --release --locked --all-targets
       > error: the lock file /build/source/Cargo.lock needs to be updated but --locked was passed to prevent this
       > If you want to try to generate the lock file without accessing the network, remove the --locked flag and use --offline instead.

However, when I enter the devshell and run cargo check --release --locked --all-targets, it succeeds. I do notice that running cargo generate-lockfile does update the lockfile, but I wouldn't have thought that affects things.

I can't share the source, and there are >1k crates in the crate graph, so pinning down exactly what's causing the issue isn't quick, but I was wondering if there are general guidelines on how to debug these kinds of issues?

Thanks :grin:

cameron1024 avatar Mar 17 '24 21:03 cameron1024

Hi @cameron1024 it sounds like you have made a change to a Cargo.toml somewhere but did not stage/commit the accompanying change to Cargo.lock.

"I do notice that running cargo generate-lockfile does update the lockfile, but I wouldn't have thought that affects things."

If the change added a new dependency, it absolutely does change things (we use the lock file to download the exact sources needed by the build before cargo is run inside of the sandbox).

There is a more "benign" version of this kind of change (e.g. one workspace crate depending on another) which doesn't need network access. However, we default to running builds with cargo build --locked so that the previous case is caught more easily.

Commit the results of cargo generate-lockfile and you should be good to go!

ipetkov avatar Mar 19 '24 22:03 ipetkov

Hmm, I don't think I have any uncommitted changes to Cargo.toml - I get the same error if I run git reset --hard HEAD (or if I just delete the repo and clone it again).

My understanding about cargo generate-lockfile is that it will update packages to more recent (semver-compatible) versions, essentially deleting the lockfile and starting again. Unfortunately, that's not possible for me, since:

  • running cargo generate-lockfile pulls in more recent versions of dependencies that I can't use, since I'm stuck on rustc 1.68 :smiling_face_with_tear:
  • we care quite a lot about the specific versions of some of the dependencies because parts of our codebase have been audited, and we can't change those versions

I guess my point about generate-lockfile was more to say that "I wouldn't have thought that crane would ever use cargo generate-lockfile, since it isn't reproducible".

The thing that's really confusing me is that the Cargo.lock I have in the repo works when I run cargo check --release --locked --all-targets, but fails with that error when run by crane, which implies to me that crane is either changing the Cargo.lock or the Cargo.toml in a way that cargo considers to be "incompatible".

cameron1024 avatar Mar 19 '24 23:03 cameron1024

It's not that you have uncommitted changes to Cargo.lock; it's that cargo wants to make changes to Cargo.lock, and you must commit those.

An easy way of doing that without updating all your dependencies is to run cargo check or cargo build, then commit whatever changes are made to Cargo.lock
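For anyone who wants to check this mechanically, here is a minimal sketch: checksum Cargo.lock, run the cargo command without --locked, and checksum it again. The helper name lockfile_changed_by is made up for illustration and is not part of cargo or crane.

```shell
#!/bin/sh
# Hypothetical helper: report whether running a command modifies a lockfile.
# Usage: lockfile_changed_by <lockfile> <command> [args...]
# Returns 0 (true) if the file's checksum differs after the command ran.
lockfile_changed_by() {
    lockfile=$1
    shift
    before=$(cksum < "$lockfile")
    # The command's own exit status is ignored on purpose: only its
    # effect on the lockfile is of interest here.
    "$@" >/dev/null 2>&1 || true
    after=$(cksum < "$lockfile")
    [ "$before" != "$after" ]
}

# Example, run from the workspace root. --locked is left out so that
# cargo is allowed to rewrite the file if it wants to:
#
#   if lockfile_changed_by Cargo.lock cargo check --release --all-targets; then
#       echo "cargo rewrote Cargo.lock; review and commit the diff"
#   else
#       echo "Cargo.lock is already up to date in this environment"
#   fi
```

If this reports no change in the devshell while the sandboxed build still fails, the difference is more likely in what cargo sees inside the sandbox than in the committed lockfile itself.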

ipetkov avatar Mar 19 '24 23:03 ipetkov

That's the thing that's puzzling me, cargo check --release --locked --all-targets (and cargo build) succeed outside of nix, and there aren't any changes to the lockfile :/

cameron1024 avatar Mar 20 '24 00:03 cameron1024

Hmm, puzzling indeed... one thing you can try is nix build with --keep-failed and look at the build directory and compare the contents. It might give us a clue if something is changing!

ipetkov avatar Mar 20 '24 01:03 ipetkov

I'll give that a shot and see if I can find a difference. I'm not super familiar with --keep-failed, am I right in thinking that the directory contains the nix build directory? So source/Cargo.lock is the lockfile that is used inside the call to nix build? (Apologies I'm not super familiar with nix terminology).

cameron1024 avatar Mar 21 '24 16:03 cameron1024

Basically Nix will run every build in its own (sandboxed) directory, which gets cleaned up (deleted) when the build is finished (successful or not). Using --keep-failed tells Nix to keep the directory as is (usually somewhere in /tmp) when the build fails, which gives us an opportunity to go back in and interactively look at what state things were left in after the fact!
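As a rough sketch of what "compare the contents" could look like: the function below walks a checkout, finds every Cargo.toml and Cargo.lock, and compares each against the same relative path in the kept build directory. The name compare_manifests and the example path are assumptions for illustration. Nix prints the real location of the kept directory when the build fails.

```shell
#!/bin/sh
# Hypothetical helper: compare cargo manifests and lockfiles between a
# checkout and a build directory kept by `nix build --keep-failed`.
# Usage: compare_manifests <repo_dir> <kept_source_dir>
# Prints one line per file that differs or is missing; returns 1 if any did.
compare_manifests() {
    repo=$1
    kept=$2
    rc=0
    # Collect relative paths first, skipping build output and git metadata.
    list=$(cd "$repo" && find . \( -name Cargo.toml -o -name Cargo.lock \) \
        ! -path './target/*' ! -path './.git/*' | sort)
    while IFS= read -r rel; do
        [ -n "$rel" ] || continue
        if [ ! -e "$kept/$rel" ]; then
            # A manifest that never made it into the build directory also
            # changes what cargo sees, so it is reported as well.
            echo "missing in build dir: $rel"
            rc=1
        elif ! cmp -s "$repo/$rel" "$kept/$rel"; then
            echo "differs: $rel"
            rc=1
        fi
    done <<EOF
$list
EOF
    return "$rc"
}

# Example (the /tmp path is a placeholder, use the one Nix reports):
#
#   nix build --keep-failed
#   compare_manifests . /tmp/nix-build-<name>.drv-0/source
```

Checking the Cargo.toml files as well as Cargo.lock matters here, since cargo decides whether the lockfile "needs to be updated" by resolving the manifests against it.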

ipetkov avatar Mar 22 '24 22:03 ipetkov

Hmm, the mysteries continue...

I ran that command, and had a look inside the directory, and the Cargo.lock file was identical to the one in my repo - this is what I expected. But, if I then enter the devshell, navigate to the failed output directory, and run cargo check --release --locked --all-targets, it succeeds, and the lockfile is unmodified. I would have expected this step to fail.

That said, I'm not including anything from the env-vars file, so that could be different.
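One way to see how much the environments differ, as a sketch only: pull the cargo-related variables out of the env-vars file and print each one whose value differs from the current shell. This assumes env-vars holds lines of the form declare -x NAME="value" (the way bash prints exported variables). The function names build_env_matching and env_diff are invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: print NAME="value" for every exported variable in
# an env-vars file whose name starts with the given prefix.
# Assumes one `declare -x NAME="value"` entry per line; only the first
# line of a multi-line value is seen.
build_env_matching() {
    grep -E "^declare -x $2[A-Za-z0-9_]*=" "$1" | sed 's/^declare -x //'
}

# Hypothetical helper: for each matching variable, print a line when the
# value recorded by the build differs from the value in the current shell.
# Usage: env_diff <env-vars file> <name prefix>
env_diff() {
    build_env_matching "$1" "$2" | while IFS= read -r line; do
        name=${line%%=*}
        build_value=${line#*=}
        # Strip the surrounding double quotes; escapes inside the value
        # are left as they appear in the file.
        build_value=${build_value#\"}
        build_value=${build_value%\"}
        shell_value=$(printenv "$name") || shell_value='<unset>'
        if [ "$build_value" != "$shell_value" ]; then
            printf '%s: build=%s shell=%s\n' "$name" "$build_value" "$shell_value"
        fi
    done
}

# Example, run inside the devshell (the /tmp path is a placeholder):
#
#   env_diff /tmp/nix-build-<name>.drv-0/env-vars CARGO
```

Any variable that shows up here (for instance one that changes where cargo looks for its configuration or sources) is a candidate for explaining why the same command behaves differently in the two places.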

cameron1024 avatar Mar 24 '24 20:03 cameron1024