ca-derivations: warning: output out of input /nix/store/xyz.drv missing, aborting the resolving

Open elaforge opened this issue 2 years ago • 4 comments

Describe the bug

When enabling ca-derivations, if I build a bunch of drvs that use it or are downstream from one that does, I frequently get a crash like:

warning: output out of input /nix/store/096ihak5gdyh2x88r0ywjgh1hmjylfkg-Groq.Compiler.test.end_to_end.resize_linear_align_corners_float16_1x1x20x30_1x1x40x60_slt2x2.test.drv missing, aborting the resolving
error: unexpected end-of-file

This seems to be preceded by this in the nix-daemon journal:

src/libstore/build/derivation-goal.cc:463: void nix::DerivationGoal::inputsRealised(): Assertion `attempt' failed.

From the source it looks like tryResolve warns about the failure, but it's actually an error not a warning, because next thing an assert trips over it.

However, it is incorrect about the drv failing, because the output is in fact generated. I can no longer check that non-invasively due to ca-derivations breaking drv->out and (apparently?) not expose its internal lookup mechanism, but nix-store -r instantly gives me the output, so it must have worked. So if I just keep retrying the build until the error stops happening, it is able to get all the way through.

Steps To Reproduce

I only started getting this when I started doing remote builds, so it's probably related to either that or a certain amount of parallelism.

nix-env --version output

2.4.1. I'm not sure if newer versions have fixed this, but I couldn't find any references in closed issues. I'll try updating to 2.8, but it's tricky due to nix's lack of cross version support and tendency to introduce new bugs.

elaforge avatar Jun 21 '22 20:06 elaforge

I see (roughly) the same thing with nix 2.9.1.

error: derivation '/nix/store/h3m67kyhkwbdg795nj5ra5j2pqdy5k4a-libpng-1.2.59.drv' doesn't have expected output 'dev' (derivation-goal.cc/resolvedFinished,realisation)

Redoing the build typically fixes it. I made a script to just keep retrying until status code == 0 :stuck_out_tongue_winking_eye:

Mindavi avatar Jun 22 '22 06:06 Mindavi

I would have expected post-2.7 versions to be more robust against this kind of breakage (because of https://github.com/NixOS/nix/pull/6221), but apparently they're not :/

Unfortunately I can't really reproduce, but I think both issues might be different since the first one appears at the beginning of the build and the second one at the end.

ca-derivations breaking drv->out and (apparently?) not expose its internal lookup mechanism

Maybe nix realisation info --json could help?

thufschmitt avatar Jun 23 '22 05:06 thufschmitt

Yeah, the errors I'm seeing are a bit different, not sure if they're the same underlying problem.

It may be hard to reproduce without building a lot of things in parallel across different builders... I don't know if that's what triggers it, but that's always where I've seen it. I could try doing a full build locally and without parallelism to see if it comes up, but it will take a long time.

I already have a framework for retrying on certain kinds of crashes, may be able to plug that new error in.
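For illustration, here is a minimal Python sketch of what such a retry wrapper could look like. It is not the actual framework mentioned above: the names build_with_retry, is_transient and TRANSIENT_MARKERS are hypothetical, the marker strings are simply copied from the two error messages quoted in this thread, and whether those are the only transient failures is an assumption.

```python
import subprocess

# Substrings copied from the error messages quoted in this thread.
# Assumption: these are enough to recognise the spurious failures.
TRANSIENT_MARKERS = (
    "aborting the resolving",
    "doesn't have expected output",
)


def is_transient(stderr):
    """Return True if stderr contains one of the known spurious errors."""
    return any(marker in stderr for marker in TRANSIENT_MARKERS)


def build_with_retry(cmd, max_attempts=3, run=subprocess.run):
    """Run cmd, retrying only when the failure looks transient.

    Returns (returncode, attempts_used). The `run` parameter exists so the
    subprocess call can be replaced in tests.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        result = run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return 0, attempt
        if not is_transient(result.stderr):
            # A genuine build failure: retrying would not help.
            return result.returncode, attempt
    # Every attempt hit a transient error.
    return result.returncode, max_attempts
```

It would be called with the usual build command as an argument list, for example build_with_retry(["nix-build", "default.nix"]). Failing fast on unrecognised errors keeps real build failures from being retried pointlessly.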

But sounds like I should try upgrading to 2.8 also? That may take a while but we should do it eventually anyway. The upgrade to 2.4 took about a year but hopefully things are better now.

ca-derivations breaking drv->out and (apparently?) not expose its internal lookup mechanism

Maybe nix realisation info --json could help?

Oh interesting, I didn't know it existed. When run on a drv it does seem to give the outPath, which is exactly what I was looking for! So I guess this is the new nix show-derivation? Or maybe stuff that used to be in show-derivation but now may not be? I wouldn't have guessed from the help, because it's about flakes and building things and installables (not sure what that is), which doesn't make it sound like it's intended to be an "info about drv" tool.
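In case it helps anyone scripting this lookup, a rough Python sketch follows. The helper names out_paths_from_json and realised_out_paths are made up for illustration, and the JSON shape (a list of objects, each carrying an "outPath" key) is an assumption based on the outPath field mentioned above, so check it against the output of your nix version. The command also needs the nix-command experimental feature enabled.

```python
import json
import subprocess


def out_paths_from_json(text):
    """Collect every "outPath" value from the JSON text.

    Assumption: the output is a list of objects (or a single object),
    each with an "outPath" key. Entries without that key are skipped.
    """
    data = json.loads(text)
    entries = data if isinstance(data, list) else [data]
    return [e["outPath"] for e in entries
            if isinstance(e, dict) and "outPath" in e]


def realised_out_paths(drv_path):
    """Ask nix for the realisations of drv_path and return their out paths."""
    result = subprocess.run(
        ["nix", "realisation", "info", "--json", drv_path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if nix exits non-zero
    )
    return out_paths_from_json(result.stdout)
```

Keeping the parsing separate from the subprocess call makes it easy to adjust the key lookup if the real output turns out to be shaped differently.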

elaforge avatar Jun 23 '22 22:06 elaforge

BTW, I can now pretty easily reproduce this. Even after implementing a retry, it fails so often that every single build exceeds the 3-try maximum. Each time it makes a bit of progress, but somehow I stumbled across something that causes the error very frequently. Is there anything I can do to gather more data?

This is still on nix 2.4 BTW. An alternate path would be to work on upgrading to 2.10 just to eliminate old version as a possibility.

Just to be clear, this is the problem with output out of input /nix/store/blah.drv missing, aborting the resolving. Notably, blah.drv itself is not ca, but it's now descended from one, so maybe this is related to a non-CA derivation with a CA parent?

elaforge avatar Jul 12 '22 19:07 elaforge

I can also confirm this behaviour on 2.10.3; let me know if there is any debugging that would be helpful

urandom2 avatar Aug 20 '22 09:08 urandom2

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/tweag-nix-dev-update-40/23480/1

nixos-discourse avatar Nov 25 '22 09:11 nixos-discourse

Can you try with https://github.com/NixOS/nix/pull/7390 ? Hopefully it will fix that problem in the same way that #7283 fixed #6572

thufschmitt avatar Dec 02 '22 08:12 thufschmitt

Wow, this is good news. Unfortunately my old code to use ca-derivations has surely gotten quite obsolete, so it'll take some time to bring it back to test this. I'll put it on the queue though!

elaforge avatar Dec 06 '22 02:12 elaforge