nix-darwin

After building with flakes, git is always failing during darwin-rebuild switch

Open ryanbooker opened this issue 2 years ago • 55 comments

One of my machines has recently started displaying the following error when I try to switch after a nix build using flakes. It was previously working fine, and two other machines using the same flake setup are working fine.

./result/sw/bin/darwin-rebuild switch --show-trace --flake .
error:
       … while fetching the input 'git+file:///path/to/the/flake/directory'

       error: program 'git' failed due to signal 9 (Killed: 9)

I ran nix doctor on all machines, and the failing machine's output is slightly different. The working machines all output the following line about being trusted, the failing machine doesn't output this line:

[INFO] You are trusted by store uri: daemon

Is this potentially the issue? How do I get the user to be trusted? All machines have exactly the same /etc/nix/nix.conf.

I'm not sure what I'm looking for, so if anyone has any tips on what to look for or how to get more useful information, that would be greatly appreciated. Thanks.

ryanbooker avatar Jun 18 '23 00:06 ryanbooker

Which git is being found on the machine and what architecture is it and what architecture is the machine?

bestlem avatar Jun 18 '23 07:06 bestlem

The broken machine is arm64, with an arm64 git binary at /etc/profiles/per-user/myuser/bin/git. One of the working machines is arm64 and one is x86_64.

ryanbooker avatar Jun 18 '23 12:06 ryanbooker

For what it's worth this appears to have been the lack of [INFO] You are trusted by store uri: daemon, because it's now working again and the user is trusted… though I have no idea how. Perhaps it fixed itself after a reboot.

ryanbooker avatar Jun 28 '23 11:06 ryanbooker

Sorry, it has started happening again, nix doctor still reports the user as trusted, so I guess it wasn't that. Not sure what I'm looking for now.

ryanbooker avatar Jul 01 '23 03:07 ryanbooker

Unfortunate update: this issue has now spread to a second machine…

ryanbooker avatar Jul 22 '23 06:07 ryanbooker

This is very strange, but I can't imagine it's not an upstream Nix bug. Perhaps the Nix version changes on activation (Nixpkgs is currently using 2.15 rather than the latest stable Nix release of 2.16)? Do you have the sandbox on (though I don't think that should be relevant for fetching flake inputs)?

emilazy avatar Jul 22 '23 07:07 emilazy

I deleted ~/.cache/nix and the local result link in the flake's folder, then did a nix-collect-garbage and tried again…

Everything worked…

Quite odd. When I'm back home I'll try that on the other computer and see if it fixes things.

ryanbooker avatar Jul 22 '23 10:07 ryanbooker
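
For anyone wanting to retry the cleanup described in the comment above, here is a minimal sketch of those three steps as one shell function. The name clean_flake_state is hypothetical (it is not part of nix or nix-darwin), and the function assumes it is given the flake's directory. It only repeats what was tried here; it is not a confirmed fix.

```shell
# Sketch only: clean_flake_state is a hypothetical helper, not a nix or nix-darwin command.
clean_flake_state() {
  local flake_dir="${1:-.}"

  # Remove Nix's per-user cache directory.
  rm -rf "${HOME}/.cache/nix"

  # Remove the `result` symlink that `nix build` left in the flake directory.
  rm -f "${flake_dir}/result"

  # Garbage-collect unreachable store paths (skipped when Nix is not installed).
  if command -v nix-collect-garbage >/dev/null 2>&1; then
    nix-collect-garbage
  fi
}

# Usage, from the flake directory:
#   clean_flake_state .
```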

I find this sometimes - I find a nix flake update seems to fix this sometimes.

I think if you are using a flake and it is not correct in some way then it produces this error.

bestlem avatar Jul 22 '23 10:07 bestlem

This is very strange, but I can't imagine it's not an upstream Nix bug. Perhaps the Nix version changes on activation (Nixpkgs is currently using 2.15 rather than the latest stable Nix release of 2.16)? Do you have the sandbox on (though I don't think that should be relevant for fetching flake inputs)?

Thanks for the response. :)

The nix version it's using is 2.15.1… if the sandbox is disabled by default then I assume it's not on. I haven't actively turned it on.

ryanbooker avatar Jul 22 '23 10:07 ryanbooker

I find this sometimes - I find a nix flake update seems to fix this sometimes.

I think if you are using a flake and it is not correct in some way then it produces this error.

Thanks for the reply @bestlem.

What do you mean by "not correct"… the flake builds successfully. It's only the switch that fails when I call ./result/sw/bin/darwin-rebuild switch --flake .

ryanbooker avatar Jul 22 '23 10:07 ryanbooker

I find this sometimes - I find a nix flake update seems to fix this sometimes. I think if you are using a flake and it is not correct in some way then it produces this error.

Thanks for the reply @bestlem.

What do you mean by "not correct"… the flake builds successfully. It's only the switch that fails when I call ./result/sw/bin/darwin-rebuild switch --flake .

I can't quite remember, but I think it crashed in nix flake check first. By "not correct" I mean that I don't know what was wrong; I fixed it by doing random things rather than finding where the code was wrong. Given other comments here, the cache being messed up seems possible.

I do wish you could run nix without it calling git

bestlem avatar Jul 22 '23 10:07 bestlem

Nix really shouldn't be doing this under any circumstances. I would recommend reporting a bug upstream to Nix; presumably the nix-darwin configuration might have something to do with what's going on here, but Git getting randomly killed from underneath Nix isn't something that should be happening regardless of any cache state.

emilazy avatar Jul 22 '23 10:07 emilazy

Unfortunately I found the issue was not reproducible, so I can't form a proper bug report.

Thinking more: the thing that crashes is git, so it's not necessarily a Nix bug. But I suspect it is the Nix-provided git that is failing, not a build of git made with Apple's clang.

bestlem avatar Jul 22 '23 11:07 bestlem

I have the same issue on an aarch64 machine and it persists despite following some of the steps mentioned above. I would like to help fix this. What useful information can I gather whilst in this broken state?

jonnyowenpowell avatar Jul 23 '23 11:07 jonnyowenpowell

I don't think I've run into this issue locally, so if someone could link to their dotfiles or an example that reproduces the issue, that would be quite helpful :+1:

Enzime avatar Jul 23 '23 22:07 Enzime

As I said, it's not reproducible by me. But in the last few days: I get an invalid flake and it fails on nix develop or nix build, and uses Nix's git. Run on another machine, it shows the Nix error. I mainly get these when trying my configuration in a VM with nothing other than nix installed.

bestlem avatar Jul 24 '23 05:07 bestlem

Just to clarify re my initial (and ongoing) problem… it's not failing in nix build or nix flake check; both pass just fine. It's failing somewhere in the darwin-rebuild switch flow.

% ./result/sw/bin/darwin-rebuild switch --flake .
error:
       … while fetching the input 'git+file:///path/to/the/flake/directory'

       error: program 'git' failed due to signal 9 (Killed: 9)

Perhaps that flow does another nix flake check internally or something and that is failing? But if I run a nix flake check it passes.

Any tips on what I could use to verify the tree of nix files that make up my environment? I assumed that if they build, they're fine.

ryanbooker avatar Jul 26 '23 00:07 ryanbooker

Can you run nix run nixpkgs#bash -- -x ./result/sw/bin/darwin-rebuild switch --flake . so we can see what part is failing?

emilazy avatar Jul 26 '23 00:07 emilazy

Neat trick. Thanks for the tip. :)

The output is:

+ set -e
+ set -o pipefail
+ export PATH=/nix/store/gqf7zhkx1n6sijax727ycrm57ikrv3pc-nix-2.15.1/bin:/nix/store/xyagncilqx57cljac32w9ld3kkn276d3-coreutils-9.3/bin:/nix/store/zllqsxfvhzyyzgp5irpbai1c2n7ycb3a-jq-1.6-bin/bin:/nix/store/xw0462wahwywbl751zdrn5b56m3af4zz-git-2.41.0/bin:/Users/myuser/.nix-profile/bin:/etc/profiles/per-user/myuser/bin:/run/current-system/sw/bin:/nix/var/nix/profiles/default/bin:/usr/local/bin:/usr/bin:/usr/sbin:/bin:/sbin:/opt/homebrew/bin:/opt/homebrew/sbin:/Users/myuser/.local/bin:/Users/myuser/.nix-profile/bin:/etc/profiles/per-user/myuser/bin:/run/current-system/sw/bin:/nix/var/nix/profiles/default/bin:/usr/local/bin:/usr/bin:/usr/sbin:/bin:/sbin
+ PATH=/nix/store/gqf7zhkx1n6sijax727ycrm57ikrv3pc-nix-2.15.1/bin:/nix/store/xyagncilqx57cljac32w9ld3kkn276d3-coreutils-9.3/bin:/nix/store/zllqsxfvhzyyzgp5irpbai1c2n7ycb3a-jq-1.6-bin/bin:/nix/store/xw0462wahwywbl751zdrn5b56m3af4zz-git-2.41.0/bin:/Users/myuser/.nix-profile/bin:/etc/profiles/per-user/myuser/bin:/run/current-system/sw/bin:/nix/var/nix/profiles/default/bin:/usr/local/bin:/usr/bin:/usr/sbin:/bin:/sbin:/opt/homebrew/bin:/opt/homebrew/sbin:/Users/myuser/.local/bin:/Users/myuser/.nix-profile/bin:/etc/profiles/per-user/myuser/bin:/run/current-system/sw/bin:/nix/var/nix/profiles/default/bin:/usr/local/bin:/usr/bin:/usr/sbin:/bin:/sbin
+ origArgs=("$@")
+ extraMetadataFlags=()
+ extraBuildFlags=()
+ extraLockFlags=()
+ extraProfileFlags=()
+ profile=/nix/var/nix/profiles/system
+ action=
+ flake=
+ '[' 3 -gt 0 ']'
+ i=switch
+ shift 1
+ case $i in
+ action=switch
+ '[' 2 -gt 0 ']'
+ i=--flake
+ shift 1
+ case $i in
+ flake=.
+ shift 1
+ '[' 0 -gt 0 ']'
+ '[' -z switch ']'
+ flakeFlags=(--extra-experimental-features 'nix-command flakes')
+ '[' -n . ']'
+ [[ . =~ ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))? ]]
+ scheme=
+ authority=
+ path=.
+ queryWithQuestion=
+ fragment=
+ flake=.
+ flakeAttr=
+ '[' -z '' ']'
++ hostname -s
+ flakeAttr=MyMachine
+ flakeAttr=darwinConfigurations.MyMachine
+ '[' -n . ']'
+ nix --extra-experimental-features 'nix-command flakes' flake metadata --version
+ cmd=metadata
++ nix --extra-experimental-features 'nix-command flakes' flake metadata --json -- .
error:
       … while fetching the input 'git+file:///path/to/the/flake/directory'

       error: program 'git' failed due to signal 9 (Killed: 9)
+ metadata=

ryanbooker avatar Jul 27 '23 04:07 ryanbooker

Interesting. So I'm going to assume it's using git from /nix/store/xw0462wahwywbl751zdrn5b56m3af4zz-git-2.41.0/bin. Does nix --extra-experimental-features 'nix-command flakes' flake metadata --json -- . fail on its own? Adding --debug may be useful too.

emilazy avatar Jul 27 '23 04:07 emilazy

It succeeds, outputting some json, and didn't seem to have any errors in the --debug output.

ryanbooker avatar Jul 27 '23 04:07 ryanbooker

Same if you use the same PATH?

PATH=/nix/store/gqf7zhkx1n6sijax727ycrm57ikrv3pc-nix-2.15.1/bin:/nix/store/xyagncilqx57cljac32w9ld3kkn276d3-coreutils-9.3/bin:/nix/store/zllqsxfvhzyyzgp5irpbai1c2n7ycb3a-jq-1.6-bin/bin:/nix/store/xw0462wahwywbl751zdrn5b56m3af4zz-git-2.41.0/bin:/Users/myuser/.nix-profile/bin:/etc/profiles/per-user/myuser/bin:/run/current-system/sw/bin:/nix/var/nix/profiles/default/bin:/usr/local/bin:/usr/bin:/usr/sbin:/bin:/sbin:/opt/homebrew/bin:/opt/homebrew/sbin:/Users/myuser/.local/bin:/Users/myuser/.nix-profile/bin:/etc/profiles/per-user/myuser/bin:/run/current-system/sw/bin:/nix/var/nix/profiles/default/bin:/usr/local/bin:/usr/bin:/usr/sbin:/bin:/sbin nix --extra-experimental-features 'nix-command flakes' flake metadata --json -- .

emilazy avatar Jul 27 '23 04:07 emilazy

Good point. 🤦‍♂️

With the same path it fails.

Trying to do anything with the git in that path fails… even git --version is killed.

ryanbooker avatar Jul 27 '23 04:07 ryanbooker
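
A quick way to confirm that a specific binary is being killed, rather than exiting with an ordinary error, is to look at its exit status: a POSIX shell reports 128 plus the signal number when a child dies from a signal, so signal 9 shows up as 137. The sketch below wraps that check; probe_binary is a hypothetical helper name, not something darwin-rebuild or nix provides.

```shell
# Sketch only: probe_binary is a hypothetical helper.
# Runs "<binary> --version" and reports whether it exited normally or was killed by a signal.
probe_binary() {
  local bin="$1" status=0

  # Discard the program's own output; only the exit status matters here.
  "$bin" --version >/dev/null 2>&1 || status=$?

  if [ "$status" -gt 128 ]; then
    # The shell encodes death-by-signal as 128 + signal number.
    echo "$bin: killed by signal $((status - 128))"
  else
    echo "$bin: exit status $status"
  fi
}

# Usage with the git from the PATH shown in the trace:
#   probe_binary /nix/store/xw0462wahwywbl751zdrn5b56m3af4zz-git-2.41.0/bin/git
```

Probing the store path directly takes PATH ordering out of the picture, which is what hid the problem when the metadata command was first run by hand.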

Good that we're narrowing it down! Does --debug print out anything useful? Maybe we can figure out what Git invocation is failing.

emilazy avatar Jul 27 '23 04:07 emilazy

--debug on that metadata command doesn't produce any extra output.

Trying to do anything with the git in that path fails… even git --version is killed

ryanbooker avatar Jul 27 '23 04:07 ryanbooker

Looks like that git works for other people. I assume it's /nix/store/xw0462wahwywbl751zdrn5b56m3af4zz-git-2.41.0/bin/git, right? Can you run nix hash path /nix/store/xw0462wahwywbl751zdrn5b56m3af4zz-git-2.41.0?

emilazy avatar Jul 27 '23 05:07 emilazy

Yeah, that's the git. The hash is sha256-iMGQgsV95SNCbYKzXW5ErZl/CJm0+sieoqh6BdXcUz8=.

ryanbooker avatar Jul 27 '23 08:07 ryanbooker

I found the issue I think… looking at Crash Reports in the macOS Console app, it looks like the code signature of that git is invalid…

-------------------------------------
Translated Report (Full Report Below)
-------------------------------------

Process:               git [27329]
Path:                  /Volumes/VOLUME/*/git
Identifier:            git
Version:               ???
Code Type:             ARM-64 (Native)
Parent Process:        zsh [21835]
Responsible:           stable [1055]
User ID:               501

Date/Time:             2023-07-27 18:48:43.8168 +1000
OS Version:            macOS 14.0 (23A5301g)
Report Version:        12
Anonymous UUID:        46AABDEA-93CF-8AFA-502B-75735F0C3D67

Sleep/Wake UUID:       6003D0E7-6D40-4169-8E40-F55AAEC4B7F8

Time Awake Since Boot: 41000 seconds
Time Since Wake:       195 seconds

System Integrity Protection: enabled

Crashed Thread:        0  Dispatch queue: com.apple.main-thread

Exception Type:        EXC_BAD_ACCESS (SIGKILL (Code Signature Invalid))
Exception Codes:       UNKNOWN_0x32 at 0x00000001010ebe20
Exception Codes:       0x0000000000000032, 0x00000001010ebe20

Termination Reason:    Namespace CODESIGNING, Code 2 Invalid Page

ryanbooker avatar Jul 27 '23 08:07 ryanbooker
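
The crash report points at an invalid code signature, which can also be checked from a terminal without waiting for a crash. The following is a minimal sketch that assumes macOS's codesign tool is available; check_signature is a hypothetical helper name, and the exact wording codesign prints for a damaged binary may vary between macOS versions.

```shell
# Sketch only: check_signature is a hypothetical helper around macOS's codesign.
check_signature() {
  local bin="$1"

  # codesign ships with macOS; bail out clearly on other systems.
  if ! command -v codesign >/dev/null 2>&1; then
    echo "codesign not available (not macOS?)"
    return 2
  fi

  # --verify checks the binary's contents against its embedded signature.
  if codesign --verify --verbose=4 "$bin"; then
    echo "signature ok: $bin"
  else
    echo "signature INVALID: $bin"
    return 1
  fi
}

# Usage:
#   check_signature /nix/store/xw0462wahwywbl751zdrn5b56m3af4zz-git-2.41.0/bin/git
```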

emily@yuyuko ~> nix hash path /nix/store/xw0462wahwywbl751zdrn5b56m3af4zz-git-2.41.0
sha256-CWP84RaMWzv1WP+UDwQ/f9j+v8y5fEMve8dXb4DEjoc=

Seems like file system corruption or something similar is causing the binary to be killed due to a bad hash. I would suggest backing up the /nix/store/xw0462wahwywbl751zdrn5b56m3af4zz-git-2.41.0 path with cp -a and then running nix-store --repair-path on it to get it fixed. If you could upload your corrupted version somewhere, maybe it'd be possible to find out what happened (a bit flip? some kind of corruption during download?). At most this is a Nix bug, but it could just be hardware too.

emilazy avatar Jul 27 '23 09:07 emilazy
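
The backup-then-repair suggestion above can be sketched as two small shell functions. The names backup_store_path and repair_store_path and the backup directory are hypothetical. The sketch also assumes that nix-store --repair-path may need to be run as root or as a trusted user on a multi-user install, and that the nix-command feature is enabled for nix hash path (as it was in this thread).

```shell
# Sketch only: both function names and the backup location are hypothetical.

# Copy a store path aside, preserving modes and timestamps, so the
# corrupted files can still be inspected after the repair.
backup_store_path() {
  local store_path="$1" backup_dir="$2"
  mkdir -p "$backup_dir"
  cp -a "$store_path" "$backup_dir/"
}

# Print the content hash before and after asking Nix to repair the path.
repair_store_path() {
  local store_path="$1"
  nix hash path "$store_path"            # hash of the possibly corrupted contents
  nix-store --repair-path "$store_path"  # ask Nix to restore the path (may need root)
  nix hash path "$store_path"            # compare this with a known-good machine
}

# Usage:
#   p=/nix/store/xw0462wahwywbl751zdrn5b56m3af4zz-git-2.41.0
#   backup_store_path "$p" "$HOME/nix-store-backup"
#   repair_store_path "$p"
```

After the repair, the second hash should match the one reported from a working machine (sha256-CWP84RaMWzv1WP+UDwQ/f9j+v8y5fEMve8dXb4DEjoc= in the comment above).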

Brilliant. That fixed it. Thanks @emilazy!

If I diff the bad and good versions of that folder, there are several differences in git-related files.

I have a dropbox link… is that sufficient?

ryanbooker avatar Jul 27 '23 10:07 ryanbooker
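
To see exactly which files differ between the corrupted copy and the repaired store path, a recursive diff that only names the changed files is usually enough. This is a minimal sketch; list_changed_files is a hypothetical helper, and the backup location in the usage line is an assumption.

```shell
# Sketch only: list_changed_files is a hypothetical helper.
# Prints one line per file that differs between two directory trees.
list_changed_files() {
  local bad="$1" good="$2"

  # -r recurses into subdirectories; -q names differing files without
  # dumping binary contents. diff exits non-zero when it finds differences,
  # so that status is deliberately not treated as a failure here.
  diff -rq "$bad" "$good" || true
}

# Usage (backup path is an assumption):
#   list_changed_files "$HOME/nix-store-backup/xw0462wahwywbl751zdrn5b56m3af4zz-git-2.41.0" \
#                      /nix/store/xw0462wahwywbl751zdrn5b56m3af4zz-git-2.41.0
```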