nix-direnv icon indicating copy to clipboard operation
nix-direnv copied to clipboard

Direnv hangs (and my whole system slows) when flake inputs update

Open the-argus opened this issue 1 year ago • 8 comments

To reproduce:

  • Clone my configuration repo
  • CD into it
  • direnv allow This immediately caused the issue for me. Everything freezes when nix gets to [querying source]. I don't think you will actually be able to reproduce it since I have never seen this issue before and it doesn't happen on my PC, only my laptop. I've tried removing all my .direnv/ directories and .cache/direnv and doing nix-collect-garbage -d, to no avail.

The issue

The process (updating a flake as prompted by direnv use flake) will hang. Everything on my computer becomes slow. Opening windows takes 10-20 seconds. My zsh prompt takes 5 seconds to load. Additionally, after closing the window, the problem remains. I've tried looking for a direnv or zsh or nix process and killing it, but there don't seem to be any...

To make matters worse, it's really inconsistent. I cannot figure out when or why it happens, and sometimes the process does finish if I wait long enough (like 10-20 minutes, I think).

Any other ideas on different things I could try to fix the problem? I thought clearing the caches would at least help.

the-argus avatar Jan 29 '23 23:01 the-argus

Currently won't be able to test this flake out as I am on a Darwin host, which this flake doesn't define an output for. It's difficult to figure out where it would be pausing since there's a fair amount of nesting going on in here that is hard to separate out. However, you might be better off looking for a bash process, as direnv will not invoke your shell to get the environment from, but will always invoke bash.

I will try and see if I can get this flake's devShell building tonight when I can get access to a linux x86-64 host.

bbenne10 avatar Jan 30 '23 13:01 bbenne10

To narrow down next time this happens check sudo systemctl status nix-daemon if you can spot any builds still running. Do you have any DNS issues that might be also a problem in your prompt (i.e. if it tries to access the network)? How is the load of the system when this happens?

Mic92 avatar Feb 02 '23 07:02 Mic92

Sorry for the late response: I got that building, but noticed no problems. Not entirely sure what was happening...

bbenne10 avatar Feb 02 '23 15:02 bbenne10

I got it building too. My response was directed to @the-argus

Mic92 avatar Feb 02 '23 16:02 Mic92

there's a fair amount of nesting going on in here that is hard to separate out.

I just refactored my configuration so it should be a little bit easier to understand for the future.

To narrow down next time this happens check sudo systemctl status nix-daemon if you can spot any builds still running. Do you have any DNS issues that might be also a problem in your prompt (i.e. if it tries to access the network)? How is the load of the system when this happens?

I was definitely connected to the network, but I have had network issues on my PC host which uses a similar configuration (but a wired connection, so idk how related it is). I don't believe it's DNS related, though, as restarting dhcpcd fixes it on that host. In the future if it happens again I will checkout the status of nix-daemon.

Unfortunately I can't do much to replicate the issue any more than anyone else... it's almost random. I'm going to leave this issue open for now, but if it remains inactive after the next few weeks feel free to close it.

the-argus avatar Feb 03 '23 21:02 the-argus

@Mic92 It happened again, this time on my other machine so it seems to more reproducible than I thought. When it happened I checked the status:

● nix-daemon.service - Nix Daemon
     Loaded: loaded (/etc/systemd/system/nix-daemon.service; linked; preset: enabled)
    Drop-In: /nix/store/jl42j0w37sxzfs2iqwkq23614ibq0bq2-system-units/nix-daemon.service.d
             └─overrides.conf
     Active: active (running) since Sat 2023-02-04 20:05:09 EST; 6min ago
TriggeredBy: ● nix-daemon.socket
       Docs: man:nix-daemon
             https://nixos.org/manual
   Main PID: 2454 (nix-daemon)
         IP: 91.7M in, 1.4M out
         IO: 156.2M read, 755.5M written
      Tasks: 2 (limit: 19108)
     Memory: 942.9M
        CPU: 41.373s
     CGroup: /system.slice/nix-daemon.service
             └─2454 nix-daemon --daemon

Feb 04 20:05:10 mutant nix-daemon[2454]: accepted connection from pid 3761, user argus
Feb 04 20:05:42 mutant nix-daemon[3770]: unexpected Nix daemon error: error: interrupted by the user
Feb 04 20:06:02 mutant nix-daemon[2454]: accepted connection from pid 12083, user argus
Feb 04 20:06:02 mutant nix-daemon[2454]: accepted connection from pid 12099, user argus
Feb 04 20:06:32 mutant nix-daemon[2454]: accepted connection from pid 30140, user argus
Feb 04 20:08:01 mutant nix-daemon[2454]: accepted connection from pid 57368, user argus
Feb 04 20:09:03 mutant nix-daemon[2454]: accepted connection from pid 57591, user argus
Feb 04 20:09:03 mutant nix-daemon[2454]: accepted connection from pid 57964, user argus
Feb 04 20:09:31 mutant nix-daemon[2454]: accepted connection from pid 58374, user argus
Feb 04 20:09:51 mutant nix-daemon[2454]: accepted connection from pid 60591, user argus

This only tells me that my keyboard interrupt did not work (as it did at 20:05:42). Is Tasks: 2 meant to say that it has a running subprocess? If so, how can I further investigate what these processes are doing and find their logs?

the-argus avatar Feb 05 '23 01:02 the-argus

Happened again immediately after This time there are some new logs and also 4 tasks.

● nix-daemon.service - Nix Daemon
     Loaded: loaded (/etc/systemd/system/nix-daemon.service; linked; preset: enabled)
    Drop-In: /nix/store/jl42j0w37sxzfs2iqwkq23614ibq0bq2-system-units/nix-daemon.service.d
             └─overrides.conf
     Active: active (running) since Sat 2023-02-04 20:05:09 EST; 30min ago
TriggeredBy: ● nix-daemon.socket
       Docs: man:nix-daemon
             https://nixos.org/manual
   Main PID: 2454 (nix-daemon)
         IP: 112.5M in, 1.7M out
         IO: 176.6M read, 1.4G written
      Tasks: 4 (limit: 19108)
     Memory: 1.6G
        CPU: 45.820s
     CGroup: /system.slice/nix-daemon.service
             ├─ 2454 nix-daemon --daemon
             └─66421 nix-daemon 66411

Feb 04 20:09:03 mutant nix-daemon[2454]: accepted connection from pid 57964, user argus
Feb 04 20:09:31 mutant nix-daemon[2454]: accepted connection from pid 58374, user argus
Feb 04 20:09:51 mutant nix-daemon[2454]: accepted connection from pid 60591, user argus
Feb 04 20:33:40 mutant nix-daemon[2454]: accepted connection from pid 63363, user argus
Feb 04 20:33:59 mutant nix-daemon[2454]: accepted connection from pid 63666, user argus
Feb 04 20:34:41 mutant nix-daemon[2454]: accepted connection from pid 64886, user argus
Feb 04 20:34:51 mutant nix-daemon[2454]: accepted connection from pid 65841, user argus
Feb 04 20:34:54 mutant nix-daemon[65852]: unexpected Nix daemon error: error: writing to file: Broken pipe
Feb 04 20:35:39 mutant nix-daemon[2454]: accepted connection from pid 66037, user argus
Feb 04 20:35:39 mutant nix-daemon[2454]: accepted connection from pid 66411, user argus```

the-argus avatar Feb 05 '23 01:02 the-argus

I think that a journalctl -u nix-daemon and somehow getting to the timestamps in question may help a bit more than just the job status. I'm unsure what's going on, but that Broken Pipe doesn't look good to me.

bbenne10 avatar Feb 05 '23 14:02 bbenne10