remote build of 'silent' packages fails

Open ikervagyok opened this issue 5 years ago • 6 comments

Describe the bug Building of some packages on remote hosts fails. The affected packages produce no terminal output for long periods of time and thus the SSH connection gets closed for inactivity.

To Reproduce Steps to reproduce the behavior:

  1. Set up remote building: https://nixos.wiki/wiki/Distributed_build
  2. build qtwebengine with -j0, to force remote build
  3. if your server doesn't produce warnings fast enough, you'll get this error on the server:
    Jan 05 13:33:52 SERVER systemd-logind[785]: Session 17 logged out. Waiting for processes to exit.
    Jan 05 13:33:52 SERVER systemd-logind[785]: Removed session 17.
    Jan 05 13:33:55 SERVER nix-daemon[2645]: unexpected Nix daemon error: writing to file: Broken pipe
    
    And on the client it will fail after its own timeout period.
    ...
    ../../3rdparty/chromium/services/network/trust_tokens/trust_token_request_redemption_helper.cc:59:31: warning: suggest parentheses around '&&' within '||' [-Wparentheses]
       59 |   DCHECK(request->initiator() &&
          |          ~~~~~~~~~~~~~~~~~~~~~^~
       60 |              request->initiator()->scheme() == url::kHttpsScheme ||
          |              ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    ../../3rdparty/chromium/base/logging.h:808:54: note: in definition of macro 'DCHECK'
      808 | #define DCHECK(condition) EAT_STREAM_PARAMETERS << !(condition)
          |                                                      ^~~~~~~~~
    
    client_loop: send disconnect: Broken pipe
    error: unexpected end-of-file
    builder for '/nix/store/9qskm7w05npz9vsh4r65dsjk11yvwi8m-qtwebengine-5.15.2.drv' failed with exit code 1
    cannot build derivation '/nix/store/xkr3cs62lf4lbi9bdswl7nsvbjcfwcv6-zoom-us-5.4.53350.1027.drv': 1 dependencies couldn't be built
    ...
    

Expected behavior No manual workarounds on SSH configs for remote building. nixos-rebuild -j 0 should always work, as long as there is a network connection. Maybe nix could send some sort of heartbeat packets over the same connection?

# nix-env --version
nix-env (Nix) 2.3.10
# nixos-version
21.03.git.014440d7105 (Okapi)

ikervagyok avatar Jan 05 '21 17:01 ikervagyok

SSH can already do this, see the ServerAliveInterval and TCPKeepAlive options in ssh_config.
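For reference, a minimal sketch of what that could look like on a NixOS client, assuming the builder is reached under the hypothetical host alias `builder` and that the system-wide ssh_config is the one the connecting user reads. The interval and count values are illustrative, not recommendations from this thread.

```nix
{
  # Sketch only: "builder" is a placeholder host alias, not a name from this issue.
  programs.ssh.extraConfig = ''
    Host builder
      # Send an application-level keepalive after 30 seconds of silence.
      ServerAliveInterval 30
      # Give up after 3 unanswered keepalives.
      ServerAliveCountMax 3
      # Also keep TCP-level keepalives enabled.
      TCPKeepAlive yes
  '';
}
```

On a non-NixOS client the same three options could go under a matching `Host` block in the ssh_config of whichever user opens the connection to the builder.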

edolstra avatar Jan 06 '21 10:01 edolstra

I know remote builds are kinda high-level, but it still is bad UX. I love the deterministic approach nix's ecosystem takes, and this doesn't feel right, since the only exhausted resource is a ssh/tcp heartbeat.

If you think everybody should solve this on his own, feel free to close this ticket.

p.s.: since it's my first interaction with @edolstra: Thanks for (starting) nix and the ecosystem around it!

ikervagyok avatar Jan 06 '21 21:01 ikervagyok

I marked this as stale due to inactivity.

stale[bot] avatar Jul 08 '21 00:07 stale[bot]

Still relevant to me.

magnetophon avatar Sep 13 '23 09:09 magnetophon

I think we could just pass something like -o ServerAliveInterval=25 to the client process in ssh.c. That will override user configuration, but it's a fairly low value, so I don't think that will be a problem. I don't think it needs to be higher, because I agree with @ikervagyok that this is cheap, especially compared to building, or even to the I/O and IPC we normally have for actual log lines, which tend to be far more frequent than that.

cc @rickynils?

roberth avatar Aug 16 '24 15:08 roberth

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2024-08-28-nix-team-meeting-minutes-173/51302/1

nixos-discourse avatar Aug 28 '24 21:08 nixos-discourse