Darwin builds forking off processes never finish

Open flokli opened this issue 2 years ago • 6 comments

Describe the bug

I have a Nix build that forks off a long-running process in the background; in this specific case, a PostgreSQL database.

The build process starts the database, runs some insertions and queries against it, and finally renders a document into $out at the end of the build script.

On Linux, once all steps in the build script have finished, the build exits and the output can be observed.

On Darwin, the build process gets stuck indefinitely.

A workaround is to manually register cleanup traps in bash.

Steps To Reproduce

Build the following on x86_64-linux and on aarch64-darwin:

let
  pkgs = import <nixpkgs> { };
in
pkgs.callPackage
  (
    { stdenv, ... }:

    stdenv.mkDerivation {
      pname = "foo";
      version = "1";
      buildCommand = ''
        # Fork a long-running process into the background; it inherits
        # the builder's stdout/stderr.
        sleep infinity &

        echo foo > $out
      '';
    }
  )
  { }

Expected behavior

In both cases, I'd expect the build to finish and the sleep process to be killed once the build script ends.

nix-env --version output

nix-env (Nix) 2.13.3


flokli • Apr 17 '23

Triaged in the Nix team meeting:

  • @edolstra: it may actually be a Linux bug, since the termination criterion is reaching EOF on the builder's output, not the builder process exiting
    • will investigate
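A minimal POSIX sketch of the distinction drawn above, assuming the builder's output reaches Nix through a pipe. The "builder" child exits right away, but a background process it forked (standing in for `sleep infinity &`, here bounded to a few seconds) still holds the pipe's write end, so the reader only sees EOF once that process exits too. `measureExitVsEof` and `Timings` are hypothetical names for this illustration, not Nix code.

```cpp
#include <cassert>
#include <cerrno>
#include <chrono>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical result type: seconds until the builder was reaped vs. until EOF.
struct Timings {
    double childExitSeconds;
    double eofSeconds;
};

// Hypothetical demo: a "builder" child forks a background process that
// inherits the pipe's write end and outlives the builder by `lingerSeconds`.
Timings measureExitVsEof(unsigned lingerSeconds) {
    using Clock = std::chrono::steady_clock;
    int fds[2];
    if (pipe(fds) != 0) return {-1, -1};

    const auto start = Clock::now();
    pid_t builder = fork();
    if (builder < 0) {
        close(fds[0]);
        close(fds[1]);
        return {-1, -1};
    }
    if (builder == 0) {
        close(fds[0]);
        if (fork() == 0) {
            sleep(lingerSeconds);  // background job, still holding fds[1]
            _exit(0);
        }
        if (write(fds[1], "foo\n", 4) != 4) _exit(1);
        _exit(0);  // the builder itself is done right away
    }
    close(fds[1]);  // the reader must drop its own copy of the write end

    int status = 0;
    waitpid(builder, &status, 0);
    const double exitAt = std::chrono::duration<double>(Clock::now() - start).count();

    // read() returns 0 (EOF) only once *every* holder of the write end has closed it.
    char buf[256];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0 || (n < 0 && errno == EINTR)) {
    }
    const double eofAt = std::chrono::duration<double>(Clock::now() - start).count();
    close(fds[0]);
    return {exitAt, eofAt};
}
```

Called as `measureExitVsEof(2)`, the builder is reaped almost immediately while `eofSeconds` comes out at roughly 2: that gap is what an EOF-based termination criterion waits out, and with `sleep infinity` it never ends.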

fricklerhandwerk • Apr 28 '23

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2023-04-28-nix-team-meeting-minutes-50/27698/1

nixos-discourse • Apr 28 '23

I've seen large output get truncated, which would be the opposite problem to this issue.

Process exit and pipe close are independent events, and both matter. I think a good algorithm would be:

// pseudocode; haven't written much std::variant or concurrent code yet; apologies

struct LogCompleted { };

runBuilderProcess() {
  auto builderPid = forkBuilderChild();
  future<ExitCode> exitCode = forkThread(() -> waitpid(builderPid));
  future<LogCompleted> outputDone = forkThread(logShovelingThread);

  std::variant<ExitCode, LogCompleted> firstResult = awaitFirstOf(exitCode, outputDone);

  firstResult.visit {
    exitCode -> {
      try {
        outputDone.awaitWithTimeout(10s)
      }
      catch {
        throw/return/warn/whatever "Console wasn't closed within 10s after builder exited with status %i. Did some child process keep the output open?";
      }
    },
    logCompleted -> {
      try {
        exitCode.awaitWithTimeout(10s)
      }
      catch {
        warn/whatever "builder has not exited within 10s after closing its log output.";
        exitCode.await();
      }
    }
  };
  handle exitCode;
}
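A minimal runnable sketch of the same idea, assuming POSIX `poll()` and `waitpid()`. It plainly swaps the two threads and futures for a single-threaded polling loop; `awaitBuilder` and `BuildResult` are hypothetical names, not Nix's actual API. It drains the log and reaps the builder, whichever finishes first, and once the builder has exited it gives the log a bounded grace period instead of waiting for EOF indefinitely.

```cpp
#include <algorithm>
#include <cassert>
#include <cerrno>
#include <chrono>
#include <string>
#include <poll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical result type for this sketch (not part of Nix).
struct BuildResult {
    int exitStatus = 0;           // raw status from waitpid()
    bool outputTimedOut = false;  // log still open `grace` after the builder exited
    std::string log;              // everything read from the builder's output
};

// Hypothetical helper: drain `logFd` and reap `pid`, whichever finishes first.
// Once the builder has exited, wait at most `grace` for EOF on the log.
BuildResult awaitBuilder(pid_t pid, int logFd, std::chrono::milliseconds grace) {
    using Clock = std::chrono::steady_clock;
    BuildResult r;
    bool exited = false, eof = false;
    Clock::time_point deadline;

    while (true) {
        if (!exited && waitpid(pid, &r.exitStatus, WNOHANG) == pid) {
            exited = true;
            deadline = Clock::now() + grace;
        }
        if (exited && eof) break;
        if (eof) {
            // Log closed but builder still running: block until it exits.
            waitpid(pid, &r.exitStatus, 0);
            break;
        }

        int timeoutMs = 50;  // short poll interval so a builder exit is noticed promptly
        if (exited) {
            auto left = std::chrono::duration_cast<std::chrono::milliseconds>(
                            deadline - Clock::now()).count();
            if (left <= 0) {
                r.outputTimedOut = true;  // some other process kept the output open
                break;
            }
            timeoutMs = static_cast<int>(std::min<long long>(left, 50));
        }

        pollfd p{logFd, POLLIN, 0};
        int ready = poll(&p, 1, timeoutMs);
        if (ready < 0 && errno != EINTR) break;  // unexpected poll failure
        if (ready > 0) {
            char buf[4096];
            ssize_t got = read(logFd, buf, sizeof buf);
            if (got > 0) r.log.append(buf, static_cast<size_t>(got));
            else if (got == 0 || errno != EINTR) eof = true;  // EOF (or read error)
        }
    }
    return r;
}
```

The timeout branch only records `outputTimedOut`; whether that should fail the build, warn, or kill the lingering processes is left open, as in the pseudocode above.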

roberth • May 31 '23

Triaged in the Nix maintainers team meeting on 2023-06-09, without reaching a conclusion.

Complete discussion
  • @edolstra: we used to wait for pipe exit
  • @roberth: it's the correct behavior but we should also set a timeout to terminate the parent after a couple of seconds
  • @regnat: should have the same behavior on Linux and Darwin
    • but being inconsistent is not the end of the world
  • @regnat: wouldn't be surprised if existing bash scripts with background processes behaved differently across bash versions
  • @fricklerhandwerk: @roberth's suggestion seems sensible. We may want to have a dedicated Darwin maintainer on or close to the team, though, to be able to go into some depth

fricklerhandwerk • Jun 16 '23

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2023-06-09-nix-team-meeting-minutes-61/29163/1

nixos-discourse • Jun 16 '23

Added "idea approved" for having Nix wait for EOF or child exit, whichever happens first.

edolstra • Oct 16 '24

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2024-10-16-nix-team-meeting-minutes-187/54835/1

nixos-discourse • Oct 23 '24