nixus
nixus copied to clipboard
Make the deploy script kill child processes upon exit
This allows aborting the deploy with Ctrl-C.
Fixes #5.
Last time I attempted this, @Infinisil pointed out that this wouldn't kill remote deploy processes. So, you prevent the local deploy from finishing, but not the remote ones.
https://logs.nix.samueldr.com/nixus/2020-08-26#1598402208-1598402243
Ok, hehe, I wondered why nobody had done it if it was this easy.
I guess this is better than nothing though. Most of the time when I need to Ctrl-C, it's during copying-closure time, which this would handle just fine. It's only a problem once it gets to https://github.com/Infinisil/nixus/blob/2127516f68ce6f900b38289985377559e69aab88/modules/deploy.nix#L106, which is the line that starts the remote process
I tested a bit more and found that ssh -tt
actually fixes terminating remote processes. (I did this in a separate test script, not with nixus.) Does anyone know of a reason not to make nixus use ssh -tt
?
Here is the test script:
#!/usr/bin/env bash
set -x
#trap "echo Received signal, exiting; exit" INT TERM
trap "exit" INT TERM
trap "kill 0" EXIT
# Remove the ssh -tt flags and see that when this script gets aborted, the "sleep
# 12m" command gets reparented to PID 1 on srv1 and lives on. With -tt it gets
# properly terminated.
ssh -tt srv1 sleep 12m &
sleep 12m &
wait
AFAICS, there is only one place where we'd need ssh -tt
, thats where we switch to the new system:
diff --git a/modules/deploy.nix b/modules/deploy.nix
index 42f7f0f..5f1a9af 100644
--- a/modules/deploy.nix
+++ b/modules/deploy.nix
@@ -103,7 +103,9 @@ let
privilegeEscalation = builtins.concatStringsSep " " config.privilegeEscalationCommand;
in lib.dag.entryAnywhere ''
echo "Triggering system switcher..." >&2
- id=$(ssh -o BatchMode=yes "$HOST" exec "${switch}/bin/switch" start "${system}")
+ # Use ssh -tt flags to also interrupt/stop the command on the remote
+ # side when the user presses Ctrl-C on the local side.
+ id=$(ssh -tt -o BatchMode=yes "$HOST" exec "${switch}/bin/switch" start "${system}")
echo "Trying to confirm success..." >&2
active=1
The other ssh calls are short-lived (like running mkdir or readlink).
I can't find any docs for -tt
, there's only -t
? Also, Nixus' switch start
command itself forks another process (a switch run
) into the background, so I don't expect that to work anyways. See https://github.com/Infinisil/nixus/blob/2127516f68ce6f900b38289985377559e69aab88/scripts/switch#L61
I can't find any docs for -tt, there's only -t?
Right, in the docs for -t
it says Multiple -t options force tty allocation, even if ssh has no local tty.
.
Nixus' switch start command itself forks another process (a switch run) into the background, so I don't expect that to work anyways.
Ok. I'm not familiar with that code. I guess at some point you don't want the script to be aborted? Some kind of "atomic section"?