Deploy-rs hangs up and doesn't do anything
Hey,
I was using this example repository: https://github.com/LGUG2Z/nixos-hetzner-robot-starter to deploy NixOS to a bare-metal AMD server.
I was able to get it up and running, but when I make small changes and deploy them with deploy-rs, it just hangs. It printed nothing for 15 minutes, until I stopped the process:
$ nix run github:serokell/deploy-rs -- --remote-build --debug-logs -s .#robot
🚀 ❓ [deploy] [DEBUG] Checking for flake support
🚀 ℹ️ [deploy] [INFO] Evaluating flake in .
warning: Git tree '~/Projects/nixos-hetzner-robot-starter' is dirty
🚀 ℹ️ [deploy] [INFO] The following profiles are going to be deployed:
[robot.system]
user = "root"
ssh_user = "root"
path = "/nix/store/jqp8p6xwcfdrhfjkfdxg9i97fnmf2d0r-activatable-nixos-system-unnamed-24.11.20240915.1f3227d"
hostname = "X.Y.Z.W"
ssh_opts = []
🚀 ❓ [deploy] [DEBUG] Finding the deriver of store path for /nix/store/jqp8p6xwcfdrhfjkfdxg9i97fnmf2d0r-activatable-nixos-system-unnamed-24.11.20240915.1f3227d
🚀 ℹ️ [deploy] [INFO] Building profile `system` for node `robot` on remote host
error: interrupted by the user
How could I debug what's going on?
I suggest adding -- -L at the end so that it gets passed to nix:
· --print-build-logs / -L Print full build logs on standard error.
Did I run it properly like this?
$ nix run github:serokell/deploy-rs -- --remote-build --debug-logs -s .#robot -- -L
🚀 ❓ [deploy] [DEBUG] Checking for flake support
🚀 ℹ️ [deploy] [INFO] Evaluating flake in .
warning: Git tree '~/Projects/hetzner-servers/nixos-hetzner-robot-starter' is dirty
🚀 ℹ️ [deploy] [INFO] The following profiles are going to be deployed:
[robot.system]
user = "root"
ssh_user = "root"
path = "/nix/store/0ad6vfkvf7lkzsvwl0wq9slj6w82c27w-activatable-nixos-system-unnamed-24.11.20240915.1f3227d"
hostname = "X.Y.Z.W"
ssh_opts = []
🚀 ❓ [deploy] [DEBUG] Finding the deriver of store path for /nix/store/0ad6vfkvf7lkzsvwl0wq9slj6w82c27w-activatable-nixos-system-unnamed-24.11.20240915.1f3227d
🚀 ℹ️ [deploy] [INFO] Building profile `system` for node `robot` on remote host
It still hangs at the exact same step, without any logs.
Do I remember correctly that you are deploying from macOS?
Maybe this? https://github.com/serokell/deploy-rs/issues/216#issuecomment-1984387344
Try --skip-checks.
Hey,
I got the same issue when deploying from a WSL2 instance.
[nix-shell:/mnt/c/Users/admin/Documents/glaieul/deployrs]$ deploy .#astryce
🚀 ℹ️ [deploy] [INFO] Running checks for flake in .
warning: unknown flake output 'deploy'
🚀 ℹ️ [deploy] [INFO] Evaluating flake in .
🚀 ℹ️ [deploy] [INFO] The following profiles are going to be deployed:
[astryce.system]
user = "root"
ssh_user = "deploy-rs"
path = "/nix/store/ba5cxpgqc5k1wllj5cs9rdwd0bkyfqqp-activatable-nixos-system-unnamed-24.11.20241107.85f7e66"
hostname = "192.168.1.3"
ssh_opts = ["-i", "~/.ssh/id_ed25519_glaieul_deployrs"]
🚀 ℹ️ [deploy] [INFO] Building profile `system` for node `astryce` on remote host
I can't reproduce it reliably; it just happens sometimes, out of nowhere.
For instance, I can deploy a bunch of machines using deploy .#mymachine just fine, but deploy .#someothermachine will just fail for some other machine.
Sometimes it ends up working after a few retries.
Sometimes, after I modified the configuration being deployed, it worked right after the change, but not always.
It's really weird.
Let me know if there is something I can do to help debug this (I already tried the usual -D and -- -L without success).
Thanks
After a bit more debugging, it looks like the faulty process is nix copy. The exact command is similar to:
nix copy -s --to ssh-ng://[email protected] --derivation /nix/store/l2wfdn7jy0kbgxa48r2n9gm2lvxaqwvw-activatable-nixos-system-unnamed-24.11.20241107.85f7e66.drv -vvv
It does not seem to return anything.
I tested with the SSH option -oControlMaster=no and looked for stale lock files in /nix/store, without success.
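In case it helps anyone narrow this down: since the hang is intermittent, one option is to wrap the suspect command in a time limit so that each attempt either finishes or is reported as hung, rather than having to wait and interrupt it manually. This is only a sketch. `run_with_limit` is a hypothetical helper name, and it assumes GNU coreutils `timeout` is available (it is on most Linux and WSL2 setups, but not on stock macOS).

```shell
# Hypothetical helper (not part of deploy-rs or nix): run a command with a
# time limit in seconds and report whether it exited or had to be killed.
run_with_limit() {
  limit="$1"
  shift
  status=0
  # GNU timeout exits with status 124 when the time limit is reached.
  timeout "$limit" "$@" || status=$?
  if [ "$status" -eq 124 ]; then
    echo "hung: no exit within ${limit}s"
  else
    echo "exited with status $status"
  fi
  return "$status"
}
```

It could then be used as `run_with_limit 60 nix copy -s --to "ssh-ng://$TARGET" --derivation "$DRV" -vvv`, where `TARGET` and `DRV` are placeholders for your own user@host and .drv path. Running that in a loop against a machine that sometimes fails would show how often the copy step itself is the one that stalls.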