
Deploy-rs hangs up and doesn't do anything

Open onnimonni opened this issue 1 year ago • 5 comments

Hey,

I was using this example repository: https://github.com/LGUG2Z/nixos-hetzner-robot-starter to deploy NixOS to a bare-metal AMD server.

I was able to get it up and running, but when I try to make small changes and deploy them with deploy-rs, it just hangs and outputs nothing. I waited 15 minutes before stopping the process:

$ nix run github:serokell/deploy-rs -- --remote-build --debug-logs -s .#robot
🚀 ❓ [deploy] [DEBUG] Checking for flake support
🚀 ℹ️ [deploy] [INFO] Evaluating flake in .
warning: Git tree '~/Projects/nixos-hetzner-robot-starter' is dirty
🚀 ℹ️ [deploy] [INFO] The following profiles are going to be deployed:
[robot.system]
user = "root"
ssh_user = "root"
path = "/nix/store/jqp8p6xwcfdrhfjkfdxg9i97fnmf2d0r-activatable-nixos-system-unnamed-24.11.20240915.1f3227d"
hostname = "X.Y.Z.W"
ssh_opts = []

🚀 ❓ [deploy] [DEBUG] Finding the deriver of store path for /nix/store/jqp8p6xwcfdrhfjkfdxg9i97fnmf2d0r-activatable-nixos-system-unnamed-24.11.20240915.1f3227d
🚀 ℹ️ [deploy] [INFO] Building profile `system` for node `robot` on remote host
error: interrupted by the user

How could I debug what's going on?

onnimonni avatar Sep 17 '24 07:09 onnimonni

I suggest adding -- -L at the end, which is passed through to nix:

  · --print-build-logs / -L Print full build logs on standard error.

sedlund avatar Sep 17 '24 08:09 sedlund

Did I run it properly like this?

$ nix run github:serokell/deploy-rs -- --remote-build --debug-logs -s .#robot -- -L
🚀 ❓ [deploy] [DEBUG] Checking for flake support
🚀 ℹ️ [deploy] [INFO] Evaluating flake in .
warning: Git tree '~/Projects/hetzner-servers/nixos-hetzner-robot-starter' is dirty
🚀 ℹ️ [deploy] [INFO] The following profiles are going to be deployed:
[robot.system]
user = "root"
ssh_user = "root"
path = "/nix/store/0ad6vfkvf7lkzsvwl0wq9slj6w82c27w-activatable-nixos-system-unnamed-24.11.20240915.1f3227d"
hostname = "X.Y.Z.W"
ssh_opts = []

🚀 ❓ [deploy] [DEBUG] Finding the deriver of store path for /nix/store/0ad6vfkvf7lkzsvwl0wq9slj6w82c27w-activatable-nixos-system-unnamed-24.11.20240915.1f3227d
🚀 ℹ️ [deploy] [INFO] Building profile `system` for node `robot` on remote host

It still hangs at the exact same step without any logs 😞

onnimonni avatar Sep 17 '24 20:09 onnimonni

Do I remember correctly that you are deploying from macOS?

Maybe this?: https://github.com/serokell/deploy-rs/issues/216#issuecomment-1984387344

try --skip-checks

sedlund avatar Sep 18 '24 15:09 sedlund

Hey,

I got the same issue when deploying from a WSL2 instance.

[nix-shell:/mnt/c/Users/admin/Documents/glaieul/deployrs]$ deploy .#astryce
🚀 ℹ️ [deploy] [INFO] Running checks for flake in .
warning: unknown flake output 'deploy'
🚀 ℹ️ [deploy] [INFO] Evaluating flake in .
🚀 ℹ️ [deploy] [INFO] The following profiles are going to be deployed:
[astryce.system]
user = "root"
ssh_user = "deploy-rs"
path = "/nix/store/ba5cxpgqc5k1wllj5cs9rdwd0bkyfqqp-activatable-nixos-system-unnamed-24.11.20241107.85f7e66"
hostname = "192.168.1.3"
ssh_opts = ["-i", "~/.ssh/id_ed25519_glaieul_deployrs"]

🚀 ℹ️ [deploy] [INFO] Building profile `system` for node `astryce` on remote host

I can't reproduce it reliably; it just happens sometimes, out of nowhere.

For instance, I can deploy a bunch of machines using deploy .#mymachine just fine, but then deploy .#someothermachine will just fail.

Sometimes it ends up working after a few retries.

Sometimes, after I modified the configuration, the deploy worked right away, but not always.

It's really weird.

Let me know if there is something I can do to help debug this (I already tried the usual -D and -- -L without success).

Thanks

asyncnomi avatar Dec 22 '24 23:12 asyncnomi

After a bit more debugging, it looks like the faulty process is nix copy. The exact command is similar to:

nix copy -s --to ssh-ng://[email protected] --derivation /nix/store/l2wfdn7jy0kbgxa48r2n9gm2lvxaqwvw-activatable-nixos-system-unnamed-24.11.20241107.85f7e66.drv -vvv

It does not seem to return anything.

I tested with the SSH option -oControlMaster=no and looked for stale lock files in /nix/store, without success.

asyncnomi avatar Dec 23 '24 01:12 asyncnomi
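
One way to narrow down a silent hang like the nix copy step above is to run each suspect command under a time limit, so a stall is reported instead of waiting indefinitely. The sketch below is only an illustration, not something from this thread: run_with_limit is a hypothetical helper name, it assumes GNU coreutils timeout is installed (it is not part of a stock macOS), and user@host and the store path in the comments are placeholders.

```shell
# Hypothetical helper: run a command under a time limit and say whether it
# failed or had to be killed. Assumes GNU coreutils `timeout` is on PATH.
run_with_limit() {
  local limit="$1"
  shift
  local status=0
  # `timeout` exits with 124 when it had to kill the command.
  timeout "$limit" "$@" || status=$?
  if [ "$status" -eq 124 ]; then
    echo "hung: no exit within ${limit}s: $*" >&2
  elif [ "$status" -ne 0 ]; then
    echo "failed with exit code ${status}: $*" >&2
  fi
  return "$status"
}

# Example calls (placeholders, adjust to your own node):
#   run_with_limit 15 ssh -v -o BatchMode=yes user@host true
#   run_with_limit 120 nix copy -s --to ssh-ng://user@host \
#     --derivation /nix/store/<hash>-<name>.drv -vvv
```

If the plain ssh call already times out, the problem is more likely in the SSH connection than in nix copy itself; if only the nix copy call times out, the -vvv output up to that point shows where it stalled.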