nixos-anywhere
`exec` is called, but then ssh connection is lost
Describe the bug
I'm trying to use nixos-anywhere to install NixOS onto a machine on my local network. The machine is a brand new System76 Thelio Mega running Ubuntu 22.04. When running nixos-anywhere, everything seems to be going fine, until...
+ echo machine will boot into nixos in 6s...
+ test -e /dev/kmsg
+ exec
Warning: Permanently added '10.144.139.249' (ED25519) to the list of known hosts.
Warning: Permanently added '10.144.139.249' (ED25519) to the list of known hosts.
ssh: connect to host 10.144.139.249 port 22: Connection timed out
ssh: connect to host 10.144.139.249 port 22: Connection timed out
ssh: connect to host 10.144.139.249 port 22: No route to host
ssh: connect to host 10.144.139.249 port 22: No route to host
ssh: connect to host 10.144.139.249 port 22: No route to host
ssh: connect to host 10.144.139.249 port 22: No route to host
ssh: connect to host 10.144.139.249 port 22: No route to host
ssh: connect to host 10.144.139.249 port 22: No route to host
ssh: connect to host 10.144.139.249 port 22: No route to host
^C
...I lost ssh connection to the target, and it is never regained.
Here is the full output.
The target box freezes, and I have to hard reboot. Upon rebooting, nothing seems to have changed.
To Reproduce
Steps to reproduce the behavior:
- Have a machine on your local network with Ubuntu 22.04 installed (not sure if it matters if it's in the cloud or not...)
- `sudo apt install net-tools` (for `ifconfig`)
- `sudo apt-get install openssh-server`, `sudo systemctl enable ssh --now`
- `sudo nano /etc/ssh/sshd_config` and add the lines `Port 22` and `PermitRootLogin yes`
- `sudo systemctl start ssh`
- `nix run github:numtide/nixos-anywhere -- --flake github:chessai/thelio-mega#thelio-mega root@<ip of local target>`
- Wait for it to call `exec` and then lose connection, then stay disconnected forever.
Expected behavior
Either a successful install, or a clear failure and exit.
System information
Source machine:
❯ nix-shell -p nix-info --run "nix-info -m"
- system: `"x86_64-linux"`
- host os: `Linux 6.1.25, NixOS, 23.05 (Stoat), 23.05.20230421.2362848`
- multi-user?: `yes`
- sandbox: `yes`
- version: `nix-env (Nix) 2.13.3`
- channels(root): `"nixos-22.05pre343321.78cd22c1b86"`
- channels(chessai): `""`
- nixpkgs: `/nix/store/22z4n4mxs2vz3l3lg41dz3mgnq1d4wxs-source`
Target machine:
chessai@system76-pc:~$ uname -a
6.2.6-76060206-generic #202303130630~1679424972~22.04~4a8cde1~dev-Ubuntu SMP PREEMPT_DY x86_64 x86_64 x86_64 GNU/Linux
chessai@system76-pc:~$ cat /boot/config-6.2.6-76060206-generic | grep KEXEC
CONFIG_KEXEC=y
CONFIG_KEXEC_FILE=y
CONFIG_ARCH_HAS_KEXEC_PURGATORY=y
CONFIG_KEXEC_SIG=y
# CONFIG_KEXEC_SIG_FORCE is not set
CONFIG_KEXEC_BZIMAGE_VERIFY_SIG=y
CONFIG_KEXEC_JUMP=y
CONFIG_KEXEC_CORE=y
CONFIG_HAVE_IMA_KEXEC=y
CONFIG_IMA_KEXEC=y
Additional context
I stopped X on the machine and re-ran. This time it hit "ssh: connect to host 10.144.139.249 port 22: Connection timed out" and exited immediately.
Can you run kexec directly on the machine, preferably in a TTY, and see what the output is? Run this as root:
curl -L https://github.com/nix-community/nixos-images/releases/download/nixos-unstable/nixos-kexec-installer-noninteractive-x86_64-linux.tar.gz | tar -xzf- -C /root
/root/kexec/run
I assume your machine has enough RAM, and the kernel also seems reasonably new.
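Both assumptions can be checked on the target before running `/root/kexec/run`. The following is a minimal sketch, not part of nixos-anywhere: the function names are made up for illustration, and the 1024 MiB threshold in the usage comment is a placeholder, so check the nixos-anywhere documentation for the current memory requirement.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers; not part of nixos-anywhere or nixos-images.

# Print MemTotal in MiB, parsed from a meminfo-style file (default: /proc/meminfo).
mem_total_mib() {
  awk '/^MemTotal:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# Succeed if total RAM is at least MIN_MIB.
# Usage: has_enough_ram MIN_MIB [MEMINFO_FILE]
has_enough_ram() {
  total=$(mem_total_mib "${2:-/proc/meminfo}")
  [ -n "$total" ] && [ "$total" -ge "$1" ]
}

# Succeed if kexec_load has not been disabled at runtime via sysctl.
# A missing or unreadable sysctl file is treated as "not disabled".
kexec_load_allowed() {
  f="${1:-/proc/sys/kernel/kexec_load_disabled}"
  [ ! -r "$f" ] || [ "$(cat "$f")" = "0" ]
}

# Example (1024 is an assumed threshold, see the note above):
#   has_enough_ram 1024 && kexec_load_allowed && /root/kexec/run
```

Passing these checks does not guarantee that kexec will work on a given machine; it only rules out two common causes before the connection is lost.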
An alternative is to boot NixOS from a USB stick and then run nixos-anywhere against its SSH server:
https://github.com/numtide/nixos-anywhere/blob/main/docs/howtos.md#installing-on-a-machine-with-no-operating-system
@chessai do you have a DHCP server running in your local network? I think I had the same issue because the installer got assigned a different IP address. Restarting nixos-anywhere with the IP assigned to the installer worked, although this is not really ideal.
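If the installer did come up under a different address, one way to find it from the source machine is to note the target's MAC address before starting and then look that MAC up in the neighbor table afterwards. This is a rough sketch under that assumption: `ip_for_mac` is a hypothetical helper, the MAC in the example is a placeholder, and the neighbor table only lists hosts the source machine has recently exchanged traffic with, so an entry may be missing.

```shell
#!/bin/sh
# Hypothetical helper: read `ip neigh` output on stdin and print the IPv4
# address(es) whose link-layer address matches the given MAC (case-insensitive).
ip_for_mac() {
  awk -v mac="$1" '
    BEGIN { mac = tolower(mac) }
    {
      for (i = 1; i < NF; i++)
        # $1 !~ /:/ skips IPv6 neighbors, which contain colons.
        if ($i == "lladdr" && tolower($(i + 1)) == mac && $1 !~ /:/)
          print $1
    }'
}

# Example, run on the source machine after the target has kexec'ed:
#   ip neigh show | ip_for_mac aa:bb:cc:dd:ee:ff
```

The printed address can then be used as the new `root@<ip>` target when re-running nixos-anywhere.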
The problem was that nixos-anywhere pushes its own kexec image which did not have network access, and the machine was not plugged in to the router with ethernet. I think that this makes sense, I just didn't think about it. For a consistent IP I ended up setting up a DHCP server on the source (where I'm invoking nixos-anywhere) machine, and linking the source and target with ethernet.
@chessai So the machine did not have any network connection, is that what you are saying? Because
For a consistent IP I ended up setting up a DHCP server on the source
...you shouldn't need DHCP in theory. nixos-anywhere dumps static IP addresses and routes before kexec to restore them afterwards. https://github.com/nix-community/nixos-images/blob/main/nix/kexec-installer/restore_routes.py
It sounds like the original machine was connected via wifi and lost wifi connection after kexec. Which is expected since we don't support wifi (yet?)
Wifi is definitely out of scope for now. There are too many ways this can be configured.
Not sure if this is the same issue @chessai had, but I ran into a similar problem:
nixos-anywhere may fail to connect to the target after kexec if the target network configuration initially used DHCP. The problem is that if the target IP address changes after kexec system switch, the SSH connection to the kexec image would fail. nixos-anywhere restores the original network configuration after kexec switch, but it skips DHCP addresses, which might lead to problems in some cases, such as when installing on local (wired) network where dynamic addresses are used.
One way to fix this issue is to manually configure a static address for the target host before running nixos-anywhere.
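To see whether a target is affected, it can help to list which of its IPv4 addresses are marked `dynamic` by `ip`. The sketch below is only a heuristic: `dynamic_addrs` is a hypothetical helper, and the `dynamic` flag reflects an address with a limited lifetime, which is how DHCP clients typically configure leases but is not guaranteed for every client.

```shell
#!/bin/sh
# Hypothetical helper: read `ip -o -4 addr show` output on stdin and print
# "<interface> <address/prefix>" for every address flagged as dynamic.
dynamic_addrs() {
  awk '$3 == "inet" {
    for (i = 5; i <= NF; i++)
      if ($i == "dynamic") { print $2, $4; break }
  }'
}

# Example, run on the target before starting nixos-anywhere:
#   ip -o -4 addr show | dynamic_addrs
```

Any interface printed here is a candidate for a static address before the install.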
DHCP addresses are usually re-requested, as the installer image has DHCP enabled. There is also always the option to build a custom kexec image that has some standard network configuration built in (including wifi).
I have the same issue on both VPS providers I tested. Basically, all you need to do is connect via VNC and set the default gateway: sudo ip route add 10.0.0.1 dev ens3
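From the VNC console, a quick check shows whether the installer actually lacks a default route before adding anything. This is a sketch only: `has_default_route` is a hypothetical helper, `10.0.0.1` and `ens3` are taken from the comment above and will differ per provider, and the `default via` line is an assumption about what is typically needed in addition to the on-link route.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the routing table on stdin
# (output of `ip -4 route show`) contains a default route.
has_default_route() {
  grep -q '^default '
}

# Example, run as root on the installer console:
#   if ! ip -4 route show | has_default_route; then
#     ip route add 10.0.0.1 dev ens3                # on-link route to the gateway
#     ip route add default via 10.0.0.1 dev ens3    # assumption: default route via that gateway
#   fi
```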
But it's kinda annoying.
I experienced the same issue on a cloudcone machine with ubuntu 22.04 and 1G RAM.
Is there a fix available for the IP change?
There are mitigations but no silver bullet. There is always the option to boot our nixos installer instead: https://github.com/nix-community/nixos-images?tab=readme-ov-file#iso-installer-images
Same problem here using a greencloud vps
Closed because this became a collection issue for people posting "Same here" without knowing what the root cause of the original poster's problem actually was. This is not very useful or actionable. If some hardware doesn't support kexec, we cannot use nixos-anywhere with kexec on it. In that case people have to boot the NixOS installer somehow so that nixos-anywhere doesn't have to run kexec. We have an ISO for this: https://github.com/nix-community/nixos-images?tab=readme-ov-file#iso-installer-images
If this is about implementing better DHCP support after kexec, it should go into a new issue.
Closed because this became a collection issue for people posting "Same here" without knowing what the root cause of the original poster's problem actually was. This is not very useful or actionable. If some hardware doesn't support kexec, we cannot use nixos-anywhere with kexec on it. In that case people have to boot the NixOS installer somehow so that nixos-anywhere doesn't have to run kexec. We have an ISO for this: https://github.com/nix-community/nixos-images?tab=readme-ov-file#iso-installer-images If this is about implementing better DHCP support after kexec, it should go into a new issue.
I don't understand this reasoning. My hardware did indeed support kexec and yet nixos-anywhere failed. That other people are still mentioning it, nearly 2 years later, means that the issue persists. At the very least the documentation could be more clear.
We already have an issue, with a better description, for IP addresses changing due to DHCP after kexec: https://github.com/nix-community/nixos-anywhere/issues/415
At the very least the documentation could be more clear.
Please do a pull request for that.
We already have an issue, with a better description, for IP addresses changing due to DHCP after kexec: https://github.com/nix-community/nixos-anywhere/issues/415
Okay, thanks for pointing to this issue. This one makes sense to close then.