microvm@%i.service restart loop with cloud-hypervisor
When using cloud-hypervisor, the microvm@%i.service keeps restarting.
The VM boots and works normally when started via microvm -r or current/bin/microvm-run (microvm-virtiofsd needs restarting when using virtiofsd).
I tested VMs with qemu as the hypervisor and no other changes, and that works as expected.
The microvm@%i.service seems to be stuck on activating (start): Active: activating (start) since Sat 2024-08-10 15:48:56 UTC; 48s ago, but I don't directly see anything wrong with the sockets.
I made a flake to isolate the issue but can't find the problem. I tested on hardware and with nixos-rebuild build-vm ...; both had the same results.
https://github.com/vdbe/microvm-example
- microvm release 0.4.1 works perfectly (commit 9d3cc92a8e2f0a36c767042b484cc8b8c6f371d3 also still works, just before notify.vsock)
- just "default" with useNotifySockets = true also doesn't work (microvm.socket was the default socket "cloud-hypervisor-default.sock")
Are there any error msgs in journalctl -eu microvm@\*? Try boot.kernelParams = [ "verbose" ];
Are you able to git bisect the breaking change in microvm.nix?
I did not use git bisect, but commit a439229a1af9e0fae3b3b21619c1983901a41bf7 is the first commit to break (9d3cc92a8e2f0a36c767042b484cc8b8c6f371d3 works).
Did not see any relevant error messages in journalctl -eu microvm@\*.
Output from journalctl -b 0 -eu microvm@cloud-hypervisor-default with boot.kernelParams = [ "verbose" ]; for host & guest. The .force.log files are with kernelParams = mkForce [ "verbose" ]; for host and kernelParams = mkForce [ "root=fstab" "verbose" ]; on guest to get rid of "loglevel=4":
- current head: 15bca94a8d503500169bcc508a1011f68cd91d6c
- notify_socket: a439229a1af9e0fae3b3b21619c1983901a41bf7
- pre-notify_socket: 9d3cc92a8e2f0a36c767042b484cc8b8c6f371d3
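For anyone reproducing these logs, the kernel parameter overrides described above could look roughly like the sketch below. The VM name is taken from the unit name microvm@cloud-hypervisor-default; declaring the guest under microvm.vms and the surrounding module layout are assumptions and may not match the example flake.

```nix
# Sketch only: the module layout is an assumption, not copied from the example flake.
{ lib, ... }:
{
  # Host: mkForce overrides lower-priority definitions of the option.
  boot.kernelParams = lib.mkForce [ "verbose" ];

  # Guest, assumed to be declared on the host via microvm.vms.
  microvm.vms."cloud-hypervisor-default".config = {
    microvm.hypervisor = "cloud-hypervisor";

    # Keep root=fstab and force the list so that "loglevel=4" is dropped.
    boot.kernelParams = lib.mkForce [ "root=fstab" "verbose" ];
  };
}
```

The plain variant without mkForce (boot.kernelParams = [ "verbose" ];) only appends to the existing list, which is why "loglevel=4" stays in that case.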
I'm having the same issue (cloud-hypervisor looping mysteriously, fixed by changing to QEMU). My VMs were also unable to boot with crosvm (unsure if related, just what I tried before QEMU).
crosvm doesn't have this issue for me, just cloud-hypervisor (couldn't build/test alioth)
After seeing #268 and d52082cc2668b8cd788e3133526c8693ee71f6a5, I tested again with nixos-24.05, which has systemd version 255.9, and it worked perfectly.
d52082cc2668b8cd788e3133526c8693ee71f6a5 however does not fix the issue (which I think was the goal) because the systemd service still has Type=notify.
from https://github.com/astro/microvm.nix/blob/d52082cc2668b8cd788e3133526c8693ee71f6a5/nixos-modules/host/default.nix#L108-L111
I guess
https://github.com/astro/microvm.nix/blob/d52082cc2668b8cd788e3133526c8693ee71f6a5/lib/runners/cloud-hypervisor.nix#L122
needs to be supportsNotifySocket = doNotify.
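To spell out the reasoning: with Type=notify, systemd keeps the unit in "activating (start)" until the service sends READY=1, so a runner that claims notify support without actually sending the notification will time out and be restarted. A minimal, self-contained sketch of the intended coupling follows. The names doNotify and supportsNotifySocket come from this thread; the runner attribute set and the way the service type is derived from it are hypothetical stand-ins, not the actual microvm.nix code.

```nix
# Hypothetical stand-in, evaluable with `nix-instantiate --eval --strict`.
let
  # Assumed: a boolean saying whether the runner really sends READY=1.
  doNotify = false;

  # Hypothetical runner description; the suggested fix is to tie the flag
  # to doNotify instead of hardcoding it.
  runner = {
    supportsNotifySocket = doNotify;
  };
in
{
  # The service should only wait for a notification if one will be sent.
  serviceConfig.Type =
    if runner.supportsNotifySocket
    then "notify"
    else "simple";
}
```

With doNotify set to false this yields a simple service, so systemd no longer waits for a notification that never arrives.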
Sorry for that. You're right. 0fb06e0629 fixes it.