microvm.nix

microvm@%i.service restart loop with cloud-hypervisor

Open vdbe opened this issue 1 year ago • 4 comments

When using cloud-hypervisor, the microvm@%i.service keeps restarting. The VM boots and works normally when started via microvm -r or current/bin/microvm-run (this needs a restart of microvm-virtiofsd when using virtiofsd).

Tested the VMs with qemu as the hypervisor and no other changes, and that works as expected.

The microvm@%i.service seems to be stuck on activating (start): Active: activating (start) since Sat 2024-08-10 15:48:56 UTC; 48s ago, but I don't immediately see anything wrong with the sockets.

Made a flake to isolate the issue but can't find the problem. Tested on hardware and with nixos-rebuild build-vm ...; both had the same results. https://github.com/vdbe/microvm-example

  • microvm release 0.4.1 works perfectly (commit 9d3cc92a8e2f0a36c767042b484cc8b8c6f371d3 also still works, just before notify.vsock)
  • just "default" with useNotifySockets = true also doesn't work (microvm.socket was the default socket "cloud-hypervisor-default.sock")

vdbe, Aug 10 '24 17:08

Are there any error msgs in journalctl -eu microvm@\*? Try boot.kernelParams = [ "verbose" ];

Are you able to git bisect the breaking change in microvm.nix?

astro, Aug 10 '24 22:08

Did not use git bisect, but commit a439229a1af9e0fae3b3b21619c1983901a41bf7 is the first commit to break (9d3cc92a8e2f0a36c767042b484cc8b8c6f371d3 works). Did not see any relevant error messages in journalctl -eu microvm@\*.

Output from journalctl -b 0 -eu microvm@cloud-hypervisor-default with boot.kernelParams = [ "verbose" ]; for host and guest. The .force.log output is with kernelParams = mkForce [ "verbose" ]; on the host and kernelParams = mkForce [ "root=fstab" "verbose" ]; on the guest, to get rid of "loglevel=4":

vdbe, Aug 11 '24 07:08

I'm having the same issue (cloud-hypervisor looping mysteriously, fixed by changing to QEMU). My VMs were also unable to boot with crosvm (unsure if related, just what I tried before QEMU).
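
The workaround mentioned here, switching the hypervisor, is a one-line change in the guest's microvm configuration. A minimal sketch, assuming a guest NixOS module (the surrounding module layout is illustrative and not taken from any flake in this thread):

```nix
# Guest NixOS module (illustrative layout).
{ ... }:
{
  # Workaround from this thread: run the guest with QEMU instead of
  # cloud-hypervisor while the restart loop is unresolved.
  microvm.hypervisor = "qemu";
}
```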

Dan-Theriault, Aug 11 '24 23:08

crosvm doesn't have this issue for me, just cloud-hypervisor (couldn't build/test alioth).

vdbe, Aug 12 '24 08:08

After seeing #268 and d52082cc2668b8cd788e3133526c8693ee71f6a5, I tested again with nixos-24.05, which has systemd version 255.9, and it worked perfectly.

d52082cc2668b8cd788e3133526c8693ee71f6a5 however does not fix the issue (which I think was the goal) because the systemd service still has Type=notify, see https://github.com/astro/microvm.nix/blob/d52082cc2668b8cd788e3133526c8693ee71f6a5/nixos-modules/host/default.nix#L108-L111

I guess https://github.com/astro/microvm.nix/blob/d52082cc2668b8cd788e3133526c8693ee71f6a5/lib/runners/cloud-hypervisor.nix#L122 needs to be supportsNotifySocket = doNotify.
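
For context: with Type=notify, systemd keeps a unit in activating (start) until the service reports READY=1 over the notification socket. If the start timeout expires without that message, the start is treated as failed, which would match the loop described above. A minimal sketch of the suggested change, assuming the service type is derived from the runner's supportsNotifySocket attribute. The value of doNotify and the attribute layout here are illustrative stand-ins, not the actual microvm.nix source:

```nix
# Hypothetical, self-contained illustration; not the actual microvm.nix code.
let
  # Assumption: doNotify is true only when the readiness notification
  # is actually wired up for this VM.
  doNotify = false;

  # The runner advertises notify support only when it is really in use,
  # i.e. supportsNotifySocket = doNotify rather than a fixed value.
  runner = { supportsNotifySocket = doNotify; };
in
{
  # The host service picks its type from what the runner advertises.
  serviceConfig.Type =
    if runner.supportsNotifySocket then "notify" else "simple";
}
```

With doNotify = false this evaluates to Type = "simple", so systemd would no longer wait for a readiness message that never arrives.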

vdbe, Sep 04 '24 08:09

Sorry for that. You're right. 0fb06e0629 fixes it.

astro, Sep 06 '24 22:09