Forced `--serial tty` in cloud-hypervisor can induce up to a minute of userspace delay during boot

Open: RaitoBezarius opened this issue

The serial device in cloud-hypervisor is not performant and, I think, is mostly meant for debugging. It slows down journald, which in turn slows down udev, which in turn makes every device mount take a very long time.

Attached are two boot examples, one using the serial device and one using virtio-console:

Screenshot: a microvm with `--serial tty`

Screenshot: a microvm with virtio-console

I suggest offering an option to choose between the two.
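A minimal sketch of such an option, assuming a hypothetical boolean `microvm.serialConsole`. The name matches the `microvmConfig.serialConsole` attribute that the runner diff in this thread reads, but it is an assumption here, not an option this issue confirms exists. Defaulting to `true` would keep the current `--serial tty` behavior, so only users who opt out get virtio-console.

```nix
# Hypothetical NixOS module fragment: declares a switch between the emulated
# serial device and virtio-console for the cloud-hypervisor runner.
# `microvm.serialConsole` is an assumed option name, not a confirmed one.
{ lib, ... }:
{
  options.microvm.serialConsole = lib.mkOption {
    type = lib.types.bool;
    # true keeps today's behavior (`--console null --serial tty`)
    default = true;
    description = ''
      Whether cloud-hypervisor exposes the guest console on the emulated
      serial device (`--serial tty`) instead of virtio-console
      (`--console tty`).
    '';
  };
}
```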

RaitoBezarius, Jun 16 '25 15:06

I had noticed this slowdown a couple of months ago but couldn't find the cause, and eventually switched back to qemu. Cool that you found the cause, thanks!

c0deaddict, Jun 16 '25 20:06

I tried the following diff on a real-world microvm deployment, and it only made a difference of about 6 to 8 seconds.

```diff
diff --git a/lib/runners/cloud-hypervisor.nix b/lib/runners/cloud-hypervisor.nix
index 873eb85..fe43cb4 100644
--- a/lib/runners/cloud-hypervisor.nix
+++ b/lib/runners/cloud-hypervisor.nix
@@ -14,7 +14,9 @@
   }.${pkgs.stdenv.system};
 
   kernelConsole =
-    if pkgs.stdenv.system == "x86_64-linux"
+    if !microvmConfig.serialConsole
+    then "console=hvc0"
+    else if pkgs.stdenv.system == "x86_64-linux"
     then "earlyprintk=ttyS0 console=ttyS0"
     else if pkgs.stdenv.system == "aarch64-linux"
     then "console=ttyAMA0"
@@ -139,11 +141,11 @@
         )
         "--cpus" "boot=${toString vcpu}"
         "--watchdog"
-        "--console" "null"
-        "--serial" "tty"
+        "--console" "tty"
+        "--serial" "null"
         "--kernel" kernelPath
         "--initramfs" initrdPath
-        "--cmdline" "${kernelConsole} reboot=t panic=-1 ${builtins.unsafeDiscardStringContext (toString microvmConfig.kernelParams)}"
+        "--cmdline" /*"${kernelConsole}*/ "${builtins.unsafeDiscardStringContext (toString microvmConfig.kernelParams)}"
         "--seccomp" "true"
         "--memory" memOps
       ]
diff --git a/nixos-modules/microvm/optimization.nix b/nixos-modules/microvm/optimization.nix
index abb605d..4626543 100644
--- a/nixos-modules/microvm/optimization.nix
+++ b/nixos-modules/microvm/optimization.nix
@@ -30,10 +30,10 @@
         ]);
       tpm2.enable = lib.mkDefault false;
     };
-    kernelParams = [
-      # we only need one serial console
-      "8250.nr_uarts=1"
-    ];
+    # kernelParams = [
+    #   # we only need one serial console
+    #   "8250.nr_uarts=1"
+    # ];
     swraid.enable = lib.mkDefault false;
   };
```
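For illustration, the console selection in the diff can be pulled out into a standalone Nix expression. This is a sketch: `kernelConsoleFor` and `consoleArgsFor` are hypothetical helper names, and the final `throw` branch is an assumption, because the diff does not show what follows the aarch64 case. It can be evaluated with `nix-instantiate --eval --strict`.

```nix
# Hypothetical standalone version of the console logic from the diff.
let
  # Kernel `console=` parameter for a given system and console mode.
  kernelConsoleFor = { system, serialConsole }:
    if !serialConsole
    then "console=hvc0"                      # virtio-console device
    else if system == "x86_64-linux"
    then "earlyprintk=ttyS0 console=ttyS0"   # emulated 8250 UART
    else if system == "aarch64-linux"
    then "console=ttyAMA0"                   # emulated PL011 UART
    # Not shown in the diff; failing loudly here is an assumption.
    else throw "kernelConsoleFor: unsupported system ${system}";

  # cloud-hypervisor flags for the two variants compared in the diff.
  consoleArgsFor = serialConsole:
    if serialConsole
    then [ "--console" "null" "--serial" "tty" ]   # current runner
    else [ "--console" "tty" "--serial" "null" ];  # virtio-console variant
in
{
  virtio = {
    cmdline = kernelConsoleFor { system = "x86_64-linux"; serialConsole = false; };
    args = consoleArgsFor false;
  };
  serial = {
    cmdline = kernelConsoleFor { system = "x86_64-linux"; serialConsole = true; };
    args = consoleArgsFor true;
  };
}
```

Keeping the kernel parameter and the hypervisor flags driven by the same boolean avoids pointing `console=` at a device that cloud-hypervisor was told to disable.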

Did you test with the cloud-hypervisor example from this repo? For me, it boots in about 6 to 8 seconds before this change and maybe 1 second faster after it. I suspect the slowdown comes from the services actually being started: every systemd target that is added slows down the boot process, for reasons I don't understand yet. Normally, consecutive targets in the boot chain are reached one after another without any real delay, but with cloud-hypervisor there is a delay of about 3 seconds between them. Simply switching to qemu made our boots almost twice as fast.

SuperSandro2000, Jul 04 '25 16:07

On my machine, a minimal MicroVM using cloud-hypervisor is also very slow to both start up and shut down. A complete cycle with `systemctl restart microvm@example` takes almost two minutes. After switching to qemu, the machine boots within a few seconds.

I tried setting `console=tty` with `serial=null`, and also with `serial=off`, but neither made a difference. So far, I haven't modified the runner configuration for cloud-hypervisor directly.
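For reference, the qemu workaround mentioned by several commenters in this thread is a one-line change in the MicroVM's NixOS configuration, assuming the standard microvm.nix `microvm.hypervisor` option. This is a sketch of that workaround, not a fix for the cloud-hypervisor slowdown itself.

```nix
# Guest NixOS module fragment: run this MicroVM under qemu instead of
# cloud-hypervisor, sidestepping the slow boot discussed in this issue.
{
  microvm.hypervisor = "qemu";
}
```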

cryptoluks, Oct 13 '25 11:10

The cloud-hypervisor example starts up fast, but as soon as you add even a minimal load, it becomes disproportionately slow. This could also be an upstream bug.

SuperSandro2000, Oct 13 '25 11:10