microvm.nix
Forced `--serial tty` in cloud-hypervisor can induce up to a minute of userspace delay during boot
The serial device in cloud-hypervisor is not performant and, I think, is mostly meant for debugging. It makes journald slower, which in turn makes udev slower, which in turn makes all device mounts take a very long time.
Attached are two examples (one with virtio-console and one with the serial device):
I think there should be an option to choose between the two.
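A rough sketch of what such a choice could look like, written as a standalone Nix function. The argument names `serialConsole` and `system` are hypothetical and only serve the illustration; the `--console`/`--serial` flags with the values `tty` and `null` are cloud-hypervisor's own, and `hvc0` is the guest-side name of the virtio console. The empty fallback for other platforms is an assumption.

```nix
# Sketch only: `serialConsole` and `system` are illustrative arguments,
# not existing microvm.nix options.
{ serialConsole ? false, system ? "x86_64-linux" }:
let
  # Kernel command-line fragment that points the guest console
  # at the device the hypervisor actually attaches to the tty.
  kernelConsole =
    if !serialConsole then "console=hvc0"
    else if system == "x86_64-linux" then "earlyprintk=ttyS0 console=ttyS0"
    else if system == "aarch64-linux" then "console=ttyAMA0"
    else ""; # assumption: no console parameter on other platforms
in
{
  inherit kernelConsole;
  # Exactly one of the two devices is connected to the tty;
  # the other one is discarded.
  args =
    if serialConsole
    then [ "--console" "null" "--serial" "tty" ]
    else [ "--console" "tty" "--serial" "null" ];
}
```

With the defaults this should evaluate to the virtio-console variant; passing `serialConsole = true` selects the serial device instead.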
I had noticed this slowdown a couple months back but couldn't find the cause, eventually switching back to qemu. Cool that you found the cause, thanks!
I tried the following diff on a real-world microvm deployment and it only made a difference of about 6 to 8 seconds.
diff --git a/lib/runners/cloud-hypervisor.nix b/lib/runners/cloud-hypervisor.nix
index 873eb85..fe43cb4 100644
--- a/lib/runners/cloud-hypervisor.nix
+++ b/lib/runners/cloud-hypervisor.nix
@@ -14,7 +14,9 @@
}.${pkgs.stdenv.system};
kernelConsole =
- if pkgs.stdenv.system == "x86_64-linux"
+ if !microvmConfig.serialConsole
+ then "console=hvc0"
+ else if pkgs.stdenv.system == "x86_64-linux"
then "earlyprintk=ttyS0 console=ttyS0"
else if pkgs.stdenv.system == "aarch64-linux"
then "console=ttyAMA0"
@@ -139,11 +141,11 @@
)
"--cpus" "boot=${toString vcpu}"
"--watchdog"
- "--console" "null"
- "--serial" "tty"
+ "--console" "tty"
+ "--serial" "null"
"--kernel" kernelPath
"--initramfs" initrdPath
- "--cmdline" "${kernelConsole} reboot=t panic=-1 ${builtins.unsafeDiscardStringContext (toString microvmConfig.kernelParams)}"
+ "--cmdline" /*"${kernelConsole}*/ "${builtins.unsafeDiscardStringContext (toString microvmConfig.kernelParams)}"
"--seccomp" "true"
"--memory" memOps
]
diff --git a/nixos-modules/microvm/optimization.nix b/nixos-modules/microvm/optimization.nix
index abb605d..4626543 100644
--- a/nixos-modules/microvm/optimization.nix
+++ b/nixos-modules/microvm/optimization.nix
@@ -30,10 +30,10 @@
]);
tpm2.enable = lib.mkDefault false;
};
- kernelParams = [
- # we only need one serial console
- "8250.nr_uarts=1"
- ];
+ # kernelParams = [
+ # # we only need one serial console
+ # "8250.nr_uarts=1"
+ # ];
swraid.enable = lib.mkDefault false;
};
Did you use the cloud-hypervisor example from this repo? For me it boots in about 6 to 8 seconds before this change and maybe 1s faster after it. I suspect the slowdown comes from actual services being started: every systemd target added slows down the boot process, for reasons I don't understand. Normally some targets in the boot chain are hit one after the other without any real delay, but with cloud-hypervisor they each have a delay of about 3 seconds. Just switching to qemu made our boots almost 2x faster.
On my machine, a minimal MicroVM using cloud-hypervisor is also very slow to both start up and shut down. A complete cycle with systemctl restart microvm@example takes almost two minutes. After switching to qemu, the machine boots within a few seconds.
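For reference, the switch to qemu described here would presumably be a one-line change in the guest's configuration. This is a minimal fragment, assuming the MicroVM is declared through the microvm.nix NixOS module and its `microvm.hypervisor` option; everything else in the configuration stays as it is.

```nix
{
  # Run this MicroVM with qemu instead of cloud-hypervisor.
  microvm.hypervisor = "qemu";
}
```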
I tried setting console=tty together with serial=null or serial=off, but this made no difference. So far, I haven't modified the runner configuration for cloud-hypervisor directly.
The cloud-hypervisor example starts up fast, but as soon as you add even a minimal load, it gets disproportionately slower. This could also be an upstream bug.