Flatcar
emergency mode with too many hard disks / slow SAS controller
Description
I have zpools spanning 24 disks in that node. As soon as I connect these disks, boot always ends up in emergency mode because some device discovery step (device mapper?) takes a long time. I then see
sysroot.mount: Mounting timeout. Terminating.
and from there boot stops and the node drops into emergency mode.
Before that I see many lines like this (presumably one per disk or so):
systemd-udevd[556]: 0:0:22:0: Worker [779] processing SEQNUM=3014 is taking a long time
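To check the "one per disk" guess, a small helper along these lines could list the distinct device addresses that appear in those messages. This is only a sketch: the function name slow_udev_devices is made up for this report, and it assumes the journal lines have exactly the format shown above.

```shell
# Print each distinct device address that udev reported as slow, one per line.
# Reads journal text on stdin. The function name is hypothetical.
slow_udev_devices() {
    grep 'is taking a long time' |
        sed -n 's/^.*systemd-udevd\[[0-9]*]: \([^ ]*\): Worker .*$/\1/p' |
        sort -u
}

# Possible usage from the emergency shell (assumes the journal is readable there):
#   journalctl -b --no-pager | slow_udev_devices | wc -l
```

If the resulting count matches the number of attached disks, the delay is per device rather than a single stuck worker.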
I have another node with half as many disks. It also spends quite some time probing all the disks, but finishes before a sysroot.mount timeout happens, then imports its zpools, and everything is fine in the end.
All I need is a longer timeout, I guess, presumably on the order of 5 minutes, before boot falls back to emergency mode.
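For illustration, here is an untested sketch of the kind of setting I have in mind. It assumes that sysroot.mount in the Flatcar initrd is generated by systemd-fstab-generator from root= and rootflags=, and that extra kernel arguments can be appended through linux_append in /oem/grub.cfg. I have not verified either assumption on this node. x-systemd.mount-timeout= is the mount option described in systemd.mount(5), and 300s is just my 5-minute guess.

```shell
# /oem/grub.cfg (assumed location of the OEM GRUB config on this release)
# Untested: append a root mount option that should raise the
# sysroot.mount timeout to 5 minutes, if the assumptions above hold.
set linux_append="$linux_append rootflags=x-systemd.mount-timeout=300s"
```

If the slow step is udev event processing rather than the mount itself, this would not be enough and a different knob would be needed.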
Impact
Cannot get that node online with its data on the zpools.
Reproduction
Installed 3975.2.0 successfully from a USB stick using the flatcar-install -d /dev/sda -i ignition.json method.
Reboot: Ignition completes successfully if /dev/sda is the only disk connected; the node comes up, joins, all fine.
(A few disks, like 5 or 10, are probably fine too? I cannot try this, as it would degrade the zpools.)
Additional information
I am migrating the cluster from CentOS, which appears to have no problem with the slowness of the many disk devices (can't tell about earlier Flatcar versions).
Flatcar release (booted with the zpool disks disconnected):
n04 ~ # cat /etc/os-release
NAME="Flatcar Container Linux by Kinvolk"
ID=flatcar
ID_LIKE=coreos
VERSION=3975.2.0
VERSION_ID=3975.2.0
BUILD_ID=2024-08-05-2103
SYSEXT_LEVEL=1.0
PRETTY_NAME="Flatcar Container Linux by Kinvolk 3975.2.0 (Oklo)"
ANSI_COLOR="38;5;75"
HOME_URL="https://flatcar.org/"
BUG_REPORT_URL="https://issues.flatcar.org"
FLATCAR_BOARD="amd64-usr"
CPE_NAME="cpe:2.3:o:flatcar-linux:flatcar_linux:3975.2.0:*:*:*:*:*:*:*"