ZFS module/units fail to load on boot with Beta 4426.1.0
Description
Unlike the current Stable, Beta 4426.1.0 fails to load the ZFS kernel module on boot (when pool devices are available), resulting in ZFS units being skipped and no pools mounted.
Impact
ZFS pools won't be re-mounted automatically after a reboot, which is unexpected behavior.
Environment and steps to reproduce
- Given the following Butane config to create a ZFS pool and dataset on first boot:
```yaml
variant: flatcar
version: 1.1.0
storage:
  files:
    - path: /etc/flatcar/enabled-sysext.conf
      mode: 0644
      contents:
        inline: |
          zfs
systemd:
  units:
    - name: format-zfs.service
      enabled: true
      contents: |
        [Unit]
        ConditionFirstBoot=1
        Before=first-boot-complete.target
        Wants=first-boot-complete.target
        [Service]
        Type=oneshot
        ExecStart=zpool create zdata /dev/vdb
        ExecStart=zfs create -o mountpoint=/zfs-test zdata/zfs-test
        [Install]
        WantedBy=multi-user.target
```
- Compile to `ignition.json` and test using QEMU:
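The compile step isn't shown in the report; a minimal sketch, assuming the Butane config above is saved as `config.bu` and the `butane` binary is installed (the guard just makes the snippet a no-op where it isn't):

```shell
# Compile the Butane config (saved here as config.bu) to Ignition JSON.
# --strict makes butane fail on unknown or misindented fields.
if command -v butane >/dev/null 2>&1; then
  butane --strict config.bu > ignition.json
fi
```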
```shell
# Create an extra blank file device:
qemu-img create -f qcow2 zfs-disk.qcow2 100M
# Start Flatcar
flatcar_production_qemu.sh -i ignition.json -- \
  -nographic \
  -drive file=zfs-disk.qcow2,if=virtio,format=qcow2
```
- Both Stable and Beta 4426.1.0 will create the `zdata` pool with the `zfs-test` dataset mounted at `/zfs-test` on first boot, as the ZFS module is dynamically loaded when `zfs`/`zpool` commands are run. This can be verified with:

```shell
zfs list
df /zfs-test
```
- Now reboot the Flatcar VM:
  - Stable will load the ZFS module automatically via udev seeing ZFS devices, and the ZFS units run, resulting in `/zfs-test` being mounted.
  - Beta 4426.1.0 will fail to load the ZFS module, with the following logged:

```
Oct 08 17:25:18 localhost (udev-worker)[1754]: vdb1: Process '/sbin/modprobe zfs' failed with exit.
```

All ZFS units will then be skipped due to an unmet condition (`ConditionPathIsDirectory=/sys/module/zfs`), so the ZFS module is not loaded and `/zfs-test` is not automatically mounted. You can verify with `df /zfs-test`, or see the failures via `journalctl | grep zfs`. Running a manual `zfs list` will trigger dynamic loading of the ZFS module and mount the dataset again.
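The condition the ZFS units gate on can be checked directly, sketched as plain shell (works on any Linux host; `/sys/module/zfs` only exists while the module is loaded):

```shell
# The ZFS units use ConditionPathIsDirectory=/sys/module/zfs; this
# directory is present exactly when the zfs kernel module is loaded.
if [ -d /sys/module/zfs ]; then
  echo "zfs module: loaded"
else
  echo "zfs module: not loaded"
fi
```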
Expected behavior
Beta should automatically load the ZFS module and run the ZFS units when ZFS devices are present.
Additional information
It's strange: it won't load on boot, and the error reappears if you run `sudo udevadm trigger -s block`, but running `/sbin/modprobe zfs` manually works fine.
Figured it out. It's the `SystemCallFilter=` on the `systemd-udevd.service` unit. Adding this drop-in works around it:

```ini
# /etc/systemd/system/systemd-udevd.service.d/syscall.conf
[Service]
SystemCallFilter=
```
We need to figure out what syscalls are required for doing our modules overlay trick.
Thank you both for the report and investigation (and beta testing)!
I've just tested it, and we need to add the `@mount` syscall group to make it work (in addition to the sets already allowed in `systemd-udevd.service`: `@system-service @module @raw-io bpf`).
It's probably cumulative, so I think you can just add `@mount` in a drop-in.

I've just opened a PR: https://github.com/flatcar/scripts/pull/3367. I've done local testing and it seems to work; we should also add a test case.
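For reference, the `@mount` group can be expanded into its member syscalls with `systemd-analyze` (assuming a host with systemd; the exact list varies by systemd version, so no output is shown):

```shell
# Print the syscalls contained in the @mount seccomp group.
# Guarded so the snippet is a no-op where systemd-analyze is unavailable.
if command -v systemd-analyze >/dev/null 2>&1; then
  systemd-analyze syscall-filter @mount
fi
```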
Hi, maybe we are missing something that needs to happen after an upgrade?
```
coreos-prd1-mysql-px-c ~ # cat /etc/os-release
NAME="Flatcar Container Linux by Kinvolk"
ID=flatcar
ID_LIKE=coreos
VERSION=4459.2.0
VERSION_ID=4459.2.0
BUILD_ID=2025-11-10-1432
SYSEXT_LEVEL=1.0
PRETTY_NAME="Flatcar Container Linux by Kinvolk 4459.2.0 (Oklo)"
ANSI_COLOR="38;5;75"
HOME_URL="https://flatcar.org/"
BUG_REPORT_URL="https://issues.flatcar.org"
FLATCAR_BOARD="amd64-usr"
CPE_NAME="cpe:2.3:o:flatcar-linux:flatcar_linux:4459.2.0:*:*:*:*:*:*:*"
coreos-prd1-mysql-px-c ~ # ls -al /usr/lib/systemd/system/systemd-udevd.service.d/
total 2
drwxr-xr-x. 2 root root 34 Nov 10 15:32 .
drwxr-xr-x. 1 root root 44 Jan  1  1970 ..
-rw-r--r--. 1 root root 36 Nov 10 15:32 10-zfs.conf
coreos-prd1-mysql-px-c ~ #
```
The `flatcar.conf` file in `/usr/lib/systemd/system/systemd-udevd.service.d/` from PR https://github.com/flatcar/scripts/pull/3367 is missing, and therefore we are missing our ZFS pools in production.
Do we need to run something after an upgrade to get that sysext up to date?
Thanks Rainer
The fix hasn't been backported to stable (or beta) yet. I will do that today for the next release.
So we need to halt the rollout of the "stable" release, roll back to the previous version, and disable auto-updates forever; is this the correct way to move forward?
This contradicts https://www.flatcar.org/releases:
> The Stable channel is intended for use in production clusters. Versions of Flatcar Container Linux have been tested as they move through Alpha and Beta channels before being promoted to stable.
Would it be safer for us to move to LTS for production systems?
This fix should have been backported before the last set of releases, and I am also disappointed that this didn't happen. Lesson learned.
Our test suite does cover ZFS for every release, but unfortunately, it didn't catch this particular issue. We will be extending the test suite so that it does.
I have now backported the fix to 4459 for the next stable and beta releases. We have decided to expedite these releases, so you should see them on Monday 24th.
I don't know exactly how you manage your deployments, but if you have already rolled back, you can pause automatic updates until then. If you don't want to wait that long, you can manually add the following at `/etc/systemd/system/systemd-udevd.service.d/zfs-udevd-hotfix.conf` before rebooting into the broken release. You can remove it after upgrading to the next release, but you don't have to.
```ini
[Service]
SystemCallFilter=@mount
```
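After adding the drop-in (and after a `systemctl daemon-reload` or a reboot), one way to check that udevd picked it up is to read the unit's effective `SystemCallFilter` property; a sketch, assuming a running systemd:

```shell
# Show the effective syscall filter of the udevd unit; once the drop-in is
# applied it should include @mount. Guarded for hosts without systemd.
if command -v systemctl >/dev/null 2>&1; then
  systemctl show systemd-udevd.service -p SystemCallFilter || true
fi
```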
You are free to switch to LTS if you'd prefer, but what you have seen is not the level of stability we strive for.
Now I am just interested in whether we are the only ones affected by this issue using ZFS with Flatcar in production. If so, we might just have to transition to Btrfs because of its likely higher popularity...
Anyone else affected by this issue please use the thumbs up reaction button on this post.
@stumbaumr we aren't in production yet, but will be using ZFS. We do plan on having dev/staging servers here on beta though to keep an eye on things.
@Codelica the issue was caught in Beta, but the process took too long and it was not fixed before the Stable release went out. So your plan would not have helped you here... We used to run Beta as well on a test cluster, but only by using it can you identify issues. We are just a small group of people; there is no one available to operate/fight systems on a Beta level.