ZFS 2.2.0 with kernel >=6.5.8 fails to boot (clang+lld+lto=thin)

Open fsvm88 opened this issue 1 year ago • 13 comments

System information

Type                 | Version/Name
Distribution Name    | Gentoo
Distribution Version | rolling
Kernel Version       | >=6.5.8
Architecture         | x86-64
OpenZFS Version      | 2.2.0

Describe the problem you're observing

ZFS 2.2.0 won't allow kernels >=6.5.8 to boot. The kernel fails at loading the ZFS module with the attached Oops and Panic.

ZFS 2.2.0 with kernel 6.5.7 works fine.

Describe how to reproduce the problem

My setup is quite exotic (clang + lld + LTO=thin), but has worked fine until kernel 6.5.7. Kernel 6.5.7 with ZFS 2.2.0 boots fine (please see my kernel config in the attachments).

A kernel >= 6.5.8 with ZFS 2.2.0 compiled against it oopses and panics at boot.

I dug through the kernel Changelog for 6.5.8, but couldn't find anything obvious that can explain the boot failure (I most likely missed it).

It does not seem to be a problem with the compiler or the linker, as I am running Clang 17.0.3 on my desktop and Clang 16.0.6 on my server, and both failed the same way. Also, my desktop's CPU vendor is AMD and my server's is Intel.

I have downgraded to 2.1.13 for the time being, but can test patches easily if required.

Include any warning/errors/backtraces from the system logs

These are the Oops and Panic recovered from persistent store after the machine rebooted: merged_oops.txt merged_panic.txt

This is my kernel config: 6.5.9 Kernel config

fsvm88 avatar Oct 27 '23 22:10 fsvm88

That stack seems somewhat wild, which makes me wonder if you're smashing the stack here somehow.

I don't know that I'll have time to try it myself anytime soon, but I would suggest that trying kASAN might be a start for figuring out what's going on, since that seems to be faulting on an illegal address reference.

rincebrain avatar Oct 27 '23 23:10 rincebrain

I gave kASAN a try, but I don't think it helped much. Note that kASAN forces LTO=off, but that didn't change the outcome. merged_oops_kasan.txt merged_panic_kasan.txt

I'll try to bisect 6.5.7 vs 6.5.8 and see if I can find the culprit.

fsvm88 avatar Oct 28 '23 11:10 fsvm88

So, after spending >24h of compiling kernels... kernel 6.5.8 is not the problem. Bisection pointed out a totally unrelated commit, which did not fix the issue when reverted.

I attempted to recompile 6.5.7 to validate that the kernel version was the issue, but at this point I believe 6.5.7 was really running ZFS 2.1.13 from the initramfs. Running modinfo /lib/modules/6.5.7.../zfs.ko reported the installed module, not the one embedded in the initramfs (and loaded at boot). I have ~amd64 set for sys-fs/zfs and sys-fs/zfs-kmod, so they were probably upgraded without me noticing that the installed version was not the running one, until I recompiled for 6.5.8.
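A minimal sketch of how the installed-vs-running mismatch described above could be caught earlier. It assumes the loaded module exposes its version under /sys/module/zfs/version and that modinfo -F version zfs reports the on-disk module; check_module_versions is a hypothetical helper name, not something shipped with ZFS.

```shell
#!/bin/sh
# check_module_versions (hypothetical helper): compare two version strings
# and print whether the loaded module matches the on-disk one.
check_module_versions() {
    loaded="$1"   # e.g. contents of /sys/module/zfs/version
    ondisk="$2"   # e.g. output of: modinfo -F version zfs
    if [ "$loaded" = "$ondisk" ]; then
        echo "match: $loaded"
    else
        echo "MISMATCH: loaded=$loaded on-disk=$ondisk"
    fi
}

# Only query the live system if the zfs module is actually loaded.
if [ -r /sys/module/zfs/version ]; then
    check_module_versions "$(cat /sys/module/zfs/version)" \
        "$(modinfo -F version zfs 2>/dev/null)"
fi
```

A mismatch here would suggest that the module loaded at boot (for example from the initramfs) is older than the one installed under /lib/modules.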

So it seems that ZFS 2.2.0 and kernels >=6.5.0 (perhaps only with clang+lld) are having issues.

I recalled that when upgrading to 6.5.0 there were a few issues related to Clang/LLD/LTO configs ->

  • https://github.com/ClangBuiltLinux/linux/issues/1907
  • https://github.com/ClangBuiltLinux/linux/issues/1909
  • https://github.com/ClangBuiltLinux/linux/issues/1911

Perhaps this can shed some light on the failures?

fsvm88 avatar Oct 29 '23 14:10 fsvm88

As a sincere question, how are you building ZFS here?

I've tried building it on a kernel with ThinLTO, and it fails at the configure check for modules enabled with an error I don't remotely understand that seems like some deep assumption is broken.

rincebrain avatar Oct 29 '23 15:10 rincebrain

Ah, cute, on my Debian system it tries invoking ld.lld with a ThinLTO cache dir in the headers directory, which doesn't fly. One quick hacky workaround for that later, and I'm running a ThinLTO-built 6.5.9 from Clang 13 with 2.2.0, and no explosions even loading the module on boot.

I'll try 16 next, probably.

rincebrain avatar Oct 29 '23 17:10 rincebrain

Ah yes, sorry, my update script for Gentoo does include chmodding the LTO cache folder ->

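# Load the configs module so that /proc/config.gz is available
modprobe configs
# Dump the running kernel's config to a file named after the kernel release
export KERN_CONFIG_DUMP="$(uname -r)"
zcat /proc/config.gz > "${KERN_CONFIG_DUMP}"
genkernel --kernel-config="${KERN_CONFIG_DUMP}" --no-module-rebuild --zfs kernel
# Make the ThinLTO cache writable for the out-of-tree module build
chmod 0777 /usr/src/linux/.thinlto-cache
emerge -1 zfs-kmod @module-rebuild --quiet
genkernel --kernel-config="${KERN_CONFIG_DUMP}" --zfs initramfs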

Thanks to a chat with a good friend of mine, I put together a QEMU script to ease the pain of compile+test a bit. I can boot to the rescue shell in QEMU, but as soon as I run 'modprobe zfs', I get a very similar result (I can see only the end of the stack trace, but it's always the same function).

I tried to work around the offending code (which seems related to the blake3 algorithm) by supplying zfs_blake3_impl={generic,sse41} at modprobe, but the result stays the same.
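For reference, the brace shorthand above stands for one modprobe invocation per implementation. The sketch below only prints those command lines so they can be pasted into the rescue shell; blake3_modprobe_cmds is a hypothetical helper name, and the two implementation names are just the ones tried in this thread.

```shell
#!/bin/sh
# blake3_modprobe_cmds (hypothetical helper): print one modprobe command
# line per requested blake3 implementation. Nothing is loaded here.
blake3_modprobe_cmds() {
    for impl in "$@"; do
        echo "modprobe zfs zfs_blake3_impl=$impl"
    done
}

# The two implementations tried in this thread.
blake3_modprobe_cmds generic sse41
```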

#!/bin/bash

[ $# -lt 2 ] &&
    echo >&2 "ERROR: at least kernel and initrd required: $0 kernel_fullpath initrd_fullpath [kernel_cmdline]" &&
    exit 1

kernel="${1}"
initrd="${2}"
shift 2

qemu-system-x86_64 \
    -cpu host \
    -enable-kvm \
    -smp $(nproc) \
    -boot menu=off \
    -boot order=c \
    -kernel "${kernel}" \
    -initrd "${initrd}" \
    -append "$*" \
    -m 512m \
    -serial stdio \
    -vga std

fsvm88 avatar Oct 29 '23 22:10 fsvm88

I would suspect the reason is that it does a benchmark of all the checksum options it thinks are supported no matter which one you tell it to use, so it's triggering whatever is breaking the same way.

If you could share an easy reproduction method, or an easy way to generate said VM, that'd be useful, because I tried building with clang 16 and thinlto and it worked fine for my kernel config...

rincebrain avatar Oct 30 '23 04:10 rincebrain

RE benchmarking: suspected the same, yes.

The script doesn't really need a VM image -> you call it as ./qemu.sh /boot/vmlinuz-.... /boot/initramfs-.... and QEMU boots your last compile attempt directly (no disk/ISO/persistent FS/...). Since the failure happens during ZFS module initialization, you don't need persistence for this.

I'm not sure how much time I'll be able to spend on this during the week, but I'll try to work out a smaller reproducer:

  1) try to compile ZFS as built-in to remove the time spent creating the initramfs, check if the error persists
  2) reduce config to a minimum failing one to speed up recompiles
  3) try to isolate specific config options (esp. hardening options) that may make it boot

I'll update you as soon as I have news.

fsvm88 avatar Oct 30 '23 09:10 fsvm88

I managed to do 1) -> built-in ZFS (git) -> still same failure.

Once 1) was done, compiling the kernel took ~5m, so I didn't do 2) and optimize further. I enabled the text console support to redirect from QEMU and have more readable output on the terminal, and indeed it's the very same stack trace with the QEMU-booted kernel.

Did 3), and again thanks to my friend's suggestion and also this bug, I decided to disable CFI (CONFIG_CFI_CLANG) as the first attempt and... it worked! Now the QEMU-booted kernel complains about not being able to mount root, which means that the ZFS module finished loading.

This happens despite CONFIG_CFI_PERMISSIVE being set, which should only report warnings, and not trigger oopses and panics.

I have no idea how the kernel crypto code is annotated and how it works with CFI, but I believe at this point that the blake3 asm code needs to be annotated like the sha2 asm code.
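To see quickly which of the CFI/LTO options discussed above a given kernel config enables, something like the following sketch could be used. config_is_set and report_cfi are hypothetical helper names; the symbols checked are CONFIG_CFI_CLANG and CONFIG_CFI_PERMISSIVE from this thread, plus CONFIG_LTO_CLANG_THIN on the assumption that it is the ThinLTO switch in this config.

```shell
#!/bin/sh
# config_is_set (hypothetical helper): succeed if SYMBOL is set to "y"
# in the kernel config file CONFIG_FILE.
config_is_set() {
    grep -q "^$2=y\$" "$1"
}

# report_cfi (hypothetical helper): print the state of the CFI/LTO
# symbols relevant to this issue for the given config file.
report_cfi() {
    for sym in CONFIG_CFI_CLANG CONFIG_CFI_PERMISSIVE CONFIG_LTO_CLANG_THIN; do
        if config_is_set "$1" "$sym"; then
            echo "$sym=y"
        else
            echo "$sym is not set"
        fi
    done
}
```

Usage, for the running kernel: zcat /proc/config.gz > /tmp/running.config, then report_cfi /tmp/running.config.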

fsvm88 avatar Oct 31 '23 20:10 fsvm88

Mildly terrifying. Happen to have those CFI reports from dmesg?

sempervictus avatar Nov 06 '23 14:11 sempervictus

Hi @sempervictus, agreed.

All I get are the oops and panics attached in the first and third comments.

Those are obtained with CONFIG_CFI_CLANG=y and CONFIG_CFI_PERMISSIVE=y already.

You can grab my kernel config from the initial post in the issue.

Note that, thanks to Gentoo patches, I'm also compiling the kernel with -march=native, so perhaps that plays a role; I can retest for that. But with the very same kernel config, flags and all, I have zero issues with ZFS 2.1.13.

fsvm88 avatar Nov 06 '23 14:11 fsvm88

I target x86-64-v2 in our CI since that's the bare minimum of what we run at the company, but I have had trouble with this before, especially when using graysky's patches for the same thing in kernel builds.

sempervictus avatar Nov 06 '23 14:11 sempervictus

If you need CFI or LTO and do not need blake3 just yet, I wrote this patch to remove Blake3 from zfs-2.2.0. It is currently tested and working when building as an in-tree module [M]. I think an error will occur if it is built directly into the kernel [*]. I'm running Linux 6.6.15 with it at the moment.

If you get an error about intptr_t being defined twice, see issue #15418 for a workaround.

The patch must be applied against a freshly cloned zfs-2.2.0 tag: blake3Removal.patch

oven8Mitts avatar Feb 05 '24 04:02 oven8Mitts