
ZPOOL import should not crash kernel/system on a corrupted zpool

Open Qubitium opened this issue 1 year ago • 6 comments

System information

Type Version/Name
Distribution Name Ubuntu
Distribution Version 24.04
Kernel Version 6.6.54-x64v4-xanmod1
Architecture x86_64
OpenZFS Version zfs-2.2.99-880_g412105977

Describe the problem you're observing

Note: the zpool was corrupted due to ram overclocking and unstable ram. The zpool was made of 4x nvme drives without raidz protection.

Restarting the system (ram back in stable mode) and trying to zpool import monster causes the following panic in zfs, and may lock up the entire system depending on whether it is running the 6.6.54 or the 6.8.0-45-generic (Ubuntu release) kernel. On 6.8, it freezes the entire system. On 6.6.54, other parts of the kernel keep functioning after the panic, but any subsequent zfs/zpool operation stalls.

zpool import monster

Expectation

zpool import should not suffer a kernel-level panic that brings down the system. The import should simply fail with an error message and not cause a whole-system lock-up.

IMG_20241009_172912.jpg

[   54.896980] Call Trace:
[   54.896982]  <TASK>
[   54.896984]  dump_stack_lvl+0x49/0x60
[   54.896988]  vcmn_err+0xc6/0x100 [spl]
[   54.896995]  ? bt_grow_leaf+0x155/0x160 [zfs]
[   54.897041]  ? bt_grow_leaf+0x155/0x160 [zfs]
[   54.897071]  ? zfs_btree_insert_into_leaf+0x221/0x320 [zfs]
[   54.897101]  zfs_panic_recover+0x68/0x90 [zfs]
[   54.897145]  range_tree_remove_impl+0x871/0xf30 [zfs]
[   54.897182]  space_map_load_callback+0x1d/0x80 [zfs]
[   54.897216]  space_map_iterate+0x185/0x400 [zfs]
[   54.897246]  ? spa_stats_destroy+0x1d0/0x1d0 [zfs]
[   54.897276]  space_map_load_length+0x5c/0xd0 [zfs]
[   54.897306]  metaslab_load+0x169/0x990 [zfs]
[   54.897340]  metaslab_activate+0x36/0x270 [zfs]
[   54.897370]  ? metaslab_set_selected_txg+0x7a/0xb0 [zfs]
[   54.897400]  metaslab_alloc_dva+0x85b/0x12f0 [zfs]
[   54.897431]  metaslab_alloc+0xd9/0x290 [zfs]
[   54.897461]  zio_dva_allocate+0xc0/0x9b0 [zfs]
[   54.897505]  ? kmem_cache_free+0x19/0x320
[   54.897506]  ? spl_kmem_cache_free+0x128/0x1e0 [spl]
[   54.897510]  ? zio_io_to_allocate+0x5e/0x80 [zfs]
[   54.897544]  zio_execute+0x7b/0x120 [zfs]
[   54.897574]  taskq_thread+0x2ec/0x640 [spl]
[   54.897578]  ? wake_up_state+0x10/0x10
[   54.897580]  ? zio_vdev_io_done+0x210/0x210 [zfs]
[   54.897610]  ? taskq_thread_spawn+0x60/0x60 [spl]
[   54.897613]  kthread+0xdc/0x110
[   54.897614]  ? kthread_complete_and_exit+0x20/0x20
[   54.897615]  ret_from_fork+0x28/0x40
[   54.897617]  ? kthread_complete_and_exit+0x20/0x20
[   54.897617]  ret_from_fork_asm+0x11/0x20
[   54.897619]  </TASK>

Qubitium avatar Oct 09 '24 09:10 Qubitium

don't see ZFS at fault here

  • ram overclock
  • non-stable version
  • mismatched versions: zfs 2.2.99 vs kmod 2.3.99
  • non standard kernel: xanmod

nothing zfs can be blamed for - should be closed as invalid

n0xena avatar Oct 09 '24 19:10 n0xena

don't see ZFS at fault here

I disagree. Code that crashes/fails/throws an exception is normal. Locking up the system and locking up all future zfs operations is not normal, on any branch of development. I could understand if this were a runtime corruption, but this is a static file system load on zfs import that should never crash, period. It can error, but it should never crash.

  • ram overclock

I never blamed zfs for the corruption.

  • non-stable version

Code committed. Reviewed. The crash happened during zfs import, i.e. file system init/load. It should not crash internally, lock up the system, and lock up all of zfs/zpool. No zfs operations are possible after this crash. Imagine you have a zfs file system so corrupt that you cannot even repair/destroy it, because the second the code sniffs at the zpool on load it crashes, and no other command can be entered on the system since it's a full kernel lock-up (6.8 kernel) or a zfs kernel-level lockup (6.6 kernel).

  • mismatched versions: zfs 2.2.99 vs kmod 2.3.99

This is my bad but irrelevant here. I was testing this crash on two kernels, with zfs builds several commits apart at the tip. I don't think the pushed 2.3.99 commits did anything special or behaved differently. Both the 2.2.99 (6.6.54 kernel) and 2.3.99 (6.8.0 Ubuntu kernel) kmods crashed at the exact same spot, with the same stack trace, with the same zpool, on zfs import.

  • non standard kernel: xanmod

So we have a standard for kernels now? Why? Did xanmod modify the API in some way that I am not aware of? The 6.8.0 kernel is also the standard Ubuntu kernel. Both a standard and a non-standard kernel were tested.

nothing zfs can be blamed for - should be closed as invalid

I wouldn't be so quick to dismiss a crash that locks up the whole system, or at least zfs at the kernel level, in an unstable branch of file-system code that is not doing file i/o but doing file system init/load.

Qubitium avatar Oct 10 '24 01:10 Qubitium

To put it this way: you encountered an issue with a version of zfs that's not released as tested and stable, your system config screwed up the pool, and now you blame zfs for the resulting failure. To me this looks like little to no effort was put into figuring out whether zfs or your system setup is the root cause.

Kernel: the big distributions all come with a couple of different kernel versions - mostly "standard" (often just called "linux" without suffix), "LTS", and some special ones like "hardened" or "xen" or similar. You gave "xanmod" - however it differs from the Debian/Ubuntu "standard" kernel, its name already has it in it: "mod" - so it's at least some modification. As the kernel is the base of any system, different configurations of enabled or disabled features can lead down quite different codepaths - maybe xanmod has additional patches or reverts of commits. In any case: it's not what any dev can expect on a "regular" install of Ubuntu. To rule out that the different kernel is the cause, your issue has to be retested with the "stock" kernel installed.

Mismatching version: if you were using any of the released versions your zfs version would reflect that - yet it shows .99, which is the marker for "straight from git master". That's not how it's supposed to be used, and at the very least the exact time of clone or commit has to be given; as it stands no one knows which commit you're using. So - retest with 2.2.6 or 2.3.0-rc1.
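For reference, a quick sanity check of which userland and kernel-module versions are actually in play might look like the following (a sketch only - the sysfs path and modinfo output assume a typical recent OpenZFS install on Linux):

  zfs version                        # prints both the zfs userland and the loaded zfs-kmod version
  cat /sys/module/zfs/version        # version string of the currently loaded kernel module
  modinfo zfs | grep -i '^version'   # version recorded in the module file on disk

If the userland and kmod lines don't match, that mismatch needs sorting out before any retest means anything.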

Hardware malfunction: I just quoted your report - ram overclocking caused the issue due to instability. To put it straight: zfs is not meant to be used in such an environment. If, for some reason, you run your hardware out of spec and have software fail on it, you can't blame the software! Software is written with the deterministic default behaviour of the target platform in mind - "bending" this by running hardware out of spec is solely your fault, which makes this entire report completely invalid. To figure out whether the software - the combination of the stock linux kernel + the default set of gnu tools for a minimal setup + zfs - is the cause here, you have to retest your issue with such a clean setup.

Sure - it should be mentioned that with some random commit, in a mismatched combination of zfs-utils + kernel module, there's a lockup due to a corrupted pool - but how is anyone supposed to evaluate this without first cleaning up all those issues with your system, to check whether it's the f*ed up pool (rather likely) hitting some bad code paths without proper error handling?

It's one of the basics of troubleshooting: does my system meet the specs set by the devs? Yours doesn't - so first you have to sort that out and then retest - and if the issue still persists then you might file a report, although it will still be difficult without a clone of the now-corrupted pool.

tldr: you screwed up and now you blame others - that's why I wrote "don't see zfs at fault here". Fix your setup first and retest properly.

n0xena avatar Oct 10 '24 21:10 n0xena

@n0xena You have no idea what you are talking about.

Even if I give zfs a fake zfs partition filled with 100% corrupted, non-zfs bits, "zfs import" should not crash and bring down the official 6.8.0 Ubuntu kernel and the entire system. This is the bug, not the corruption itself.

UPDATE: I updated title to be extra clear.

Qubitium avatar Oct 10 '24 21:10 Qubitium

Yeah, you're right. Unfortunately, right now it's a fairly common pattern within OpenZFS to just panic when something strange happens, which, in kernel context, takes the whole kernel down.

The theory, such as it is, is that it's better to halt proceedings hard rather than go on and risk further damage. But it's a theory that goes back to old Solaris, where a kernel debugger was available at the crash and the hope was that someone could inspect the system and effect a repair.

For many of these, including this specific one, you can set zfs_recover=1 in the module parameters, and it will still yell but not panic. That's not really a good idea in general though, because the bad data may just trip further problems, or cause memory corruption, or worse. It's most often used in conjunction with a readonly import, to get the pool up just enough to be able to get data off.
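To make that concrete, a rough sketch of a recovery-style import (treat it as a sketch: the sysfs parameter path is the usual one on Linux, the pool name is the one from this report, and even a readonly import of a damaged pool carries some risk):

  echo 1 > /sys/module/zfs/parameters/zfs_recover   # warn instead of panic on the recoverable checks
  zpool import -o readonly=on monster               # bring the pool up read-only, just to copy data off
  # ... copy data somewhere safe, then:
  zpool export monster
  echo 0 > /sys/module/zfs/parameters/zfs_recover   # turn the override back off

The same thing can be made persistent with an "options zfs zfs_recover=1" line in /etc/modprobe.d, but for a one-off rescue the runtime toggle is usually enough.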

But all up, yes, you're right - it would be great if we could fail gracefully at all points. It would be a big project to identify and fix all of these, but worth doing. I've got a bunch of related background projects in flight around these sorts of things, but very little time for most of them. So it goes.

Thanks for taking the time to write it down though.

robn avatar Oct 10 '24 22:10 robn

@robn Thank you for the diagnosis. I understand there is no good choice between crash vs no-crash when io corruption is detected at runtime, for whatever reason. It is a poison pill either way. File system integrity comes first.

For many of these, including this specific one, you can set zfs_recover=1 in the module parameters, and it will still yell but not panic. That's not really a good idea in general though, because the bad data may just trip further problems, or cause memory corruption, or worse. It's most often used in conjunction with a readonly import, to get the pool up just enough to be able to get data off.

With that said, I think a good high(er) priority, for purely usability reasons, is to make sure that the filesystem mount stage, aka zfs import, which happens before any user-generated i/o, does not bring down the system. My expectation, albeit a naive one, is: I am asking zfs to load a file system. I am not doing any read/write i/o on it. Why can't this part be made crash-proof?

Perhaps at the zfs import stage zfs_recover=1 could be auto-enabled, and if recovery fails, fail the import, drop it, and not proceed further? I don't know if this is the path of least resistance that doesn't violate any existing zfs principles on data integrity. Once the zfs import is complete, the zfs module would internally revert zfs_recover to false.
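In the meantime, the enable-then-revert idea can be approximated by hand; a minimal sketch, assuming the standard sysfs parameter path and that a readonly import is acceptable for the rescue attempt:

  prev=$(cat /sys/module/zfs/parameters/zfs_recover)     # remember the current setting
  echo 1 > /sys/module/zfs/parameters/zfs_recover        # tolerate recoverable inconsistencies during import
  zpool import -o readonly=on monster || echo "import failed, pool not imported"
  echo "$prev" > /sys/module/zfs/parameters/zfs_recover  # restore the previous value either way

This is only a user-space approximation of the proposed auto-enable/auto-revert flow, not a substitute for proper error handling in the import path itself.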

Qubitium avatar Oct 12 '24 05:10 Qubitium

The issue is cost. It will require developer time to address these and other data-inconsistency crashes in ZFS. Those problems are out of scope for most sponsored work, where ZFS is assumed to run in a stable environment with ECC RAM. Volunteers have limited time and have to prioritize what they work on.

IvanVolosyuk avatar Oct 21 '24 06:10 IvanVolosyuk

Pretty much this.

I'm personally interested in this sort of work, because it usually makes things generally more robust, beyond just the panic points. If nothing else, because a whole bunch of code suddenly has to deal with something failing, rather than just assuming success or death. But for sure it's a slog, and a lot of it isn't just brainless filing down sharp edges - some of it needs real thought, and that's hard to muster at the end of a long day.

It'll improve slowly over time, as most things do. Throwing serious money at it can make it go much faster, while small donations of money, gifts and snacks can make it go a little faster, or at least, a little happier. I respond well to both kinds, links in bio, ama :)

robn avatar Oct 21 '24 07:10 robn