zfs
zfs copied to clipboard
ZFS attempts to write into a pool being imported as RO
System information
| Type | Version/Name |
|---|---|
| Distribution Name | Arch |
| Distribution Version | Current |
| Kernel Version | 5.10.63 |
| Architecture | x64 |
| OpenZFS Version | 2.1.0 |
Describe the problem you're observing
When attempting to import a pool with zpool import -o ro <name>, the kernel crashes on a VERIFY fail when calling dmu_write on the RO-importing pool (import never completes due to the crash, system hangs, requires full reboot):
[ 2405.754541] VERIFY3(0 == dmu_buf_hold_array(os, object, offset, size, FALSE, FTAG, &numbufs, &dbp)) failed (0 == 5)
[ 2405.754551] PANIC at dmu.c:1117:dmu_write()
[ 2405.754556] Showing stack for process 133343
[ 2405.754561] CPU: 0 PID: 133343 Comm: txg_sync Tainted: P U OE T 5.10.63 #1
...
[ 2405.754565] Call Trace:
[ 2405.754581] [<ffffffff81ef2ff2>] dump_stack+0x62/0x88
[ 2405.754587] [<ffffffff81ef25e6>] spl_panic+0xc3/0xee
[ 2405.754593] [<ffffffff8141baa2>] ? dbuf_destroy+0x302/0x4c0
[ 2405.754600] [<ffffffff812cf440>] ? kfree+0x100/0x460
[ 2405.754606] [<ffffffff8142900d>] ? dmu_buf_hold_array_by_dnode+0x1bd/0x620
[ 2405.754610] [<ffffffff81f15c21>] ? _cond_resched+0x11/0x50
[ 2405.754615] [<ffffffff81f17b29>] ? mutex_lock+0x9/0x30
[ 2405.754620] [<ffffffff814486a0>] ? dnode_rele+0x70/0xe0
[ 2405.754624] [<ffffffff8142998b>] dmu_write.part.0+0x1bb/0x1c0
[ 2405.754629] [<ffffffff81f15c21>] ? _cond_resched+0x11/0x50
[ 2405.754635] [<ffffffff814c6c8d>] space_map_write+0x13d/0x8f0
[ 2405.754640] [<ffffffff812d2e7c>] ? kmem_cache_alloc_node+0x16c/0x410
[ 2405.754647] [<ffffffff8114bd83>] ? ktime_get_raw_ts64+0x43/0xd0
[ 2405.754651] [<ffffffff81f15c21>] ? _cond_resched+0x11/0x50
[ 2405.754655] [<ffffffff81f17b29>] ? mutex_lock+0x9/0x30
[ 2405.754660] [<ffffffff81493406>] metaslab_sync+0x6f6/0x970
[ 2405.754664] [<ffffffff81f15c21>] ? _cond_resched+0x11/0x50
[ 2405.754667] [<ffffffff81f15c21>] ? _cond_resched+0x11/0x50
[ 2405.754673] [<ffffffff814d3f73>] vdev_sync+0x83/0x5f0
[ 2405.754678] [<ffffffff814803c1>] ? dsl_scan_sync+0x91/0x11b0
[ 2405.754683] [<ffffffff814bd5e7>] ? spa_flush_metaslabs+0x27/0x360
[ 2405.754688] [<ffffffff814aa1d2>] spa_sync+0x5e2/0xf70
[ 2405.754696] [<ffffffff814c47f5>] ? spa_txg_history_init_io+0xf5/0x100
[ 2405.754700] [<ffffffff814c9898>] txg_sync_thread+0x2b8/0x470
[ 2405.754706] [<ffffffff814c95e0>] ? txg_fini+0x260/0x260
[ 2405.754713] [<ffffffff813831c2>] thread_generic_wrapper+0x72/0x90
[ 2405.754719] [<ffffffff81383150>] ? __thread_exit+0x10/0x10
[ 2405.754724] [<ffffffff810cc7ba>] kthread+0x13a/0x170
[ 2405.754730] [<ffffffff810cc680>] ? kthread_park+0x90/0x90
[ 2405.754737] [<ffffffff81005bac>] ret_from_fork+0x2c/0x40
Clearly something is wrong with the pool, but given that i need some data out of it before destroying and replacing, this is somewhat of a problem. The failing function call is:
1107 void
1108 dmu_write(objset_t *os, uint64_t object, uint64_t offset, uint64_t size,
1109 const void *buf, dmu_tx_t *tx)
1110 {
1111 dmu_buf_t **dbp;
1112 int numbufs;
1113
1114 if (size == 0)
1115 return;
1116
1117 VERIFY0(dmu_buf_hold_array(os, object, offset, size,
1118 FALSE, FTAG, &numbufs, &dbp));
1119 dmu_write_impl(dbp, numbufs, offset, size, buf, tx);
1120 dmu_buf_rele_array(dbp, numbufs, FTAG);
1121 }
according to the dmesg output.
My guess is that like so many messed up filesystem journaling semantics, the ZIL is trying to catch-up despite being imported RO.
This is worse than i thought: zpool import -F -n - a dry run of a forced import still produces the same exact crash.
Same deal with -m passed, even though it never had a SLOG, figured it might discard the on-disk SLOG. No dice.
@behlendorf - from a forensics standpoint, this might be a rather serious problem: if someone were to keep CSAM on their zpool and the bureau got their hands on the pool, their forensics people would be unable to examine a preserved forensic state to convict the perpetrator, or the defense counsel could claim that its fabricated evidence because it was not presented to the court in the same state in which it was acquired during search & seizure (with knock-on implications for anything found from that branch of the investigation).
Of note: the ZFS pool which went sour here was created in December of 2019 with whatever the last tag was at that time. The pool itself had never been upgraded to recent revisions, its been running fine since then until Monday.
I've crossed into some twilight zone nightmare apparently.
The pool in question had been backed up - rather the underlying block media had been backed up some time ago.
Having mounted and accessed the backup on my host just right now, i DD'd the blockdev data to a thumb drive (its a dmcrypt vol with unencrypted partitions for boot, efi, and an MBR block), and attempted to boot the just-verified rpool. To my chagrin, now THIS ONE (which is booting itself on 5.10.63 with zfs 2.1 since i updated all the packages in it during my verification pass) is throwing the same error (though referencing a different line of code for the same call):

Confounding since the pool was just imported on the same exact kernel (ZFS is built into the kernel, not a module).
Whatever is wrong with the pool appears to have happened prior to the backup, which is odd, as the error was ignored on subsequent boots (likely something added since that revision which catches the bug).
When attempting to import a pool with zpool import -o ro
, the kernel crashes on a VERIFY fail when calling dmu_write()
Try importing the pool with zpool import -o readonly=on <name>. Using -o ro when importing the pool only results in the filesystems being mounted readonly, the pool itself is still imported readwrite. This is why the dmu_write() code is still being run. This really should be more clearly documented, it's in the man page but the distinction is subtle and the syntax is perhaps a bit unfortunate.
-o mntopts
Comma-separated list of mount options to use when mounting
datasets within the pool. See zfs(8) for a description of
dataset properties and mount options.
-o property=value
Sets the specified property on the imported pool. See the
zpoolprops(7) manual page for more information on the avail‐
able pool properties.
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.