# Corrupt mirrored pool + `VERIFY0(dmu_object_info(os, mapping_object, &doi)) failed (52)`
### System information
| Type | Version/Name |
|---|---|
| Distribution Name | Proxmox / Debian |
| Distribution Version | 9.1 / 13 (trixie) |
| Kernel Version | 6.17.2-1-pve |
| Architecture | x86_64 |
| OpenZFS Version | zfs-2.3.4-pve1 |
### Describe the problem you're observing
I have a pool of two mirrored HDDs packed into a USB-C enclosure, named `ppool` (as in Portable Pool):
```
   pool: ppool
     id: 7387830995368866575
  state: ONLINE
 status: The pool was last accessed by another system.
 action: The pool can be imported using its name or numeric identifier and
         the '-f' flag.
    see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-EY
 config:

        ppool         ONLINE
          mirror-0    ONLINE
            sdb1      ONLINE
            sdc1      ONLINE
          indirect-1  ONLINE
          indirect-2  ONLINE
```
After some corruption, the import reports "corrupted data" for the pool:
```
saukrs@s2-book:~$ sudo zpool import
   pool: ppool
     id: 7387830995368866575
  state: FAULTED
 status: The pool was last accessed by another system.
 action: The pool cannot be imported due to damaged devices or data.
         The pool may be active on another system, but can be imported using
         the '-f' flag.
    see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-EY
 config:

        ppool         FAULTED  corrupted data
          mirror-0    FAULTED  corrupted data
            sdb1      ONLINE
            sdc1      ONLINE
          indirect-1  ONLINE
          indirect-2  ONLINE
```
...and the import gets stuck here:

```
saukrs@s2-book:~$ sudo zpool import -f ppool
<no return>
```
This was on OpenZFS v2.1.5.

I retried the import on OpenZFS v2.3.4 with mostly the same result, except that "FAULTED corrupted data" is gone (see the first snippet above), and a PANIC soon arrives from the kernel:
```
VERIFY0(dmu_object_info(os, mapping_object, &doi)) failed (52)
PANIC at vdev_indirect_mapping.c:349:vdev_indirect_mapping_open()
INFO: task zpool:1507 blocked for more than 122 seconds
```
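For context, the failing check sits at the very top of `vdev_indirect_mapping_open()` in `module/zfs/vdev_indirect_mapping.c`. The sketch below is paraphrased from memory of that function, not a verbatim copy; the point is that the return value of `dmu_object_info()` (here 52, i.e. `ECKSUM` on Linux) is asserted to be zero rather than handled:

```c
/*
 * Paraphrased sketch of vdev_indirect_mapping_open() -- not verbatim.
 * A checksum error while reading the mapping object's metadata makes
 * dmu_object_info() return ECKSUM (52 on Linux), and VERIFY0() turns
 * that into the kernel panic seen above.
 */
vdev_indirect_mapping_t *
vdev_indirect_mapping_open(objset_t *os, uint64_t mapping_object)
{
	vdev_indirect_mapping_t *vim =
	    kmem_zalloc(sizeof (*vim), KM_SLEEP);
	dmu_object_info_t doi;

	VERIFY0(dmu_object_info(os, mapping_object, &doi)); /* panics here */
	/* ... the bonus buffer is held and the mapping entries are read ... */
	return (vim);
}
```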
### Describe how to reproduce the problem
No idea. I was using the pool on Mint 21 (based on Ubuntu 22.04, OpenZFS v2.1.5) for about 10 months and wrote more than 4 TB into it successfully.
Recently I also wrote several tens of gigabytes on Windows 10 (OpenZFS v2.3.1 rc13) just fine.
After some glitch/jerk of the USB connection, my Windows laptop lost access to the pool; after physical disconnection, a reboot, and reconnection, it then failed to import the pool (and even blue-screened after a while).
I left the drives attached and rebooted Windows, only to find that it tries to import the pool on boot and so BSODs again.
After several iterations of this reboot loop I disconnected the pool from my Windows laptop and tried to import it back on my Linux laptop, where it failed as described above.
The most recent test was done on Proxmox 9.1 with OpenZFS v2.3.4; please see the recorded Asciinema snippet: https://asciinema.org/a/757533
### Include any warning/errors/backtraces from the system logs
I tried stracing `zpool import`:
```
...
close(5) = 0
openat(AT_FDCWD, "/sys/devices/pci0000:00/0000:00:1c.7/0000:25:00.0/usb2/2-1/2-1:1.0/host7/target7:0:0/7:0:0:1/block/sdc/sdc1/uevent", O_RDONLY|O_NOCTTY|O_CLOEXEC) = 5
fstat(5, {st_mode=S_IFREG|0644, st_size=4096, ...}) = 0
read(5, "MAJOR=8\nMINOR=33\nDEVNAME=sdc1\nDE"..., 4104) = 143
close(5) = 0
openat(AT_FDCWD, "/run/udev/data/b8:33", O_RDONLY|O_CLOEXEC) = 5
fstat(5, {st_mode=S_IFREG|0644, st_size=3377, ...}) = 0
fstat(5, {st_mode=S_IFREG|0644, st_size=3377, ...}) = 0
read(5, "S:disk/by-path/pci-0000:25:00.0-"..., 4096) = 3377
read(5, "", 4096) = 0
close(5) = 0
brk(0x5e4895929000) = 0x5e4895929000
ioctl(3, ZFS_IOC_POOL_STATS, 0x7ffeda96d180) = -1 ENOENT (No such file or directory)
brk(0x5e4895959000) = 0x5e4895959000
ioctl(3, ZFS_IOC_POOL_TRYIMPORT, 0x7ffeda96d220) = 0
openat(AT_FDCWD, "/proc/sys/kernel/spl/hostid", O_RDONLY|O_CLOEXEC) = 5
fstat(5, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
read(5, "f9c338f9\n", 1024) = 9
close(5) = 0
ioctl(3, ZFS_IOC_POOL_IMPORT
<no return>
```
I also checked the recent dmesg messages:
```
[Sat Nov 22 12:01:59 2025] PANIC at vdev_indirect_mapping.c:349:vdev_indirect_mapping_open()
[Sat Nov 22 12:01:59 2025] Showing stack for process 1689
[Sat Nov 22 12:01:59 2025] CPU: 0 UID: 0 PID: 1689 Comm: zpool Tainted: P S O 6.17.2-1-pve #1 PREEMPT(voluntary)
[Sat Nov 22 12:01:59 2025] Tainted: [P]=PROPRIETARY_MODULE, [S]=CPU_OUT_OF_SPEC, [O]=OOT_MODULE
[Sat Nov 22 12:01:59 2025] Hardware name: Hewlett-Packard HP EliteBook 8460p/161C, BIOS 68SCF Ver. F.65 04/06/2017
[Sat Nov 22 12:01:59 2025] Call Trace:
[Sat Nov 22 12:01:59 2025]  <TASK>
[Sat Nov 22 12:01:59 2025]  dump_stack_lvl+0x5f/0x90
[Sat Nov 22 12:01:59 2025]  dump_stack+0x10/0x18
[Sat Nov 22 12:01:59 2025]  spl_dumpstack+0x28/0x40 [spl]
[Sat Nov 22 12:01:59 2025]  spl_panic+0xef/0x114 [spl]
[Sat Nov 22 12:01:59 2025]  vdev_indirect_mapping_open+0x16b/0x1a0 [zfs]
[Sat Nov 22 12:01:59 2025]  spa_remove_init+0xa6/0x1f0 [zfs]
[Sat Nov 22 12:01:59 2025]  spa_load+0x317/0x1a90 [zfs]
[Sat Nov 22 12:01:59 2025]  spa_load_best+0x18e/0x2e0 [zfs]
[Sat Nov 22 12:01:59 2025]  spa_import+0x22a/0x6c0 [zfs]
[Sat Nov 22 12:01:59 2025]  zfs_ioc_pool_import+0x153/0x170 [zfs]
[Sat Nov 22 12:01:59 2025]  zfsdev_ioctl_common+0x7c2/0x970 [zfs]
[Sat Nov 22 12:01:59 2025]  zfsdev_ioctl+0x57/0xf0 [zfs]
[Sat Nov 22 12:01:59 2025]  __x64_sys_ioctl+0xa5/0x100
[Sat Nov 22 12:01:59 2025]  x64_sys_call+0x1151/0x2330
[Sat Nov 22 12:01:59 2025]  do_syscall_64+0x80/0xa30
[Sat Nov 22 12:01:59 2025]  ? count_memcg_events+0xd7/0x1a0
[Sat Nov 22 12:01:59 2025]  ? mod_memcg_lruvec_state+0xd3/0x1f0
[Sat Nov 22 12:01:59 2025]  ? __lruvec_stat_mod_folio+0x8b/0xf0
[Sat Nov 22 12:01:59 2025]  ? set_ptes.isra.0+0x3b/0x90
[Sat Nov 22 12:01:59 2025]  ? do_anonymous_page+0x106/0x990
[Sat Nov 22 12:01:59 2025]  ? ___pte_offset_map+0x1c/0x180
[Sat Nov 22 12:01:59 2025]  ? __handle_mm_fault+0xb55/0xfd0
[Sat Nov 22 12:01:59 2025]  ? sched_clock_noinstr+0x9/0x10
[Sat Nov 22 12:01:59 2025]  ? count_memcg_events+0xd7/0x1a0
[Sat Nov 22 12:01:59 2025]  ? handle_mm_fault+0x254/0x370
[Sat Nov 22 12:01:59 2025]  ? do_user_addr_fault+0x2f8/0x830
[Sat Nov 22 12:01:59 2025]  ? irqentry_exit_to_user_mode+0x2e/0x290
[Sat Nov 22 12:01:59 2025]  ? irqentry_exit+0x43/0x50
[Sat Nov 22 12:01:59 2025]  ? exc_page_fault+0x90/0x1b0
[Sat Nov 22 12:01:59 2025]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[Sat Nov 22 12:01:59 2025] RIP: 0033:0x7726a93668db
[Sat Nov 22 12:01:59 2025] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[Sat Nov 22 12:01:59 2025] RSP: 002b:00007ffeda96c1e0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[Sat Nov 22 12:01:59 2025] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007726a93668db
[Sat Nov 22 12:01:59 2025] RDX: 00007ffeda96c2a0 RSI: 0000000000005a02 RDI: 0000000000000003
[Sat Nov 22 12:01:59 2025] RBP: 00007ffeda970990 R08: 0000000000000000 R09: 00007726a94422b0
[Sat Nov 22 12:01:59 2025] R10: 00007726a94422b0 R11: 0000000000000246 R12: 00005e48958e4ab0
[Sat Nov 22 12:01:59 2025] R13: 00007ffeda96c2a0 R14: 00005e48958f5060 R15: 00005e4895909480
[Sat Nov 22 12:01:59 2025]  </TASK>
```
Before the bug itself gets fixed, I would very much prioritize analyzing the current pool (using `zdb`, probably?) to see whether the data is recoverable, since I am short on space due to this incident... Thanks in advance.
---

According to the `indirect-X` vdevs reported, at some point you've used the vdev removal feature. But it seems the mapping table from those removals somehow got corrupted, and the code does not handle errors there other than with kernel panics. This sure should be fixed, but it may not exactly help your case if some blocks are still referenced via the indirect vdevs and the mapping tables are lost. Traditionally for data recovery a read-only import is recommended, but I am not sure it helps in this particular case.
---

> According to the `indirect-X` vdevs reported, at some point you've used the vdev removal feature. But it seems the mapping table from those removals somehow got corrupted
OK, I might have done this when I started with ZFS and was building this two-way mirrored pool, but it was so long ago that I forgot the details.

Is there any technical reference to read about these mapping tables (involved in vdev removal)?
> it may not exactly help your case if some blocks are still referenced via the indirect vdevs and the mapping tables are lost.
I wonder about their structure, the amount of metadata they store, and the chances of reconstructing them by hand.
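For illustration, my understanding (paraphrased from `include/sys/vdev_indirect_mapping.h`; the field names should be close, but double-check against the source tree) is that the mapping is an array of small fixed-size entries plus a header kept in the mapping object's bonus buffer:

```c
/*
 * Paraphrased from include/sys/vdev_indirect_mapping.h -- verify against
 * the actual source tree before relying on layout or field names.
 */

/* One entry: a segment of the removed vdev -> its new location. */
typedef struct vdev_indirect_mapping_entry_phys {
	/*
	 * Source offset on the removed vdev (plus a low "mark" bit used
	 * while condensing the mapping), decoded with DVA-style macros.
	 */
	uint64_t	vimep_src;
	/* Destination: new vdev id, offset, and size of the segment. */
	dva_t		vimep_dst;
} vdev_indirect_mapping_entry_phys_t;

/* Header in the mapping object's bonus buffer. */
typedef struct vdev_indirect_mapping_phys {
	uint64_t	vimp_max_offset;	/* highest mapped src offset */
	uint64_t	vimp_bytes_mapped;	/* total bytes still mapped */
	uint64_t	vimp_num_entries;	/* entry count in the array */
	uint64_t	vimp_counts_object;	/* obsolete-counts object */
} vdev_indirect_mapping_phys_t;
```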
Can their current state be dumped, using `zdb [...] -e ppool` maybe?
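Something along these lines might be a starting point (a sketch, not a verified recipe: flag behavior varies between zdb versions, and zdb may trip the same assertion unless told to ignore it; see zdb(8)):

```sh
# Sketch only -- check zdb(8) on your build before running.
# -e    operate on an exported / not-imported pool
# -p    directory to search for the device nodes (assuming /dev here)
# -AAA  ignore assertion failures / enable panic recovery, so zdb has a
#       chance to get past the same VERIFY0 that panics the kernel
sudo zdb -e -p /dev -AAA -C ppool    # dump the pool configuration
sudo zdb -e -p /dev -AAA -m ppool    # metaslab / vdev layout details
```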
> Traditionally for data recovery a read-only import is recommended, but I am not sure it helps in this particular case.
IIRC, I tried `zpool import -f -o readonly=on -F ppool` with no change in behavior.

I am yet to try `--rewind-to-checkpoint`. I would also try `-X`, but I'm unsure whether it works in read-only mode.
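For the record, the variants I have in mind look like this (all standard zpool-import(8) flags; whether any of them gets past the `VERIFY0` panic is exactly what's in question):

```sh
# Dry-run the rewind first: -n reports whether -F could make the pool
# importable again without actually rewinding anything.
sudo zpool import -f -o readonly=on -F -n ppool

# Extreme rewind; documented for use together with -F.
sudo zpool import -f -o readonly=on -F -X ppool

# Rewind to the checkpoint, if one was ever taken on this pool.
sudo zpool import -f -o readonly=on --rewind-to-checkpoint ppool
```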
---

OK, since I received a memory.dmp of this panic when testing with the Windows version, I can add my observations:
```c
VERIFY0(dmu_object_info(os, mapping_object, &doi));
```
```
  OpenZFS!spl_panic+0x71 [C:\src\openzfs\module\os\windows\spl\spl-err.c @ 87]
> OpenZFS!vdev_indirect_mapping_open+0x94 [C:\src\openzfs\module\zfs\vdev_indirect_mapping.c @ 348]
  OpenZFS!spa_remove_init+0x42c [C:\src\openzfs\module\zfs\vdev_removal.c @ 609]
  OpenZFS!spa_ld_open_indirect_vdev_metadata+0x2c [C:\src\openzfs\module\zfs\spa.c @ 4556]
  OpenZFS!spa_load_impl+0x21e [C:\src\openzfs\module\zfs\spa.c @ 5475]
  OpenZFS!spa_load+0x95 [C:\src\openzfs\module\zfs\spa.c @ 3486]
  OpenZFS!spa_load_retry+0x92 [C:\src\openzfs\module\zfs\spa.c @ 5719]
  OpenZFS!spa_load_best+0x37c [C:\src\openzfs\module\zfs\spa.c @ 5800]
  OpenZFS!spa_import+0x301 [C:\src\openzfs\module\zfs\spa.c @ 6824]
  OpenZFS!zfs_ioc_pool_import+0x13f [C:\src\openzfs\module\zfs\zfs_ioctl.c @ 1554]
  OpenZFS!zfsdev_ioctl_common+0x79d [C:\src\openzfs\module\zfs\zfs_ioctl.c @ 8122]
  OpenZFS!zfsdev_ioctl+0x27d [C:\src\openzfs\module\os\windows\zfs\zfs_ioctl_os.c @ 886]
  OpenZFS!ioctlDispatcher+0x26e [C:\src\openzfs\module\os\windows\zfs\zfs_vnops_windows.c @ 7772]
  OpenZFS!dispatcher+0x2c0 [C:\src\openzfs\module\os\windows\zfs\zfs_vnops_windows.c @ 8900]
```
The asserted value is:

```
_verify0_right == 0x32
```

and on Windows:

```c
include/os/windows/spl/sys/errno.h:#define EBADE 50 /* invalid exchange */
#define ECKSUM EBADE
```
So it would seem to be what @amotin said: it is trying to read the indirect mapping table and gets an invalid checksum (0x32 is 50 decimal, Windows' `EBADE`/`ECKSUM`; on Linux `EBADE` is 52, matching the `failed (52)` above). I wonder, though, whether it wouldn't be nicer to bubble up a read error instead of panicking in this situation. The indirect-mapping code seems to make no effort to work around errors, at least with the idea of being able to recover *some* data.
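To make that concrete, a hypothetical error-propagating variant of the open path could look like the sketch below (a sketch only, not the actual OpenZFS API: the real `vdev_indirect_mapping_open()` returns the mapping pointer directly, so callers such as `spa_remove_init()` would need to grow error handling too):

```c
/*
 * Hypothetical variant -- not the actual OpenZFS function signature.
 * Instead of VERIFY0() panicking on ECKSUM, hand the error back to
 * the caller through a return code and an out parameter.
 */
int
vdev_indirect_mapping_open_impl(objset_t *os, uint64_t mapping_object,
    vdev_indirect_mapping_t **vimp)
{
	dmu_object_info_t doi;
	int error;

	error = dmu_object_info(os, mapping_object, &doi);
	if (error != 0)
		return (error);		/* e.g. ECKSUM, instead of PANIC */

	/* ... hold the bonus buffer, read the entries, fill *vimp ... */

	return (0);
}
```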