
2.3.2 causing kernel panic and I/O hangs, 2.3.1 works on same dataset

sbellon opened this issue 7 months ago • 18 comments

System information

Type                 | Version/Name
Distribution Name    | Debian GNU/Linux
Distribution Version | unstable
Kernel Version       | 6.12.17 / 6.12.25
Architecture         | x86_64
OpenZFS Version      | 2.3.1 / 2.3.2

Describe the problem you're observing

Accessing certain parts of the file system with ZFS 2.3.2 reproducibly causes kernel panics and I/O hangs, while it apparently works flawlessly with ZFS 2.3.1 and earlier.

I also reported to Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1104724

Even if the diagnosis that my dataset is somehow corrupted is correct:

  1. I wonder why it apparently worked flawlessly with all earlier versions of ZFS for over 1.5 years.
  2. I think ZFS should not react so harshly when a dataset really is corrupt.

Describe how to reproduce the problem

Until May 1st I was using kernel 6.12.17 and ZFS 2.3.1, everything working fine.

On May 1st, I booted into kernel 6.12.25, still ZFS 2.3.1, everything working fine.

On May 2nd, the upgrade to ZFS 2.3.2 was installed. I did not reboot, so zfs-kmod was still on 2.3.1, and everything kept working fine.

On May 5th, I rebooted, and from then on the system misbehaved strangely: for example, I could no longer open a "fish" shell, as any access to ~/.config and ~/.cache would reproducibly result in those I/O hangs.

Luckily I can still boot 6.12.17/2.3.1 into a fully working environment; booting into 6.12.25/2.3.2 immediately breaks with the above symptoms.

Include any warning/errors/backtraces from the system logs

Kernel panic:

kernel: PANIC: zroot: blkptr at ffffb9932045c080 has no valid DVAs
kernel: Showing stack for process 1931
kernel: CPU: 14 UID: 0 PID: 1931 Comm: z_wr_iss Tainted: P           OE      6.12.25-amd64 #1  Debian 6.12.25-1
kernel: Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
kernel: Hardware name: ASUS System Product Name/ROG STRIX B760-I GAMING WIFI, BIOS 1205 06/14/2023
kernel: Call Trace:
kernel:  <TASK>
kernel:  dump_stack_lvl+0x5d/0x80
kernel:  vcmn_err.cold+0x54/0x7f [spl]
kernel:  zfs_panic_recover+0x79/0xa0 [zfs]
kernel:  zfs_blkptr_verify_log+0xba/0x190 [zfs]
kernel:  zfs_blkptr_verify+0x15a/0x5e0 [zfs]
kernel:  ? bp_get_dsize_sync+0x124/0x160 [zfs]
kernel:  dbuf_write_ready+0xf5/0x410 [zfs]
kernel:  arc_write_ready+0xe9/0x560 [zfs]
kernel:  ? mutex_lock+0x12/0x30
kernel:  zio_ready+0x4b/0x400 [zfs]
kernel:  zio_execute+0x8f/0x130 [zfs]
kernel:  taskq_thread+0x352/0x6f0 [spl]
kernel:  ? __pfx_default_wake_function+0x10/0x10
kernel:  ? __pfx_zio_execute+0x10/0x10 [zfs]
kernel:  ? __pfx_taskq_thread+0x10/0x10 [spl]
kernel:  kthread+0xcf/0x100
kernel:  ? __pfx_kthread+0x10/0x10
kernel:  ret_from_fork+0x31/0x50
kernel:  ? __pfx_kthread+0x10/0x10
kernel:  ret_from_fork_asm+0x1a/0x30
kernel:  </TASK>

I/O hang 1:

kernel: INFO: task txg_sync:811 blocked for more than 120 seconds.
kernel:       Tainted: P           OE      6.12.25-amd64 #1 Debian 6.12.25-1
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: task:txg_sync        state:D stack:0     pid:811   tgid:811   ppid:2      flags:0x00004000
kernel: Call Trace:
kernel:  <TASK>
kernel:  __schedule+0x505/0xbf0
kernel:  schedule+0x27/0xf0
kernel:  schedule_timeout+0x9e/0x160
kernel:  ? __pfx_process_timeout+0x10/0x10
kernel:  io_schedule_timeout+0x51/0x70
kernel:  __cv_timedwait_common+0x138/0x170 [spl]
kernel:  ? __pfx_autoremove_wake_function+0x10/0x10
kernel:  __cv_timedwait_io+0x19/0x20 [spl]
kernel:  zio_wait+0x14e/0x2f0 [zfs]
kernel:  dsl_pool_sync+0xf2/0x510 [zfs]
kernel:  spa_sync+0x577/0x1070 [zfs]
kernel:  ? spa_txg_history_init_io+0x115/0x120 [zfs]
kernel:  txg_sync_thread+0x20a/0x3b0 [zfs]
kernel:  ? __pfx_txg_sync_thread+0x10/0x10 [zfs]
kernel:  ? __pfx_thread_generic_wrapper+0x10/0x10 [spl]
kernel:  thread_generic_wrapper+0x5a/0x70 [spl]
kernel:  kthread+0xcf/0x100
kernel:  ? __pfx_kthread+0x10/0x10
kernel:  ret_from_fork+0x31/0x50
kernel:  ? __pfx_kthread+0x10/0x10
kernel:  ret_from_fork_asm+0x1a/0x30
kernel:  </TASK>

I/O hang 2:

kernel: INFO: task z_wr_iss:970 blocked for more than 120 seconds.
kernel:       Tainted: P           OE      6.12.25-amd64 #1 Debian 6.12.25-1
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: task:z_wr_iss        state:D stack:0     pid:970   tgid:970   ppid:2      flags:0x00004000
kernel: Call Trace:
kernel:  <TASK>
kernel:  __schedule+0x505/0xbf0
kernel:  schedule+0x27/0xf0
kernel:  vcmn_err.cold+0x69/0x7f [spl]
kernel:  zfs_panic_recover+0x79/0xa0 [zfs]
kernel:  zfs_blkptr_verify_log+0xba/0x190 [zfs]
kernel:  zfs_blkptr_verify+0x15a/0x5e0 [zfs]
kernel:  ? bp_get_dsize_sync+0x124/0x160 [zfs]
kernel:  dbuf_write_ready+0xf5/0x410 [zfs]
kernel:  arc_write_ready+0xe9/0x560 [zfs]
kernel:  ? mutex_lock+0x12/0x30
kernel:  zio_ready+0x4b/0x400 [zfs]
kernel:  zio_execute+0x8f/0x130 [zfs]
kernel:  taskq_thread+0x352/0x6f0 [spl]
kernel:  ? __pfx_default_wake_function+0x10/0x10
kernel:  ? __pfx_zio_execute+0x10/0x10 [zfs]
kernel:  ? __pfx_taskq_thread+0x10/0x10 [spl]
kernel:  kthread+0xcf/0x100
kernel:  ? __pfx_kthread+0x10/0x10
kernel:  ret_from_fork+0x31/0x50
kernel:  ? __pfx_kthread+0x10/0x10
kernel:  ret_from_fork_asm+0x1a/0x30
kernel:  </TASK>

I/O hang 3:

kernel: INFO: task exe:2620 blocked for more than 120 seconds.
kernel:       Tainted: P           OE      6.12.25-amd64 #1 Debian 6.12.25-1
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: task:exe             state:D stack:0     pid:2620  tgid:2609  ppid:2489   flags:0x00000002
kernel: Call Trace:
kernel:  <TASK>
kernel:  __schedule+0x505/0xbf0
kernel:  ? arc_buf_alloc_impl.isra.0+0x28e/0x300 [zfs]
kernel:  schedule+0x27/0xf0
kernel:  schedule_preempt_disabled+0x15/0x30
kernel:  __mutex_lock.constprop.0+0x3d0/0x6d0
kernel:  dbuf_dirty+0x4d/0x9b0 [zfs]
kernel:  dbuf_dirty+0x78b/0x9b0 [zfs]
kernel:  dnode_setdirty+0x96/0xf0 [zfs]
kernel:  dbuf_dirty+0x8b1/0x9b0 [zfs]
kernel:  sa_attr_op+0x27a/0x3d0 [zfs]
kernel:  sa_bulk_update_impl+0x62/0x100 [zfs]
kernel:  sa_bulk_update+0x50/0x90 [zfs]
kernel:  zfs_dirty_inode+0x2ab/0x3a0 [zfs]
kernel:  zpl_dirty_inode+0x2b/0x40 [zfs]
kernel:  __mark_inode_dirty+0x54/0x350
kernel:  generic_update_time+0x4e/0x60
kernel:  touch_atime+0xed/0x120
kernel:  zpl_iter_read+0x17b/0x190 [zfs]
kernel:  vfs_read+0x299/0x370
kernel:  __x64_sys_pread64+0x98/0xd0
kernel:  do_syscall_64+0x82/0x190
kernel:  ? eventfd_write+0xe2/0x210
kernel:  ? aa_file_perm+0x122/0x4d0
kernel:  ? cgroup_rstat_updated+0x69/0x220
kernel:  ? kmem_cache_alloc_noprof+0x106/0x2f0
kernel:  ? posix_lock_inode+0x516/0xa40
kernel:  ? fcntl_setlk+0x272/0x400
kernel:  ? __lruvec_stat_mod_folio+0x83/0xd0
kernel:  ? do_fcntl+0x5e9/0x740
kernel:  ? __x64_sys_fcntl+0x87/0xe0
kernel:  ? syscall_exit_to_user_mode+0x4d/0x210
kernel:  ? do_syscall_64+0x8e/0x190
kernel:  ? __count_memcg_events+0x53/0xf0
kernel:  ? count_memcg_events.constprop.0+0x1a/0x30
kernel:  ? handle_mm_fault+0x1bb/0x2c0
kernel:  ? do_user_addr_fault+0x36c/0x620
kernel:  ? exc_page_fault+0x7e/0x180
kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
kernel: RIP: 0033:0x414cae
kernel: RSP: 002b:000000c000051750 EFLAGS: 00000212 ORIG_RAX: 0000000000000011
kernel: RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000414cae
kernel: RDX: 0000000000000040 RSI: 000000c00031e680 RDI: 0000000000000003
kernel: RBP: 000000c000051790 R08: 0000000000000000 R09: 0000000000000000
kernel: R10: 0000000000000000 R11: 0000000000000212 R12: 000000c00031e680
kernel: R13: 0000000000000080 R14: 000000c000002380 R15: 0000000000000000
kernel:  </TASK>

I/O hang 4:

kernel: INFO: task fish:2909 blocked for more than 120 seconds.
kernel:       Tainted: P           OE      6.12.25-amd64 #1 Debian 6.12.25-1
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: task:fish            state:D stack:0     pid:2909  tgid:2909  ppid:1      flags:0x00000006
kernel: Call Trace:
kernel:  <TASK>
kernel:  __schedule+0x505/0xbf0
kernel:  schedule+0x27/0xf0
kernel:  schedule_preempt_disabled+0x15/0x30
kernel:  __mutex_lock.constprop.0+0x3d0/0x6d0
kernel:  dbuf_find+0xe1/0x250 [zfs]
kernel:  dbuf_hold_impl+0x6f/0x7e0 [zfs]
kernel:  ? dbuf_find+0x1b6/0x250 [zfs]
kernel:  dbuf_hold_impl+0x4d4/0x7e0 [zfs]
kernel:  dbuf_hold+0x31/0x60 [zfs]
kernel:  dnode_hold_impl+0x100/0x1310 [zfs]
kernel:  ? zfs_znode_hold_enter+0x118/0x170 [zfs]
kernel:  dmu_bonus_hold+0x3c/0x90 [zfs]
kernel:  zfs_zget+0x70/0x290 [zfs]
kernel:  zfs_dirent_lock+0x42b/0x6c0 [zfs]
kernel:  zfs_dirlook+0xb4/0x320 [zfs]
kernel:  ? zfs_zaccess+0x26f/0x450 [zfs]
kernel:  zfs_lookup+0x264/0x410 [zfs]
kernel:  zpl_lookup+0xd9/0x2d0 [zfs]
kernel:  lookup_one_qstr_excl+0x6f/0xa0
kernel:  filename_create+0xc6/0x1a0
kernel:  do_mkdirat+0x61/0x180
kernel:  __x64_sys_mkdir+0x46/0x70
kernel:  do_syscall_64+0x82/0x190
kernel:  ? syscall_exit_to_user_mode+0x4d/0x210
kernel:  ? do_syscall_64+0x8e/0x190
kernel:  ? exc_page_fault+0x7e/0x180
kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
kernel: RIP: 0033:0x7f36923ac687
kernel: RSP: 002b:00007ffff6008618 EFLAGS: 00000246 ORIG_RAX: 0000000000000053
kernel: RAX: ffffffffffffffda RBX: 00007ffff6008620 RCX: 00007f36923ac687
kernel: RDX: 0000000000000019 RSI: 00000000000001c0 RDI: 00007ffff6008620
kernel: RBP: 00007ffff60088e0 R08: fffefffefffcfcff R09: 632e2f6e6f6c6c65
kernel: R10: 8080808080808080 R11: 0000000000000246 R12: 000055a4a068ef70
kernel: R13: 00007ffff6008910 R14: 000055a4a67cd110 R15: 0000000000000019
kernel:  </TASK>

sbellon avatar May 07 '25 06:05 sbellon

I forgot to mention here that a zpool scrub (executed with 2.3.1) was completely clean:

  scan: scrub repaired 0B in 00:16:17 with 0 errors on Tue May  6 17:06:50 2025

Therefore, some follow-up questions:

  1. Is there any way to detect what parts of the file system are affected?
  2. Is there any way to repair it? I assume by triggering a copy without CoW? Or by restoring from a backup?
  3. In case even system packages are affected, would a forced apt reinstall help to fix it?

sbellon avatar May 07 '25 06:05 sbellon

The panic was introduced in #17078. @asomers ^^^ FYI.

amotin avatar May 07 '25 13:05 amotin

@sbellon your dataset probably really is corrupted. But it would be good to be sure. This is the first I've heard of this type of corruption happening at any other site besides mine. Can you please try these two things?

  1. Read the entire file with dd, just to check that it's readable. Assuming your recsize is 128k: dd if=/path/to/file of=/dev/null bs=128k. I'm guessing that the read will fail.
  2. Dump the file's metadata with zdb in order to examine the corrupt block pointer: zdb -vvbbbb -O <DATASET> <FILE> > $HOME/<FILE>.zdb.txt. Then search through the output for the record with no DVAs. That line should include the string "zero".
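For example, once the zdb dump exists, something like the following (a hedged sketch; the output file name simply matches the command above) can narrow the search down to the record with no DVAs:

grep -n -B2 -A2 'zero' $HOME/<FILE>.zdb.txt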

If my hunch is correct, then the problem is that your dataset really is corrupt, and ZFS panics when you try to write to another record that shares the same L1 block as the corrupt record. There are two solutions:

  1. Punch a hole where the corrupt record lies. On FreeBSD, you can do it like this: truncate -d -o <BYTE OFFSET> -l <BYTE LEN> /<MOUNTPOINT>/<FILE>. I'm not sure of the Linux equivalent (a possible sketch follows after this list). Of course, you could also just write zeros using dd. Ensure that the write is aligned to your recsize.
  2. Patch ZFS to allow writes of this type. When reading the L1 block from disk, we would have to validate all of the block pointers. If any fails, we would have to mark it in memory as "already corrupt". Then, when subsequently writing, we would allow the write of a corrupt block pointer if it was already corrupt when read from disk. I tried to implement such a patch, but was not successful. It's hard because blocks go straight from disk to ARC during read, without any kind of deserialization. So the validation would have to happen when reading from ARC. But I couldn't figure out how to determine whether the read from ARC was for a L1 block that had come straight from disk or had been cached by some write operation that hadn't flushed yet. In the latter case, we would want to panic during write.
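For what it's worth, a hedged sketch of what the hole punch in option 1 could look like on Linux, assuming util-linux's fallocate is available and that hole punching is accepted on the file; the offset and length below are placeholders for one 128k record and would have to come from the zdb output:

# Punch a hole over the corrupt record (offset/length are made-up examples):
fallocate --punch-hole --offset 1310720 --length 131072 /<MOUNTPOINT>/<FILE>

# Or overwrite the same record with zeros via dd instead
# (seek counts in bs-sized blocks, so seek=10 targets byte offset 1310720):
dd if=/dev/zero of=/<MOUNTPOINT>/<FILE> bs=128k seek=10 count=1 conv=notrunc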

asomers avatar May 07 '25 13:05 asomers

@asomers I have no idea what "file" that would be. Perhaps some more background on what happened and what I did afterwards:

After I noticed that my system behaved strangely with ZFS 2.3.2, the most noticeable effect was that I was unable to open my default shell, fish. As another user I was able to run fish, so my suspicion fell on the configuration below ~/.config/fish.

Then I noticed that even ls -al ~/.config would trigger the same blocking.

I rebooted into the old kernel with ZFS 2.3.1 and literally did the following:

mv ~/.config ~/.config_old
mkdir ~/.config
tar -C ~/.config_old -c --preserve-permissions . | tar -C ~/.config -x

Then I rebooted into the new kernel with ZFS 2.3.2 again, only to find that fish still would not open and the stack traces still appeared in syslog.

I did strace fish and noticed that it hangs while trying to read from ~/.cache/fish, and indeed, ls -al ~/.cache also hangs.

I then applied the same procedure as above, this time to ~/.cache ... and ... hurray ... I have a working system with the new kernel and ZFS 2.3.2 again.

What is surprising is that now, with the new kernel and ZFS 2.3.2, I can even do ls -al ~/.config_old and ls -al ~/.cache_old without the issue any more.

Perhaps I could dig into some snapshots to investigate further?

sbellon avatar May 07 '25 15:05 sbellon

The fact that ls -al can trigger the problem suggests that you have a corrupt directory, not a regular file. If plain ls works but ls -al does not, then that would suggest a corrupt inode. However, it should be impossible for inodes to become corrupted in a way that would produce the panic message you saw. The fact that everything works now is a sign that something caused the corrupt directory block to be overwritten. I suggest trying ls -al .config in all of your snapshots. And once you find the corrupt one, I suggest you try zdb as I suggested in the previous comment. It works on directories too.
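A minimal sketch of that sweep, assuming the snapshots automount under /home/.zfs/snapshot and the home directory in question is user/ (adjust both to your layout); the corrupt snapshot is the one where the listing hangs or panics:

# Walk every snapshot and try to list .config in it, printing the snapshot name first
# so the offending one is easy to spot:
for snap in /home/.zfs/snapshot/*/; do
    echo "checking $snap"
    ls -al "${snap}user/.config" > /dev/null || echo "FAILED: $snap"
done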

Also, yay for fish! I use it too :-) .

asomers avatar May 07 '25 16:05 asomers

I'm not sure what to make of all this ... in an attempt to find a snapshot that triggers the behaviour, I did the following:

cd /home/.zfs/snapshot
ls

which was just fine. Then I did

cd <TAB>

Which resulted in:

kernel: VERIFY(avl_find(tree, new_node, &where) == NULL) failed
kernel: PANIC at avl.c:625:avl_add()
kernel: Showing stack for process 128104
kernel: CPU: 14 UID: 1000 PID: 128104 Comm: fish Tainted: P           OE      6.12.25-amd64 #1  Debian 6.12.25-1
kernel: Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
kernel: Hardware name: ASUS System Product Name/ROG STRIX B760-I GAMING WIFI, BIOS 1205 06/14/2023
kernel: Call Trace:
kernel:  <TASK>
kernel:  dump_stack_lvl+0x5d/0x80
kernel:  spl_panic+0xf4/0x10b [spl]
kernel:  ? __kmalloc_noprof+0x2c0/0x400
kernel:  avl_add+0x98/0xa0 [zfs]
kernel:  zfsctl_snapshot_mount+0x85d/0x9b0 [zfs]
kernel:  zpl_snapdir_automount+0x10/0x20 [zfs]
kernel:  __traverse_mounts+0x8c/0x210
kernel:  step_into+0x342/0x780
kernel:  path_openat+0x15a/0x12d0
kernel:  do_filp_open+0xc4/0x170
kernel:  do_sys_openat2+0xae/0xe0
kernel:  __x64_sys_openat+0x55/0xa0
kernel:  do_syscall_64+0x82/0x190
kernel:  ? atime_needs_update+0x61/0x110
kernel:  ? __rseq_handle_notify_resume+0xa2/0x4a0
kernel:  ? touch_atime+0x1e/0x120
kernel:  ? iterate_dir+0x182/0x200
kernel:  ? syscall_exit_to_user_mode+0x172/0x210
kernel:  ? do_syscall_64+0x8e/0x190
kernel:  ? do_sys_openat2+0x9c/0xe0
kernel:  ? syscall_exit_to_user_mode+0x4d/0x210
kernel:  ? do_syscall_64+0x8e/0x190
kernel:  ? exc_page_fault+0x7e/0x180
kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
kernel: RIP: 0033:0x7fbb4a145b7c
kernel: Code: 4c 89 54 24 18 41 89 f2 41 83 e2 40 75 40 89 f0 f7 d0 a9 00 00 41 00 74 35 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d 00 f0 ff ff 77 44 48 8b 54 24 18 64 48 2b 14 25 28 00 00 00
kernel: RSP: 002b:00007fbb493f2c00 EFLAGS: 00000206 ORIG_RAX: 0000000000000101
kernel: RAX: ffffffffffffffda RBX: 00007fbb493f2d10 RCX: 00007fbb4a145b7c
kernel: RDX: 0000000000090800 RSI: 00007fbb40008710 RDI: 00000000ffffff9c
kernel: RBP: 0000000000000000 R08: 00007fbb40008740 R09: fff0fef0f0fef0f0
kernel: R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000037
kernel: R13: 00007fbb4a0e4240 R14: 00007fbb4000e5b0 R15: 00007fbb40008710
kernel:  </TASK>

and

kernel: INFO: task fish:128104 blocked for more than 120 seconds.
kernel:       Tainted: P           OE      6.12.25-amd64 #1 Debian 6.12.25-1
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: task:fish            state:D stack:0     pid:128104 tgid:127230 ppid:127226 flags:0x00004002
kernel: Call Trace:
kernel:  <TASK>
kernel:  __schedule+0x505/0xbf0
kernel:  schedule+0x27/0xf0
kernel:  spl_panic+0x109/0x10b [spl]
kernel:  ? __kmalloc_noprof+0x2c0/0x400
kernel:  avl_add+0x98/0xa0 [zfs]
kernel:  zfsctl_snapshot_mount+0x85d/0x9b0 [zfs]
kernel:  zpl_snapdir_automount+0x10/0x20 [zfs]
kernel:  __traverse_mounts+0x8c/0x210
kernel:  step_into+0x342/0x780
kernel:  path_openat+0x15a/0x12d0
kernel:  do_filp_open+0xc4/0x170
kernel:  do_sys_openat2+0xae/0xe0
kernel:  __x64_sys_openat+0x55/0xa0
kernel:  do_syscall_64+0x82/0x190
kernel:  ? atime_needs_update+0x61/0x110
kernel:  ? __rseq_handle_notify_resume+0xa2/0x4a0
kernel:  ? touch_atime+0x1e/0x120
kernel:  ? iterate_dir+0x182/0x200
kernel:  ? syscall_exit_to_user_mode+0x172/0x210
kernel:  ? do_syscall_64+0x8e/0x190
kernel:  ? do_sys_openat2+0x9c/0xe0
kernel:  ? syscall_exit_to_user_mode+0x4d/0x210
kernel:  ? do_syscall_64+0x8e/0x190
kernel:  ? exc_page_fault+0x7e/0x180
kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
kernel: RIP: 0033:0x7fbb4a145b7c
kernel: RSP: 002b:00007fbb493f2c00 EFLAGS: 00000206 ORIG_RAX: 0000000000000101
kernel: RAX: ffffffffffffffda RBX: 00007fbb493f2d10 RCX: 00007fbb4a145b7c
kernel: RDX: 0000000000090800 RSI: 00007fbb40008710 RDI: 00000000ffffff9c
kernel: RBP: 0000000000000000 R08: 00007fbb40008740 R09: fff0fef0f0fef0f0
kernel: R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000037
kernel: R13: 00007fbb4a0e4240 R14: 00007fbb4000e5b0 R15: 00007fbb40008710
kernel:  </TASK>

I rebooted again, and directly cd'ed into that folder and got:

kernel: VERIFY(avl_find(tree, new_node, &where) == NULL) failed
kernel: PANIC at avl.c:625:avl_add()
kernel: Showing stack for process 5186
kernel: CPU: 14 UID: 1000 PID: 5186 Comm: fish Tainted: P           OE      6.12.25-amd64 #1  Debian 6.12.25-1
kernel: Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
kernel: Hardware name: ASUS System Product Name/ROG STRIX B760-I GAMING WIFI, BIOS 1205 06/14/2023
kernel: Call Trace:
kernel:  <TASK>
kernel:  dump_stack_lvl+0x5d/0x80
kernel:  spl_panic+0xf4/0x10b [spl]
kernel:  ? __kmalloc_node_noprof+0x1bf/0x410
kernel:  ? __kmalloc_noprof+0x2c0/0x400
kernel:  avl_add+0x98/0xa0 [zfs]
kernel:  zfsctl_snapshot_mount+0x85d/0x9b0 [zfs]
kernel:  zpl_snapdir_automount+0x10/0x20 [zfs]
kernel:  __traverse_mounts+0x8c/0x210
kernel:  step_into+0x342/0x780
kernel:  path_lookupat+0x6a/0x1a0
kernel:  filename_lookup+0xde/0x1d0
kernel:  vfs_statx+0x8d/0x100
kernel:  do_statx+0x63/0xa0
kernel:  __x64_sys_statx+0x98/0xe0
kernel:  do_syscall_64+0x82/0x190
kernel:  ? do_syscall_64+0x8e/0x190
kernel:  ? do_statx+0x72/0xa0
kernel:  ? __rseq_handle_notify_resume+0xa2/0x4a0
kernel:  ? syscall_exit_to_user_mode+0x172/0x210
kernel:  ? do_syscall_64+0x8e/0x190
kernel:  ? __irq_exit_rcu+0x37/0xb0
kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
kernel: RIP: 0033:0x7f1af605c31a
kernel: Code: 48 8b 05 e1 2a 0e 00 be ff ff ff ff 64 c7 00 16 00 00 00 e9 53 fd ff ff e8 93 88 01 00 0f 1f 00 41 89 ca b8 4c 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2e 89 c1 85 c0 74 0f 48 8b 05 a9 2a 0e 00 64
kernel: RSP: 002b:00007f1af5ac4178 EFLAGS: 00000202 ORIG_RAX: 000000000000014c
kernel: RAX: ffffffffffffffda RBX: 00007f1af5ac4420 RCX: 00007f1af605c31a
kernel: RDX: 0000000000000000 RSI: 00007f1af5ac42a0 RDI: 00000000ffffff9c
kernel: RBP: 00007f1af5ac4290 R08: 00007f1af5ac4180 R09: 31303a30303a3030
kernel: R10: 0000000000000fff R11: 0000000000000202 R12: 000055fa2c204f40
kernel: R13: 0000000000000023 R14: 0000000000000001 R15: 00007f1af5ac42a0
kernel:  </TASK>

But nothing since then. I can run ls -al on all the snapshots I try without any problem, and I can now even go to /home/.zfs/snapshot and do cd <TAB> there.

Does this make any sense?

sbellon avatar May 07 '25 19:05 sbellon

@sbellon I'm not sure if the avl_find panic is related or not. But now that you can list snapshots, can you check the .config directory within them? Or, if you can't do it without triggering the avl_find panic again, you can try the zdb command. The zdb command I gave you does not require the snapshot to be mounted.

asomers avatar May 07 '25 20:05 asomers

I'm unsure how this zdb command is intended to work:

$ zfs list zroot/home
NAME                USED  AVAIL  REFER  MOUNTPOINT
zroot/home          859G   799G   676G  /home
$ sudo zdb -vvbbbb -O zroot/home ".zfs/snapshot/autosnap_2025-05-07_00:00:01_daily/user/.config" > config.zdb.txt
failed to hold dataset 'zroot/home': No such device or address

When mounted, that snapshot is at /home/.zfs/snapshot/autosnap_2025-05-07_00:00:01_daily.

sbellon avatar May 07 '25 21:05 sbellon

@sbellon the <FILE> should be a path relative to the dataset's mountpoint. Each snapshot is technically a different dataset. So in this case, you should do:

$ sudo zdb -vvbbbb -O zroot/home@autosnap_2025-05-07_00:00:01_daily user/.config > config.zdb.txt

Also, that command relies on your zpool.cache file being at the default location. If it isn't, you'll need to add a -U argument.
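For example, with an explicit cache file (the path below is a placeholder; substitute wherever your zpool.cache actually lives):

sudo zdb -U /path/to/zpool.cache -vvbbbb -O zroot/home@autosnap_2025-05-07_00:00:01_daily user/.config > config.zdb.txt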

asomers avatar May 08 '25 21:05 asomers

Sorry ...

$ zfs list -t snapshot | grep "zroot/home@autosnap_2025-05-07_00:00:01_daily"
zroot/home@autosnap_2025-05-07_00:00:01_daily                   500M      -   637G  -
$ sudo zdb -vvbbbb -O "zroot/home@autosnap_2025-05-07_00:00:01_daily" user/.config > config.zdb.txt
failed to hold dataset 'zroot/home@autosnap_2025-05-07_00:00:01_daily': No such device or address

sbellon avatar May 09 '25 05:05 sbellon

That error is ENXIO. Maybe zdb isn't working due to the corruption. Could you try running it on a file that you know to be perfectly accessible?

asomers avatar May 09 '25 12:05 asomers

I rather suspect that zdb -O expects the argument in a different format. I cannot get it to work at all. Whatever I feed it, I get: failed to hold dataset 'xyz': No such device or address

sbellon avatar May 09 '25 14:05 sbellon

You could try using the -U argument if your zpool.cache file is in a non-default location. I can't reproduce the ENXIO errno that way, but maybe the error code is different on Linux.

asomers avatar May 09 '25 14:05 asomers

I haven't done anything special to my pool. In fact I don't even know where my zpool.cache would be located, so I'm assuming everything is at default.

sbellon avatar May 09 '25 14:05 sbellon

I haven't done anything special to my pool. In fact I don't even know where my zpool.cache would be located, so I'm assuming everything is at default.

Are you booting from ZFS? If so, your zpool.cache file is likely not in the default location. Look somewhere in /boot.

asomers avatar May 09 '25 14:05 asomers

I have pretty much this setup: https://docs.zfsbootmenu.org/en/latest/guides/debian/bookworm-uefi.html (following the unencrypted NVMe path of that guide, but doubled up for two NVMe drives in a mirror).

sbellon avatar May 09 '25 15:05 sbellon

I found /etc/zfs/zpool.cache but even specifying that with -U /etc/zfs/zpool.cache does not change zdb behaviour.

sbellon avatar May 09 '25 16:05 sbellon

PR #17418 fixes the root cause of this issue. That is, it prevents the corruption in the first place. But it still doesn't help to repair a corrupt block.

asomers avatar Jun 03 '25 21:06 asomers

I see this issue is still open. Can I hope that there will eventually be a fix to prevent a crash in this case? It's one thing to have a corrupt file, but another to have the system crash.

clhedrick avatar Jun 30 '25 12:06 clhedrick

Same issue here; I detailed my setup in this comment: https://github.com/openzfs/zfs/issues/17659#issuecomment-3464058001

Any workaround? I can't read snapshots with backup tools anymore as-is. I also see the same weird behavior where I can ls in snapshots, but tab-completion of paths in there causes panics.

yourfate avatar Oct 29 '25 21:10 yourfate