ZFS kernel panic when mounting a snapshot using systemd
### System information
| Type | Version/Name |
|---|---|
| Distribution Name | Debian |
| Distribution Version | 13 |
| Kernel Version | 6.12.41+deb13 |
| Architecture | amd64 |
| OpenZFS Version | 2.3.2-2 |
### Describe the problem you're observing
I have a set of systemd units: the first creates a ZFS snapshot, and the others mount that snapshot (via `.zfs/snapshot/<snapshot-name>`) in a private mount namespace.

The setup worked fine until I upgraded from Debian 12 to 13.

The panic error looks similar to this one.
### Describe how to reproduce the problem
Run the following commands (as root):

```shell
cat > /etc/systemd/system/zfs-test-a.service <<EOF
[Unit]
StopWhenUnneeded=yes
[Service]
RemainAfterExit=yes
Type=oneshot
ExecStart=zfs snapshot -r test-pool/a@test-snapshot
ExecStop=zfs destroy -r test-pool/a@test-snapshot
EOF
cat > /etc/systemd/system/[email protected] <<EOF
[Unit]
Requires=zfs-test-a.service
After=zfs-test-a.service
[Service]
Type=oneshot
ExecStart=echo start %i
ExecStart=ls -lh /test-pool
ExecStop=echo stop %i
TemporaryFileSystem=/test-pool
BindReadOnlyPaths=/test-pool/a/.zfs/snapshot/test-snapshot:/test-pool/a
BindReadOnlyPaths=/test-pool/a/b/.zfs/snapshot/test-snapshot:/test-pool/a/b
EOF
cat > /etc/systemd/system/zfs-test-c.service <<EOF
[Unit]
[email protected] [email protected] [email protected]
[email protected] [email protected] [email protected]
[Service]
Type=oneshot
ExecStart=echo start
ExecStart=echo stop
EOF
zpool create -m /test-pool test-pool
zfs create test-pool/a
zfs create test-pool/a/b
systemctl daemon-reload
systemctl start zfs-test-c.service
```
### Include any warnings/errors/backtraces from the system logs
**Immediately shows up:**

```
VERIFY(avl_find(tree, new_node, &where) == NULL) failed
PANIC at avl.c:625:avl_add()
Showing stack for process 6866
CPU: 7 UID: 0 PID: 6866 Comm: (echo) Tainted: P OE 6.12.41+deb13-amd64 #1 Debian 6.12.41-1
Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./E3C226D2I, BIOS P3.50 04/13/2018
Call Trace:
<TASK>
dump_stack_lvl+0x5d/0x80
spl_panic+0xf4/0x10b [spl]
? __kmalloc_noprof+0x17e/0x400
avl_add+0x98/0xa0 [zfs]
zfsctl_snapshot_mount+0x85d/0x9b0 [zfs]
zpl_snapdir_automount+0x10/0x20 [zfs]
__traverse_mounts+0x8f/0x210
step_into+0x342/0x780
link_path_walk.part.0.constprop.0+0x21e/0x390
path_lookupat+0x3e/0x1a0
path_openat+0x93f/0x12d0
do_filp_open+0xc4/0x170
do_sys_openat2+0xae/0xe0
__x64_sys_openat+0x55/0xa0
do_syscall_64+0x82/0x190
? __x64_sys_close+0x3c/0x80
? kmem_cache_free+0x3ee/0x440
? syscall_exit_to_user_mode+0x4d/0x210
? do_syscall_64+0x8e/0x190
? kmem_cache_free+0x3ee/0x440
? syscall_exit_to_user_mode+0x4d/0x210
? do_syscall_64+0x8e/0x190
? do_syscall_64+0x8e/0x190
? __irq_exit_rcu+0x37/0xb0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f0829299687
Code: 48 89 fa 4c 89 df e8 58 b3 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00 83 e2 39 83 fa 08 75 de e8 23 ff ff ff
RSP: 002b:00007ffcbe24beb0 EFLAGS: 00000202 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 00007f0829c59e00 RCX: 00007f0829299687
RDX: 00000000002a0000 RSI: 0000560427851470 RDI: ffffffffffffff9c
RBP: 000056042785c6c0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
R13: 0000000000005000 R14: 0000000000000000 R15: 0000000000000007
</TASK>
```
**Shows up after 2–3 minutes:**

```
INFO: task (echo):6864 blocked for more than 120 seconds.
Tainted: P OE 6.12.41+deb13-amd64 #1 Debian 6.12.41-1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:(echo) state:D stack:0 pid:6864 tgid:6864 ppid:1 flags:0x00004002
Call Trace:
<TASK>
__schedule+0x505/0xc00
schedule+0x27/0xf0
schedule_preempt_disabled+0x15/0x30
rwsem_down_write_slowpath+0x1d9/0x610
? call_usermodehelper_exec+0xed/0x1a0
down_write+0x5a/0x60
zfsctl_snapshot_mount+0x789/0x9b0 [zfs]
zpl_snapdir_automount+0x10/0x20 [zfs]
__traverse_mounts+0x8f/0x210
step_into+0x342/0x780
? __pfx_zpl_snapdir_revalidate+0x10/0x10 [zfs]
path_lookupat+0x6a/0x1a0
filename_lookup+0xde/0x1d0
kern_path+0x2f/0x50
path_mount+0x5dc/0xb00
__x64_sys_mount+0x119/0x150
do_syscall_64+0x82/0x190
? do_filp_open+0xc4/0x170
? do_sys_openat2+0x9c/0xe0
? syscall_exit_to_user_mode+0x4d/0x210
? do_syscall_64+0x8e/0x190
? _copy_to_user+0x36/0x50
? cp_new_stat+0x131/0x170
? __memcg_slab_free_hook+0xf7/0x140
? __x64_sys_close+0x3c/0x80
? __memcg_slab_free_hook+0xf7/0x140
? __x64_sys_close+0x3c/0x80
? kmem_cache_free+0x3ee/0x440
? syscall_exit_to_user_mode+0x4d/0x210
? do_syscall_64+0x8e/0x190
? do_syscall_64+0x8e/0x190
? do_syscall_64+0x8e/0x190
? __irq_exit_rcu+0x37/0xb0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f13d811b68a
RSP: 002b:00007ffefb075a88 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f13d811b68a
RDX: 0000000000000000 RSI: 00007ffefb075a90 RDI: 000056364d0986c0
RBP: 0000000000000000 R08: 0000000000000000 R09: 00000000ffffffff
R10: 0000000000005000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000005000 R14: 0000000000000000 R15: 0000000000000007
</TASK>
INFO: task (echo):6865 blocked for more than 120 seconds.
Tainted: P OE 6.12.41+deb13-amd64 #1 Debian 6.12.41-1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:(echo) state:D stack:0 pid:6865 tgid:6865 ppid:1 flags:0x00000002
Call Trace:
<TASK>
__schedule+0x505/0xc00
schedule+0x27/0xf0
schedule_preempt_disabled+0x15/0x30
rwsem_down_read_slowpath+0x25c/0x480
down_read+0x48/0xa0
zfsctl_snapshot_mount+0x3f0/0x9b0 [zfs]
? vsnprintf+0x4e5/0x6d0
? security_capable+0x8f/0x180
zpl_snapdir_automount+0x10/0x20 [zfs]
__traverse_mounts+0x8f/0x210
step_into+0x342/0x780
? __pfx_zpl_snapdir_revalidate+0x10/0x10 [zfs]
path_lookupat+0x6a/0x1a0
filename_lookup+0xde/0x1d0
kern_path+0x2f/0x50
path_mount+0x5dc/0xb00
__x64_sys_mount+0x119/0x150
do_syscall_64+0x82/0x190
? do_sys_openat2+0x9c/0xe0
? syscall_exit_to_user_mode+0x4d/0x210
? do_syscall_64+0x8e/0x190
? __memcg_slab_free_hook+0xf7/0x140
? __x64_sys_close+0x3c/0x80
? kmem_cache_free+0x3ee/0x440
? syscall_exit_to_user_mode+0x4d/0x210
? do_syscall_64+0x8e/0x190
? do_syscall_64+0x8e/0x190
? __irq_exit_rcu+0x37/0xb0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fe42771b68a
RSP: 002b:00007ffd11f0f1b8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fe42771b68a
RDX: 0000000000000000 RSI: 00007ffd11f0f1c0 RDI: 00005625194222c0
RBP: 0000000000000000 R08: 0000000000000000 R09: 00000000ffffffff
R10: 0000000000005000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000005000 R14: 0000000000000000 R15: 0000000000000007
</TASK>
INFO: task (echo):6866 blocked for more than 120 seconds.
Tainted: P OE 6.12.41+deb13-amd64 #1 Debian 6.12.41-1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:(echo) state:D stack:0 pid:6866 tgid:6866 ppid:1 flags:0x00004002
Call Trace:
<TASK>
__schedule+0x505/0xc00
schedule+0x27/0xf0
spl_panic+0x109/0x10b [spl]
? __kmalloc_noprof+0x17e/0x400
avl_add+0x98/0xa0 [zfs]
zfsctl_snapshot_mount+0x85d/0x9b0 [zfs]
zpl_snapdir_automount+0x10/0x20 [zfs]
__traverse_mounts+0x8f/0x210
step_into+0x342/0x780
link_path_walk.part.0.constprop.0+0x21e/0x390
path_lookupat+0x3e/0x1a0
path_openat+0x93f/0x12d0
do_filp_open+0xc4/0x170
do_sys_openat2+0xae/0xe0
__x64_sys_openat+0x55/0xa0
do_syscall_64+0x82/0x190
? __x64_sys_close+0x3c/0x80
? kmem_cache_free+0x3ee/0x440
? syscall_exit_to_user_mode+0x4d/0x210
? do_syscall_64+0x8e/0x190
? kmem_cache_free+0x3ee/0x440
? syscall_exit_to_user_mode+0x4d/0x210
? do_syscall_64+0x8e/0x190
? do_syscall_64+0x8e/0x190
? __irq_exit_rcu+0x37/0xb0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f0829299687
RSP: 002b:00007ffcbe24beb0 EFLAGS: 00000202 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 00007f0829c59e00 RCX: 00007f0829299687
RDX: 00000000002a0000 RSI: 0000560427851470 RDI: ffffffffffffff9c
RBP: 000056042785c6c0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
R13: 0000000000005000 R14: 0000000000000000 R15: 0000000000000007
</TASK>
```
A workaround is to use `mount -t zfs test-pool/a@test-snapshot /mnt` instead of `.zfs/snapshot`.
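For scripted use, the explicit-mount workaround can be wrapped in a small helper. This is only a sketch; the `mount_snapshot` function and all dataset/mountpoint names below are placeholders, not part of this report:

```shell
#!/bin/sh
# Sketch of the workaround: mount the snapshot explicitly with mount -t zfs
# instead of traversing .zfs/snapshot, which avoids the zfsctl automount
# path that trips the VERIFY in avl_add(). All names are placeholders.
mount_snapshot() {
    dataset="$1" snap="$2" mnt="$3"
    mkdir -p "$mnt"
    # An explicit mount(8) of dataset@snapshot; no automount involved:
    mount -t zfs "${dataset}@${snap}" "$mnt"
}

# Example usage (placeholders):
#   mount_snapshot test-pool/a test-snapshot /mnt
#   ... read from /mnt ...
#   umount /mnt
```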
Potentially a duplicate of this issue; in particular, I see the same symptoms as this comment.
The issue here is that the `.zfs/snapshot` automount code doesn't handle the case where one snapshot needs to be automounted into two different mount namespaces at the same time. It's not namespace aware, so it ends up triggering the assert. Making it fail gracefully is straightforward, but it would probably be best to make it aware of the namespaces and handle this case. That looks like it'll be a bit tricky.
> The issue here is that the `.zfs/snapshot` automount code doesn't handle the case where one snapshot needs to be automounted into two different mount namespaces at the same time. It's not namespace aware, so it ends up triggering the assert. Making it fail gracefully is straightforward, but it would probably be best to make it aware of the namespaces and handle this case. That looks like it'll be a bit tricky.
Thanks for the explanation! I didn't have any issues when using ZFS 2.2.x on Debian 12; I wonder if that's expected.
I get the same panic signature on Proxmox VE 9 (Debian 13 base).
Environment:
- Proxmox VE 9.0.10
- Kernel: 6.14.11-4-pve (also 6.14.11-3-pve)
- OpenZFS: 2.3.4-pve1
- CPU: AMD Ryzen
- snapdir=visible on affected datasets
- ~57 snapshots (zrepl sink)
Trigger:
- Any stat/list of `.zfs/snapshot` (e.g., `ls`, `find`, `httm`, `statx`) causes the panic.
Kernel log:

```
VERIFY(avl_find(tree, new_node, &where) == NULL) failed
PANIC at avl.c:625: avl_add()
```

Call trace includes:

```
spl_panic [spl]
avl_add [zfs]
zfsctl_snapshot_mount [zfs]
zpl_snapdir_automount [zfs]
__traverse_mounts → step_into → link_path_walk → statx/openat
```
Notes:
- ~~Setting `snapdir=hidden` prevents the panic.~~
- Normal operations are OK; only `.zfs/snapshot` access triggers it.
- Looks similar to this issue’s backtrace and to #17337.
> A workaround is to use `mount -t zfs test-pool/a@test-snapshot /mnt` instead of `.zfs/snapshot`.
Not for me. I have a similar problem: when I do a lot of I/O on the `.zfs/snapshot` directories, I get a kernel panic after a while. My backup solution backs up the latest autosnap using `restic backup`, and that is now broken.
I have tried the method of mounting the snapshot somewhere else, but I ran into the same issue.
My mirror was completely corrupted yesterday because of hung ZFS processes related to this. I can't even test further workarounds, as the restore takes 10+ hours and I can't afford the system going offline again.
> Setting `snapdir=hidden` prevents the panic.
Not true for me. That is set; however, I still get panics after a while if I do I/O on the `.zfs/snapshot/<>` directories.

Also, mounting the snapshot somewhere else didn't work.

Is there any workaround? This prevents my backup strategy from working (restic reading snapshots), and downgrading is a mess in my setup (ZFS in the kernel, root on ZFS).
I'm on Proxmox 9.0.11, kernel `6.14.11-4-pve`:

```
zfs --version
zfs-2.3.4-pve1
zfs-kmod-2.3.4-pve1
```
I'm going to try to get the steps to reproduce nailed down to help the developers. @yourfate, regarding your linked ticket: does a downgrade work as a workaround?
> does a downgrade work as a workaround?
It would probably work, but it's a big hassle: the distro ships ZFS in the kernel. I'd have to build a kernel with an older ZFS version, or use the DKMS version, and then downgrade and pin the userspace tools. If I use the DKMS version I'd also have to fix booting from the ZFS root. I'm not a big expert on this, and the risk of the whole thing not booting once I start messing with it is very real.
I see similar kernel logs. It is somehow related to restic doing backups from automounted snapshots in `.zfs`, but I don't understand what is happening here. At the least I'm very concerned, and I wonder if it would be better to drop ZFS entirely. Sometimes the kernel completely freezes; even the Magic SysRq key doesn't work.
```
Okt 15 00:32:52 pluto systemd[1]: home-klaus-.zfs-snapshot-backup.mount: Deactivated successfully.
Okt 15 00:32:52 pluto kernel: EXT4-fs (sda1): mounted filesystem 8102057f-d02e-4025-b3c0-c6a448d432c4 r/w with ordered data mode. Quota mode: none.
Okt 15 00:32:53 pluto kernel: VERIFY(avl_find(tree, new_node, &where) == NULL) failed
Okt 15 00:32:53 pluto kernel: PANIC at avl.c:625:avl_add()
Okt 15 00:32:53 pluto kernel: Showing stack for process 412685
Okt 15 00:32:53 pluto kernel: CPU: 3 UID: 1000 PID: 412685 Comm: restic_0.18.1_l Tainted: P OE 6.12.48+deb13-amd64 #1 Debian 6.12.48-1
Okt 15 00:32:53 pluto kernel: Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Okt 15 00:32:53 pluto kernel: Hardware name: HP HP Z2 Tower G5 Workstation/8750, BIOS S50 Ver. 01.04.18 10/04/2024
Okt 15 00:32:53 pluto kernel: Call Trace:
Okt 15 00:32:53 pluto kernel: <TASK>
Okt 15 00:32:53 pluto kernel: dump_stack_lvl+0x5d/0x80
Okt 15 00:32:53 pluto kernel: spl_panic+0xf4/0x10b [spl]
Okt 15 00:32:53 pluto kernel: ? __kmalloc_noprof+0x17e/0x400
Okt 15 00:32:53 pluto kernel: avl_add+0x98/0xa0 [zfs]
Okt 15 00:32:53 pluto kernel: zfsctl_snapshot_mount+0x85d/0x9b0 [zfs]
Okt 15 00:32:53 pluto kernel: zpl_snapdir_automount+0x10/0x20 [zfs]
Okt 15 00:32:53 pluto kernel: __traverse_mounts+0x8c/0x210
Okt 15 00:32:53 pluto kernel: step_into+0x342/0x780
Okt 15 00:32:53 pluto kernel: path_openat+0x15a/0x12d0
Okt 15 00:32:53 pluto kernel: ? __memcg_slab_free_hook+0xf7/0x140
Okt 15 00:32:53 pluto kernel: do_filp_open+0xc4/0x170
Okt 15 00:32:53 pluto kernel: do_sys_openat2+0xae/0xe0
Okt 15 00:32:53 pluto kernel: ? do_syscall_64+0x8e/0x190
Okt 15 00:32:53 pluto kernel: __x64_sys_openat+0x55/0xa0
Okt 15 00:32:53 pluto kernel: do_syscall_64+0x82/0x190
Okt 15 00:32:53 pluto kernel: ? do_syscall_64+0x8e/0x190
Okt 15 00:32:53 pluto kernel: ? clear_bhb_loop+0x40/0x90
Okt 15 00:32:53 pluto kernel: ? clear_bhb_loop+0x40/0x90
Okt 15 00:32:53 pluto kernel: ? clear_bhb_loop+0x40/0x90
Okt 15 00:32:53 pluto kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Okt 15 00:32:53 pluto kernel: RIP: 0033:0x40758e
Okt 15 00:32:53 pluto kernel: Code: ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 49 89 f2 48 89 fa 48 89 ce 48 89>
Okt 15 00:32:53 pluto kernel: RSP: 002b:000000c00050eb40 EFLAGS: 00000216 ORIG_RAX: 0000000000000101
Okt 15 00:32:53 pluto kernel: RAX: ffffffffffffffda RBX: ffffffffffffff9c RCX: 000000000040758e
Okt 15 00:32:53 pluto kernel: RDX: 00000000000a0000 RSI: 000000c00028c390 RDI: ffffffffffffff9c
Okt 15 00:32:53 pluto kernel: RBP: 000000c00050eb80 R08: 0000000000000000 R09: 0000000000000000
Okt 15 00:32:53 pluto kernel: R10: 0000000000000000 R11: 0000000000000216 R12: 000000c00028c390
Okt 15 00:32:53 pluto kernel: R13: 00000000000000aa R14: 000000c00030da40 R15: ffffffffffffffff
Okt 15 00:32:53 pluto kernel: </TASK>
Okt 15 00:37:31 pluto kernel: INFO: task restic_0.18.1_l:412685 blocked for more than 241 seconds.
Okt 15 00:37:31 pluto kernel: Tainted: P OE 6.12.48+deb13-amd64 #1 Debian 6.12.48-1
Okt 15 00:37:31 pluto kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Okt 15 00:37:31 pluto kernel: task:restic_0.18.1_l state:D stack:0 pid:412685 tgid:412669 ppid:412658 flags:0x00004004
Okt 15 00:37:31 pluto kernel: Call Trace:
Okt 15 00:37:31 pluto kernel: <TASK>
Okt 15 00:37:31 pluto kernel: __schedule+0x505/0xc00
Okt 15 00:37:31 pluto kernel: schedule+0x27/0xf0
Okt 15 00:37:31 pluto kernel: spl_panic+0x109/0x10b [spl]
Okt 15 00:37:31 pluto kernel: ? __kmalloc_noprof+0x17e/0x400
Okt 15 00:37:31 pluto kernel: avl_add+0x98/0xa0 [zfs]
Okt 15 00:37:31 pluto kernel: zfsctl_snapshot_mount+0x85d/0x9b0 [zfs]
Okt 15 00:37:31 pluto kernel: zpl_snapdir_automount+0x10/0x20 [zfs]
Okt 15 00:37:31 pluto kernel: __traverse_mounts+0x8c/0x210
Okt 15 00:37:31 pluto kernel: step_into+0x342/0x780
Okt 15 00:37:31 pluto kernel: path_openat+0x15a/0x12d0
Okt 15 00:37:31 pluto kernel: ? __memcg_slab_free_hook+0xf7/0x140
Okt 15 00:37:31 pluto kernel: do_filp_open+0xc4/0x170
Okt 15 00:37:31 pluto kernel: do_sys_openat2+0xae/0xe0
Okt 15 00:37:31 pluto kernel: ? do_syscall_64+0x8e/0x190
Okt 15 00:37:31 pluto kernel: __x64_sys_openat+0x55/0xa0
Okt 15 00:37:31 pluto kernel: do_syscall_64+0x82/0x190
Okt 15 00:37:31 pluto kernel: ? do_syscall_64+0x8e/0x190
Okt 15 00:37:31 pluto kernel: ? clear_bhb_loop+0x40/0x90
Okt 15 00:37:31 pluto kernel: ? clear_bhb_loop+0x40/0x90
Okt 15 00:37:31 pluto kernel: ? clear_bhb_loop+0x40/0x90
Okt 15 00:37:31 pluto kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Okt 15 00:37:31 pluto kernel: RIP: 0033:0x40758e
Okt 15 00:37:31 pluto kernel: RSP: 002b:000000c00050eb40 EFLAGS: 00000216 ORIG_RAX: 0000000000000101
Okt 15 00:37:31 pluto kernel: RAX: ffffffffffffffda RBX: ffffffffffffff9c RCX: 000000000040758e
Okt 15 00:37:31 pluto kernel: RDX: 00000000000a0000 RSI: 000000c00028c390 RDI: ffffffffffffff9c
Okt 15 00:37:31 pluto kernel: RBP: 000000c00050eb80 R08: 0000000000000000 R09: 0000000000000000
Okt 15 00:37:31 pluto kernel: R10: 0000000000000000 R11: 0000000000000216 R12: 000000c00028c390
Okt 15 00:37:31 pluto kernel: R13: 00000000000000aa R14: 000000c00030da40 R15: ffffffffffffffff
Okt 15 00:37:31 pluto kernel: </TASK>
```
```
zfs --version
zfs-2.3.2-2
zfs-kmod-2.3.2-2
```

I'm using the latest Debian, version 13.1.
Just sharing my "me too" here. I upgraded to Proxmox 9, and my restic backup script (reading `.../.zfs/snapshot/$name`) triggers the same panic; the process hangs, and any future ZFS operations also seem to hang. What's interesting to me is that it's reliably triggered by one particular dataset, which happens to be the second ZFS dataset it backs up.
- The first dataset happens to be mounted at `/` (backing up `/.zfs/snapshot/$name`) and contains the usual `/etc`, `/root`, `/bin`, etc. No issues, and I can run `restic` against it several times (at least) before anything breaks.
- The second dataset, the one that reliably causes a panic fairly quickly, doesn't seem particularly problematic: 55 files and directories totaling 56 MB, exclusively regular files and directories (no links, sockets, etc.), just the basic configuration data for an application that runs in a Docker container. Just for good measure, I `zfs destroy`ed that dataset, recreated it with `zfs create ...`, and restored the original data, and the panic still happens.
Ultimately, a hard reset is the only way to recover the system; `reboot` just sits there hanging, trying to unmount or kill processes.
System is fully up to date, running Proxmox's kernel: `pve-manager/9.1.1/42db4a6cf33dac83` (running kernel: `6.17.2-2-pve`), `zfs-2.3.4-pve1`, `zfs-kmod-2.3.4-pve1`.
```
Dec 03 15:43:34 mordor kernel: VERIFY(avl_find(tree, new_node, &where) == NULL) failed
Dec 03 15:43:34 mordor kernel: PANIC at avl.c:625:avl_add()
Dec 03 15:43:34 mordor kernel: Showing stack for process 113386
Dec 03 15:43:34 mordor kernel: CPU: 3 UID: 0 PID: 113386 Comm: restic Tainted: P O 6.17.2-2-pve #1 PREEMPT(voluntary)
Dec 03 15:43:34 mordor kernel: Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE
Dec 03 15:43:34 mordor kernel: Hardware name: /NUC5i5RYB, BIOS RYBDWi35.86A.0385.2020.0519.1558 05/19/2020
Dec 03 15:43:34 mordor kernel: Call Trace:
Dec 03 15:43:34 mordor kernel: <TASK>
Dec 03 15:43:34 mordor kernel: dump_stack_lvl+0x5f/0x90
Dec 03 15:43:34 mordor kernel: dump_stack+0x10/0x18
Dec 03 15:43:34 mordor kernel: spl_dumpstack+0x28/0x40 [spl]
Dec 03 15:43:34 mordor kernel: spl_panic+0xef/0x114 [spl]
Dec 03 15:43:34 mordor kernel: avl_add+0xad/0xc0 [zfs]
Dec 03 15:43:34 mordor kernel: zfsctl_snapshot_mount+0x979/0xa10 [zfs]
Dec 03 15:43:34 mordor kernel: zpl_snapdir_automount+0x10/0x20 [zfs]
Dec 03 15:43:34 mordor kernel: __traverse_mounts+0x92/0x220
Dec 03 15:43:34 mordor kernel: step_into+0x2e2/0x780
Dec 03 15:43:34 mordor kernel: path_openat+0x141/0x1240
Dec 03 15:43:34 mordor kernel: ? __x64_sys_close+0x3e/0x90
Dec 03 15:43:34 mordor kernel: ? generic_file_llseek+0x21/0x40
Dec 03 15:43:34 mordor kernel: do_filp_open+0xd3/0x190
Dec 03 15:43:34 mordor kernel: do_sys_openat2+0x8b/0xf0
Dec 03 15:43:34 mordor kernel: __x64_sys_openat+0x52/0xa0
Dec 03 15:43:34 mordor kernel: x64_sys_call+0x1bf2/0x2330
Dec 03 15:43:34 mordor kernel: do_syscall_64+0x80/0xa30
Dec 03 15:43:34 mordor kernel: ? irqentry_exit+0x43/0x50
Dec 03 15:43:34 mordor kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Dec 03 15:43:34 mordor kernel: RIP: 0033:0x40f16e
Dec 03 15:43:34 mordor kernel: Code: 24 28 44 8b 44 24 2c e9 70 ff ff ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 49 89 f2 48 89 fa 48 89 ce 48 89 df 0f 05 <48> 3d 01 f0 ff ff 76 15 48 f7 d8 48 89 c1 48 c7 c0 ff ff ff ff 48
Dec 03 15:43:34 mordor kernel: RSP: 002b:000000c0004487c0 EFLAGS: 00000212 ORIG_RAX: 0000000000000101
Dec 03 15:43:34 mordor kernel: RAX: ffffffffffffffda RBX: ffffffffffffff9c RCX: 000000000040f16e
Dec 03 15:43:34 mordor kernel: RDX: 00000000000a0000 RSI: 000000c00003fd00 RDI: ffffffffffffff9c
Dec 03 15:43:34 mordor kernel: RBP: 000000c000448800 R08: 0000000000000000 R09: 0000000000000000
Dec 03 15:43:34 mordor kernel: R10: 0000000000000000 R11: 0000000000000212 R12: 000000c00003fd00
Dec 03 15:43:34 mordor kernel: R13: 0000000000000080 R14: 000000c000018e00 R15: 000000c0036e6e70
Dec 03 15:43:34 mordor kernel: </TASK>
Dec 03 15:46:29 mordor kernel: INFO: task restic:113386 blocked for more than 122 seconds.
Dec 03 15:46:29 mordor kernel: Tainted: P O 6.17.2-2-pve #1
Dec 03 15:46:29 mordor kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 03 15:46:29 mordor kernel: task:restic state:D stack:0 pid:113386 tgid:113378 ppid:104724 task_flags:0x400040 flags:0x00004002
Dec 03 15:46:29 mordor kernel: Call Trace:
Dec 03 15:46:29 mordor kernel: <TASK>
Dec 03 15:46:29 mordor kernel: __schedule+0x468/0x1310
Dec 03 15:46:29 mordor kernel: schedule+0x27/0xf0
Dec 03 15:46:29 mordor kernel: spl_panic+0x112/0x114 [spl]
Dec 03 15:46:29 mordor kernel: avl_add+0xad/0xc0 [zfs]
Dec 03 15:46:29 mordor kernel: zfsctl_snapshot_mount+0x979/0xa10 [zfs]
Dec 03 15:46:29 mordor kernel: zpl_snapdir_automount+0x10/0x20 [zfs]
Dec 03 15:46:29 mordor kernel: __traverse_mounts+0x92/0x220
Dec 03 15:46:29 mordor kernel: step_into+0x2e2/0x780
Dec 03 15:46:29 mordor kernel: path_openat+0x141/0x1240
Dec 03 15:46:29 mordor kernel: ? __x64_sys_close+0x3e/0x90
Dec 03 15:46:29 mordor kernel: ? generic_file_llseek+0x21/0x40
Dec 03 15:46:29 mordor kernel: do_filp_open+0xd3/0x190
Dec 03 15:46:29 mordor kernel: do_sys_openat2+0x8b/0xf0
Dec 03 15:46:29 mordor kernel: __x64_sys_openat+0x52/0xa0
Dec 03 15:46:29 mordor kernel: x64_sys_call+0x1bf2/0x2330
Dec 03 15:46:29 mordor kernel: do_syscall_64+0x80/0xa30
Dec 03 15:46:29 mordor kernel: ? irqentry_exit+0x43/0x50
Dec 03 15:46:29 mordor kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Dec 03 15:46:29 mordor kernel: RIP: 0033:0x40f16e
Dec 03 15:46:29 mordor kernel: RSP: 002b:000000c0004487c0 EFLAGS: 00000212 ORIG_RAX: 0000000000000101
Dec 03 15:46:29 mordor kernel: RAX: ffffffffffffffda RBX: ffffffffffffff9c RCX: 000000000040f16e
Dec 03 15:46:29 mordor kernel: RDX: 00000000000a0000 RSI: 000000c00003fd00 RDI: ffffffffffffff9c
Dec 03 15:46:29 mordor kernel: RBP: 000000c000448800 R08: 0000000000000000 R09: 0000000000000000
Dec 03 15:46:29 mordor kernel: R10: 0000000000000000 R11: 0000000000000212 R12: 000000c00003fd00
Dec 03 15:46:29 mordor kernel: R13: 0000000000000080 R14: 000000c000018e00 R15: 000000c0036e6e70
Dec 03 15:46:29 mordor kernel: </TASK>
```
OpenZFS 2.3.5. Caused by https://github.com/kimono-koans/httm/releases/tag/0.49.9 (run as an unprivileged user):

```
$ httm --preview --browse .   # "." is the current dir, present in snapshots
```

dmesg panic:
```
[1220118.232234] VERIFY(avl_find(tree, new_node, &where) == NULL) failed
[1220118.232242] PANIC at avl.c:625:avl_add()
[1220118.232246] Showing stack for process 12468
[1220118.232247] CPU: 2 PID: 12468 Comm: httm Tainted: P OE T 6.1.158-gentoo-x86_64 #1
[1220118.232251] Call Trace:
[1220118.232253] <TASK>
[1220118.232254] dump_stack_lvl+0x38/0x4a
[1220118.232260] spl_panic+0xf0/0x229 [spl]
[1220118.232268] avl_add+0x8f/0xa0 [zfs]
[1220118.232299] zfsctl_snapshot_mount+0x876/0x9d0 [zfs]
[1220118.232336] zio_do_crypt_abd+0x487/0x33b0 [zfs]
[1220118.232367] __traverse_mounts+0xba/0x2b0
[1220118.232371] step_into+0x341/0x790
[1220118.232373] ? lookup_fast+0x51/0xf0
[1220118.232375] link_path_walk.part.0.constprop.0+0x241/0x3b0
[1220118.232378] path_lookupat+0x39/0x1a0
[1220118.232381] filename_lookup+0xe5/0x1f0
[1220118.232384] vfs_statx+0x9e/0x170
[1220118.232387] do_statx+0x59/0xa0
[1220118.232390] __x64_sys_statx+0x5d/0x80
[1220118.232393] do_syscall_64+0x56/0x80
[1220118.232399] ? clear_bhb_loop+0x30/0x7e
[1220118.232403] ? clear_bhb_loop+0x30/0x7e
[1220118.232406] ? clear_bhb_loop+0x30/0x7e
[1220118.232409] ? clear_bhb_loop+0x30/0x7e
[1220118.232413] ? clear_bhb_loop+0x30/0x7e
[1220118.232416] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[1220118.232421] RIP: 0033:0x77f00863d58e
[1220118.232425] Code: c8 0d 00 be ff ff ff ff 64 c7 00 16 00 00 00 e9 44 fd ff ff e8 93 8f 01 00 0f 1f 00 f3 0f 1e fa 41 89 ca b8 4c 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 22 85 c0 89 c1 74 0f 48 8b 05 3d c8 0d 00 64
[1220118.232428] RSP: 002b:000077f001aea1c8 EFLAGS: 00000202 ORIG_RAX: 000000000000014c
[1220118.232433] RAX: ffffffffffffffda RBX: 000077f001aea480 RCX: 000077f00863d58e
[1220118.232436] RDX: 0000000000000100 RSI: 000077f001aea300 RDI: 00000000ffffff9c
[1220118.232437] RBP: 000077f001aea2f0 R08: 000077f001aea1d0 R09: 000077f001aea340
[1220118.232439] R10: 0000000000000fff R11: 0000000000000202 R12: 000077f001aea6f0
[1220118.232440] R13: 000077f001aea5a0 R14: 0000000000000001 R15: 000000000000004f
[1220118.232442] </TASK>
```
After the panic occurs, the `httm` process freezes (D/Z state) and cannot be killed. It's no longer possible to cleanly reboot the system.
I don't dare conduct further tests on the production system. But as far as I remember, the kernel panic only occurred if the ZFS snapshot was NOT mounted before the `restic backup` run on the snapshot, i.e. when the automount was triggered by restic itself. Since restic is a Go executable, and Go is a bit special about things like threads and concurrency as far as I know, maybe there is something like a deadlock? But I will stop fantasizing with my half-baked knowledge now ;-).
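If the panic really only fires when the backup tool itself triggers the automount from many threads at once, one hedged mitigation (untested against this bug; the function name and all paths below are hypothetical) is to touch the snapshot directory once, from a single process, before handing it to the backup tool:

```shell
#!/bin/sh
# Hypothetical mitigation sketch: trigger the .zfs snapshot automount
# serially, from a single process, before a multi-threaded backup tool
# touches it. Path and backup command are placeholders for illustration.
premount_then_backup() {
    snapdir="$1"
    shift
    # A plain directory listing is enough to trigger the automount:
    ls "$snapdir" > /dev/null 2>&1
    # Now run the backup command ("$@") against the mounted snapshot:
    "$@" "$snapdir"
}

# Example (placeholders):
#   premount_then_backup /test/.zfs/snapshot/testshot restic backup
```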
Managed to reliably reproduce this using `httm` on an Ubuntu 25.10 test system (zfs 2.3.4).

ZFS setup: pool `testpool` mounted on `/testpool`, with dataset `testpool/filesystem`.

Create snapshots on the filesystem:

```shell
# cd /testpool/filesystem
# git clone https://github.com/openzfs/zfs.git
# cd zfs
# git tag | sort | tac | while read -r tag; do git checkout -f "${tag}"; zfs snapshot "testpool/filesystem@${tag}"; done
```

Start endless background unmounting of snapshots:

```shell
# (while true ; do sudo umount /testpool/filesystem/.zfs/snapshot/* >/dev/null 2>&1; done) &
```

Run `httm` and browse up/down for a bit:

```shell
# httm . --preview --browse
```
dmesg panic:

```
[  183.050896] VERIFY(avl_find(tree, new_node, &where) == NULL) failed
[  183.050915] PANIC at avl.c:625:avl_add()
[  183.050923] Showing stack for process 4530
[  183.050926] CPU: 1 UID: 1000 PID: 4530 Comm: httm Tainted: P O 6.17.0-7-generic #7-Ubuntu PREEMPT(voluntary)
[  183.050929] Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE
[  183.050930] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 02/27/2020
[  183.050931] Call Trace:
[  183.050933] <TASK>
[  183.050935] show_stack+0x49/0x60
[  183.050941] dump_stack_lvl+0x5f/0x90
[  183.050948] dump_stack+0x10/0x18
[  183.050951] spl_dumpstack+0x28/0x40 [spl]
[  183.050960] spl_panic+0xf1/0x116 [spl]
[  183.050969] ? srso_alias_return_thunk+0x5/0xfbef5
[  183.050973] spl_assert.constprop.0+0x2a/0x40 [zfs]
[  183.051083] avl_add+0x6f/0x80 [zfs]
[  183.051167] zfsctl_snapshot_add+0x21/0x40 [zfs]
[  183.051266] zfsctl_snapshot_mount+0x4bf/0x590 [zfs]
[  183.051358] zpl_snapdir_automount+0x10/0x40 [zfs]
[  183.051439] __traverse_mounts+0x8d/0x220
[  183.051443] step_into+0x1c8/0x410
[  183.051446] ? srso_alias_return_thunk+0x5/0xfbef5
[  183.051448] walk_component+0x4b/0x1a0
[  183.051451] path_lookupat+0xa7/0x180
[  183.051453] filename_lookup+0xf3/0x210
[  183.051455] ? srso_alias_return_thunk+0x5/0xfbef5
[  183.051457] ? rseq_ip_fixup+0x90/0x1e0
[  183.051464] vfs_statx+0x79/0x160
[  183.051467] do_statx+0x67/0xc0
[  183.051472] __x64_sys_statx+0x8e/0xf0
[  183.051474] x64_sys_call+0x1f67/0x2330
[  183.051476] do_syscall_64+0x81/0xc90
[  183.051480] ? srso_alias_return_thunk+0x5/0xfbef5
[  183.051481] ? rseq_get_rseq_cs.isra.0+0x16/0x240
[  183.051484] ? srso_alias_return_thunk+0x5/0xfbef5
[  183.051485] ? rseq_ip_fixup+0x90/0x1e0
[  183.051488] ? srso_alias_return_thunk+0x5/0xfbef5
[  183.051489] ? restore_fpregs_from_fpstate+0x46/0xe0
[  183.051493] ? srso_alias_return_thunk+0x5/0xfbef5
[  183.051494] ? switch_fpu_return+0x5c/0xf0
[  183.051496] ? srso_alias_return_thunk+0x5/0xfbef5
[  183.051507] ? arch_exit_to_user_mode_prepare.isra.0+0xc2/0xe0
[  183.051509] ? srso_alias_return_thunk+0x5/0xfbef5
[  183.051510] ? do_syscall_64+0xb9/0xc90
[  183.051512] ? srso_alias_return_thunk+0x5/0xfbef5
[  183.051514] ? arch_exit_to_user_mode_prepare.isra.0+0xc2/0xe0
[  183.051515] ? srso_alias_return_thunk+0x5/0xfbef5
[  183.051516] ? do_syscall_64+0xb9/0xc90
[  183.051518] ? srso_alias_return_thunk+0x5/0xfbef5
[  183.051519] ? sysvec_call_function_single+0x54/0xd0
[  183.051522] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  183.051523] RIP: 0033:0x7cbca392841e
[  183.051526] Code: 64 c7 00 16 00 00 00 e9 81 fd ff ff e8 2b cc 01 00 66 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 41 89 ca b8 4c 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2a 89 c1 85 c0 74 0f 48 8b 05 bd b9 10 00 64
[  183.051527] RSP: 002b:00007cbca24f31c8 EFLAGS: 00000202 ORIG_RAX: 000000000000014c
[  183.051529] RAX: ffffffffffffffda RBX: 00007cbca24f3480 RCX: 00007cbca392841e
[  183.051530] RDX: 0000000000000100 RSI: 00007cbca24f3300 RDI: 00000000ffffff9c
[  183.051531] RBP: 00007cbca24f32f0 R08: 00007cbca24f31d0 R09: 00007cbca24f3330
[  183.051532] R10: 0000000000000fff R11: 0000000000000202 R12: 00007cbca24f36f0
[  183.051533] R13: 00007cbca24f35a0 R14: 0000000000000001 R15: 0000000000000032
[  183.051536] </TASK>
```
Nothing interesting in `/proc/spl/kstat/zfs/dbgmsg` other than a bunch of:

```
zfs_ctldir.c:1264:zfsctl_snapshot_mount(): Unable to automount /testpool/filesystem/.zfs/snapshot/* error=8192
```
Some kind of race condition is happening, indeed.

EDIT: I wasn't able to trigger the bug on Ubuntu 24.04 with `zfs-2.2.2-0ubuntu9.4` on `6.8.0-87-generic`.
I played around a bit with a VirtualBox machine and a test pool backed by two files. Now I can reproduce this kernel panic by:

- creating a ZFS snapshot
- making sure it is not automounted
- making a backup with restic using `test/.zfs/snapshot/testshot`

I must admit I don't know much about all this, but I tried starting restic under `strace`, and then the kernel panic does NOT happen. I have repeated this several times, since I have a VirtualBox snapshot of the machine taken just before the kernel panic occurs. Nothing is written to `/proc/spl/kstat/zfs/dbgmsg`. Any ideas what else I could look at?
I'm very curious how to get the devs' attention on making this fail gracefully, given that the complexity of a proper solution is unknown.

As of now, 2.3 is basically unusable, since it hoses itself during basic snapshot operations.
I've continued doing some testing of my own and have two findings to share back:

https://github.com/openzfs/zfs/issues/17659#issuecomment-3609330938
> The second dataset, the one that reliably causes a panic fairly quickly, doesn't seem particularly problematic: 55 files and directories totaling 56 MB, exclusively regular files and directories (no links, sockets, etc.), just the basic configuration data for an application that runs in a Docker container. Just for good measure, I `zfs destroy`ed that dataset, recreated it with `zfs create ...`, and restored the original data, and the panic still happens.
So as it turned out, I'd made a mistake in my script that caused it to call `restic backup /$mountpoint/.zfs/snapshots/$snapname` twice, and it was always the second `restic backup ...` command that triggered the kernel panic. So maybe the underlying issue is triggered by the automount being invoked twice for a given snapshot? Just spitballing.
In the meantime, I've rewritten my script to explicitly mount the snapshot (e.g. `mount -t zfs $dataset@$snapname /backup`), and a couple hundred runs later there have been no kernel panics. So the issue, in my limited testing/experience, definitely seems connected to the `.zfs` automounting of the snapshot.
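A minimal sketch of that rewritten flow, assuming a restic-like tool: the function name, dataset, snapshot, and mountpoint below are placeholders, with the unmount done even when the backup command fails:

```shell
#!/bin/sh
# Sketch of the explicit-mount backup flow described above. The function
# name, dataset, snapshot, and mountpoint are placeholders; the backup
# command is passed in as the remaining arguments.
backup_from_snapshot() {
    dataset="$1" snap="$2" mnt="$3"
    shift 3
    mkdir -p "$mnt"
    mount -t zfs "${dataset}@${snap}" "$mnt" || return 1
    # Run the backup command against the mounted snapshot...
    "$@" "$mnt"
    rc=$?
    # ...and always unmount afterwards, even if the backup failed.
    umount "$mnt"
    return "$rc"
}

# Example (placeholders):
#   backup_from_snapshot tank/data autosnap /backup restic backup
```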
I believe the recently merged https://github.com/openzfs/zfs/pull/17943 should resolve the issue.
Tested master, and I can confirm that the issue is solved.