Panic within write syscall
System information
Type | Version/Name
---|---
Distribution Name | manjaro |
Distribution Version | 22.0.0 |
Kernel Version | 5.15.74 |
Architecture | x86_64 |
OpenZFS Version | 2.1.6 |
Describe the problem you're observing
ZFS sometimes panics during writes. It may be related to some system-wide lockups I was experiencing, or it may just be that I noticed the panic because I was investigating the lockups. I will update this issue if I find more info.
Describe how to reproduce the problem
Unsure. It started happening seemingly out of nowhere.
Include any warning/errors/backtraces from the system logs
[ 166.730890] PANIC: zfs: accessing past end of object 607/6970f (size=337408 access=337044+4096)
[ 166.730894] Showing stack for process 15458
[ 166.730895] CPU: 3 PID: 15458 Comm: ThreadPoolForeg Tainted: P OE 5.15.74-3-MANJARO #1 6719dea2e0f43080bcbb84589c5fe78ca0af8a1c
[ 166.730898] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X570 Taichi, BIOS P2.80 01/03/2020
[ 166.730899] Call Trace:
[ 166.730900] <TASK>
[ 166.730902] dump_stack_lvl+0x45/0x63
[ 166.730907] vcmn_err.cold+0x50/0x68 [spl bc6f9d2ccb25306799e7910bfc83907a0873e1f7]
[ 166.730915] zfs_panic_recover+0x75/0xa0 [zfs dad9502cfc0440f4d15c76a4c3daa924b108e590]
[ 166.730983] dmu_buf_hold_array_by_dnode+0x221/0x6a0 [zfs dad9502cfc0440f4d15c76a4c3daa924b108e590]
[ 166.731034] dmu_write_uio_dnode+0x5e/0x1a0 [zfs dad9502cfc0440f4d15c76a4c3daa924b108e590]
[ 166.731084] ? dnode_set_blksz+0x141/0x310 [zfs dad9502cfc0440f4d15c76a4c3daa924b108e590]
[ 166.731136] dmu_write_uio_dbuf+0x4a/0x70 [zfs dad9502cfc0440f4d15c76a4c3daa924b108e590]
[ 166.731185] zfs_write+0x4d7/0xc50 [zfs dad9502cfc0440f4d15c76a4c3daa924b108e590]
[ 166.731249] zpl_iter_write+0x10d/0x190 [zfs dad9502cfc0440f4d15c76a4c3daa924b108e590]
[ 166.731308] new_sync_write+0x14d/0x1f0
[ 166.731312] vfs_write+0x203/0x290
[ 166.731313] __x64_sys_pwrite64+0xa0/0xd0
[ 166.731315] do_syscall_64+0x58/0x90
[ 166.731317] entry_SYSCALL_64_after_hwframe+0x61/0xcb
[ 166.731320] RIP: 0033:0x7f31d96300f7
[ 166.731322] Code: 08 89 3c 24 48 89 4c 24 18 e8 85 dc f8 ff 4c 8b 54 24 18 48 8b 54 24 10 41 89 c0 48 8b 74 24 08 8b 3c 24 b8 12 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 04 24 e8 d5 dc f8 ff 48 8b
[ 166.731323] RSP: 002b:00007f31c4aa94c0 EFLAGS: 00000293 ORIG_RAX: 0000000000000012
[ 166.731325] RAX: ffffffffffffffda RBX: 00003b700f80fc00 RCX: 00007f31d96300f7
[ 166.731326] RDX: 0000000000001000 RSI: 00003b700f80fc00 RDI: 00000000000000f1
[ 166.731327] RBP: 00007f31c4aa9520 R08: 0000000000000000 R09: 0000561cec223950
[ 166.731327] R10: 0000000000052494 R11: 0000000000000293 R12: 00003b700f80fc00
[ 166.731328] R13: 0000000000001000 R14: 00000000000000f1 R15: 0000000000052494
[ 166.731330] </TASK>
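For context: the check that fires here lives in dmu_buf_hold_array_by_dnode(), which the trace passes through. A rough sketch of the logic, paraphrased from OpenZFS's module/zfs/dmu.c (simplified and abridged, not a verbatim copy):

    /*
     * Paraphrased from dmu_buf_hold_array_by_dnode() (module/zfs/dmu.c).
     * dn_datablkshift == 0 means the object fits in a single block whose
     * size (dn_datablksz) need not be a power of two; any access must
     * then stay inside that one block.
     */
    if (dn->dn_datablkshift) {
        /* Multi-block object: count the blocks the range touches. */
        int blkshift = dn->dn_datablkshift;
        nblks = (P2ROUNDUP(offset + length, 1ULL << blkshift) -
            P2ALIGN(offset, 1ULL << blkshift)) >> blkshift;
    } else {
        if (offset + length > dn->dn_datablksz) {
            /* Source of the "accessing past end of object" message. */
            zfs_panic_recover("zfs: accessing past end of object "
                "%llx/%llx (size=%u access=%llu+%llu)", ...);
            return (SET_ERROR(EIO));
        }
        nblks = 1;
    }

So size= in the message is the single block's size and access= is the write's offset+length. A write extending past EOF is normally preceded by growing that block (note the dnode_set_blksz frame in the trace), so hitting this check from zfs_write() suggests the object's size and block size were seen in an inconsistent state.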
This is probably not a bug in ZFS; it turns out my PSU was failing.
And of course, the moment I say that, it happens again on a different PSU and motherboard.
[ 136.056597] Call Trace:
[ 136.056599] <TASK>
[ 136.056601] dump_stack_lvl+0x48/0x60
[ 136.056605] vcmn_err.cold+0x50/0x68 [spl a9c0243da0ec826d663df1ab73210d88477c5646]
[ 136.056612] zfs_panic_recover+0x79/0xa0 [zfs e7822a951d039e3c29b527b8646eeb4fa7057713]
[ 136.056647] dmu_buf_hold_array_by_dnode+0x217/0x690 [zfs e7822a951d039e3c29b527b8646eeb4fa7057713]
[ 136.056668] dmu_write_uio_dnode+0x62/0x190 [zfs e7822a951d039e3c29b527b8646eeb4fa7057713]
[ 136.056687] ? dnode_set_blksz+0x145/0x310 [zfs e7822a951d039e3c29b527b8646eeb4fa7057713]
[ 136.056709] dmu_write_uio_dbuf+0x4e/0x70 [zfs e7822a951d039e3c29b527b8646eeb4fa7057713]
[ 136.056728] zfs_write+0x4bb/0xc40 [zfs e7822a951d039e3c29b527b8646eeb4fa7057713]
[ 136.056759] zpl_iter_write+0x11f/0x1a0 [zfs e7822a951d039e3c29b527b8646eeb4fa7057713]
[ 136.056787] vfs_write+0x236/0x3f0
[ 136.056789] __x64_sys_pwrite64+0x98/0xd0
[ 136.056790] do_syscall_64+0x5c/0x90
[ 136.056793] ? __x64_sys_pwrite64+0xa8/0xd0
[ 136.056793] ? syscall_exit_to_user_mode+0x1b/0x40
[ 136.056794] ? do_syscall_64+0x6b/0x90
[ 136.056795] ? exc_page_fault+0x74/0x170
[ 136.056795] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 136.056798] RIP: 0033:0x7fc8ef3880f7
[ 136.056799] Code: 08 89 3c 24 48 89 4c 24 18 e8 85 dc f8 ff 4c 8b 54 24 18 48 8b 54 24 10 41 89 c0 48 8b 74 24 08 8b 3c 24 b8 12 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 04 24 e8 d5 dc f8 ff 48 8b
[ 136.056799] RSP: 002b:00007fc8d8f1d4c0 EFLAGS: 00000293 ORIG_RAX: 0000000000000012
[ 136.056801] RAX: ffffffffffffffda RBX: 00003b5409d82400 RCX: 00007fc8ef3880f7
[ 136.056802] RDX: 0000000000001000 RSI: 00003b5409d82400 RDI: 00000000000000de
[ 136.056802] RBP: 00007fc8d8f1d520 R08: 0000000000000000 R09: 00005560a3fecdc0
[ 136.056802] R10: 0000000000052494 R11: 0000000000000293 R12: 00003b5409d82400
[ 136.056803] R13: 0000000000001000 R14: 00000000000000de R15: 0000000000052494
I am continuing to get "accessing past end of object" errors.
PANIC: zfs: accessing past end of object 607/6970f (size=337408 access=337044+4096)
It is always the same object name, same size, and same access.
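To spell out the arithmetic: the write begins inside the object (offset 337044 < size 337408), but 337044 + 4096 = 341140, which is 3732 bytes past the 337408-byte block, so the same pwrite trips the same check every time it is replayed.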
I have confirmed this doesn't happen when loading my pool off a rescue USB with ZFS 0.8.x
I will look into getting 2.0.x installed to see if the issue occurs there as well.
I was able to work around the issue by wiping out my browser's entries in ~/.config; apparently something there had gotten corrupted and was triggering the panic. However, I still think this is a major issue: a misbehaving unprivileged program should not be able to lock up the entire kernel via zfs. (The initial panic caused a number of internal zfs tasks to hang, deadlocking the entire kernel.)
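(Side note, in case it helps others debug: zfs_panic_recover() honors the zfs_recover module parameter, so writing 1 to /sys/module/zfs/parameters/zfs_recover downgrades this panic to a kernel warning. That only masks the symptom and is meant as a last-resort debugging aid, not a fix.)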
Can confirm the issue using a recent zfs git build:
$ zpool version
zfs-2.1.99-1842_gd520f64342
zfs-kmod-2.1.99-1
This started to happen after I changed the recordsize and compression for an encrypted dataset on an NVMe pool with ashift=13.
Basically,
zfs set recordsize=16k pool/dataset
and
zfs set compression=zstd-fast1 pool/dataset
broke it; now the system repeatedly hangs.
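The numbers in the trace below fit the same pattern as above: 9216 < 16384, so the object presumably still occupies a single block under the new 16k recordsize, and the 4 KiB write at offset 8724 reaches 8724 + 4096 = 12820, overshooting the 9216-byte block by 3604 bytes.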
Apr 02 20:34:39 kleinerhellraiser kernel: PANIC: zfs: accessing past end of object 29a/33e6a2 (size=9216 access=8724+4096)
Apr 02 20:34:39 kleinerhellraiser kernel: Showing stack for process 120428
Apr 02 20:34:39 kleinerhellraiser kernel: CPU: 7 PID: 120428 Comm: ThreadPoolForeg Tainted: P U OE 6.1.22-1.1-lts #1 86d10782fa66a45142e5186ce7e7168913d888e8
Apr 02 20:34:39 kleinerhellraiser kernel: Hardware name: Dell Inc. Latitude E5470/06DNG5, BIOS 1.34.3 11/20/2022
Apr 02 20:34:39 kleinerhellraiser kernel: Call Trace:
Apr 02 20:34:39 kleinerhellraiser kernel: <TASK>
Apr 02 20:34:39 kleinerhellraiser kernel: dump_stack_lvl+0x48/0x60
Apr 02 20:34:39 kleinerhellraiser kernel: vcmn_err.cold+0x68/0x80 [spl fa21e1dc02e38701aa0cb7c68efc8221fb7f2f5b]
Apr 02 20:34:39 kleinerhellraiser kernel: zfs_panic_recover+0x79/0xa0 [zfs c86844b97862a7c8a627e0f54094a5392b10cf48]
Apr 02 20:34:39 kleinerhellraiser kernel: dmu_buf_hold_array_by_dnode+0x20f/0x680 [zfs c86844b97862a7c8a627e0f54094a5392b10cf48]
Apr 02 20:34:39 kleinerhellraiser kernel: ? dbuf_rele_and_unlock+0xfa/0x4f0 [zfs c86844b97862a7c8a627e0f54094a5392b10cf48]
Apr 02 20:34:39 kleinerhellraiser kernel: dmu_write_uio_dnode+0x62/0x190 [zfs c86844b97862a7c8a627e0f54094a5392b10cf48]
Apr 02 20:34:39 kleinerhellraiser kernel: ? dnode_set_blksz+0x145/0x310 [zfs c86844b97862a7c8a627e0f54094a5392b10cf48]
Apr 02 20:34:39 kleinerhellraiser kernel: dmu_write_uio_dbuf+0x4e/0x70 [zfs c86844b97862a7c8a627e0f54094a5392b10cf48]
Apr 02 20:34:39 kleinerhellraiser kernel: zfs_write+0x4bc/0xcb0 [zfs c86844b97862a7c8a627e0f54094a5392b10cf48]
Apr 02 20:34:39 kleinerhellraiser kernel: zpl_iter_write+0x125/0x1b0 [zfs c86844b97862a7c8a627e0f54094a5392b10cf48]
Apr 02 20:34:39 kleinerhellraiser kernel: vfs_write+0x236/0x3f0
Apr 02 20:34:39 kleinerhellraiser kernel: __x64_sys_pwrite64+0x98/0xd0
Apr 02 20:34:39 kleinerhellraiser kernel: do_syscall_64+0x5c/0x90
Apr 02 20:34:39 kleinerhellraiser kernel: ? do_syscall_64+0x6b/0x90
Apr 02 20:34:39 kleinerhellraiser kernel: ? syscall_exit_to_user_mode+0x1b/0x40
Apr 02 20:34:39 kleinerhellraiser kernel: ? do_syscall_64+0x6b/0x90
Apr 02 20:34:39 kleinerhellraiser kernel: ? do_syscall_64+0x6b/0x90
Apr 02 20:34:39 kleinerhellraiser kernel: ? syscall_exit_to_user_mode+0x1b/0x40
Apr 02 20:34:39 kleinerhellraiser kernel: ? do_syscall_64+0x6b/0x90
Apr 02 20:34:39 kleinerhellraiser kernel: ? exc_page_fault+0x74/0x170
Apr 02 20:34:39 kleinerhellraiser kernel: entry_SYSCALL_64_after_hwframe+0x63/0xcd
Apr 02 20:34:39 kleinerhellraiser kernel: RIP: 0033:0x7fc874769737
Apr 02 20:34:39 kleinerhellraiser kernel: Code: 08 89 3c 24 48 89 4c 24 18 e8 65 01 f8 ff 41 89 c0 4c 8b 54 24 18 48 8b 54 24 10 48 8b 74 24 08 8b 3c 24 b8 12 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 04 24 e8 b5 01 f8 ff 48 8b
Apr 02 20:34:39 kleinerhellraiser kernel: RSP: 002b:00007fc85fe9d5f0 EFLAGS: 00000293 ORIG_RAX: 0000000000000012
Apr 02 20:34:39 kleinerhellraiser kernel: RAX: ffffffffffffffda RBX: 0000305c06478000 RCX: 00007fc874769737
Apr 02 20:34:39 kleinerhellraiser kernel: RDX: 0000000000001000 RSI: 0000305c06478000 RDI: 000000000000011f
Apr 02 20:34:39 kleinerhellraiser kernel: RBP: 00007fc85fe9d660 R08: 0000000000000000 R09: 000055f5a5344b98
Apr 02 20:34:39 kleinerhellraiser kernel: R10: 0000000000002214 R11: 0000000000000293 R12: 0000000000001000
Apr 02 20:34:39 kleinerhellraiser kernel: R13: 0000305c06478000 R14: 000000000000011f R15: 0000000000002214
Apr 02 20:34:39 kleinerhellraiser kernel: </TASK>