
Panic within write syscall

Open IsaacVaughn opened this issue 2 years ago • 6 comments

System information

Type                  Version/Name
Distribution Name     manjaro
Distribution Version  22.0.0
Kernel Version        5.15.74
Architecture          x86_64
OpenZFS Version       2.1.6

Describe the problem you're observing

ZFS sometimes panics during writes. It may be related to some system-wide lockups I was experiencing, or I may only have noticed the panic because I was investigating the lockups. I will update this issue if I find more information.

Describe how to reproduce the problem

Unsure. It started happening seemingly out of nowhere.

Include any warning/errors/backtraces from the system logs

[  166.730890] PANIC: zfs: accessing past end of object 607/6970f (size=337408 access=337044+4096)
[  166.730894] Showing stack for process 15458
[  166.730895] CPU: 3 PID: 15458 Comm: ThreadPoolForeg Tainted: P           OE     5.15.74-3-MANJARO #1 6719dea2e0f43080bcbb84589c5fe78ca0af8a1c
[  166.730898] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X570 Taichi, BIOS P2.80 01/03/2020
[  166.730899] Call Trace:
[  166.730900]  <TASK>
[  166.730902]  dump_stack_lvl+0x45/0x63
[  166.730907]  vcmn_err.cold+0x50/0x68 [spl bc6f9d2ccb25306799e7910bfc83907a0873e1f7]
[  166.730915]  zfs_panic_recover+0x75/0xa0 [zfs dad9502cfc0440f4d15c76a4c3daa924b108e590]
[  166.730983]  dmu_buf_hold_array_by_dnode+0x221/0x6a0 [zfs dad9502cfc0440f4d15c76a4c3daa924b108e590]
[  166.731034]  dmu_write_uio_dnode+0x5e/0x1a0 [zfs dad9502cfc0440f4d15c76a4c3daa924b108e590]
[  166.731084]  ? dnode_set_blksz+0x141/0x310 [zfs dad9502cfc0440f4d15c76a4c3daa924b108e590]
[  166.731136]  dmu_write_uio_dbuf+0x4a/0x70 [zfs dad9502cfc0440f4d15c76a4c3daa924b108e590]
[  166.731185]  zfs_write+0x4d7/0xc50 [zfs dad9502cfc0440f4d15c76a4c3daa924b108e590]
[  166.731249]  zpl_iter_write+0x10d/0x190 [zfs dad9502cfc0440f4d15c76a4c3daa924b108e590]
[  166.731308]  new_sync_write+0x14d/0x1f0
[  166.731312]  vfs_write+0x203/0x290
[  166.731313]  __x64_sys_pwrite64+0xa0/0xd0
[  166.731315]  do_syscall_64+0x58/0x90
[  166.731317]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
[  166.731320] RIP: 0033:0x7f31d96300f7
[  166.731322] Code: 08 89 3c 24 48 89 4c 24 18 e8 85 dc f8 ff 4c 8b 54 24 18 48 8b 54 24 10 41 89 c0 48 8b 74 24 08 8b 3c 24 b8 12 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 04 24 e8 d5 dc f8 ff 48 8b
[  166.731323] RSP: 002b:00007f31c4aa94c0 EFLAGS: 00000293 ORIG_RAX: 0000000000000012
[  166.731325] RAX: ffffffffffffffda RBX: 00003b700f80fc00 RCX: 00007f31d96300f7
[  166.731326] RDX: 0000000000001000 RSI: 00003b700f80fc00 RDI: 00000000000000f1
[  166.731327] RBP: 00007f31c4aa9520 R08: 0000000000000000 R09: 0000561cec223950
[  166.731327] R10: 0000000000052494 R11: 0000000000000293 R12: 00003b700f80fc00
[  166.731328] R13: 0000000000001000 R14: 00000000000000f1 R15: 0000000000052494
[  166.731330]  </TASK>

IsaacVaughn · Oct 31 '22 15:10

This is probably not a bug in ZFS; it turns out my PSU was failing.

IsaacVaughn · Nov 05 '22 13:11

And of course, the moment I say that, it happens again on a different PSU and MB.

[ 136.056597] Call Trace:
[ 136.056599]  <TASK>
[ 136.056601]  dump_stack_lvl+0x48/0x60
[ 136.056605]  vcmn_err.cold+0x50/0x68 [spl a9c0243da0ec826d663df1ab73210d88477c5646]
[ 136.056612]  zfs_panic_recover+0x79/0xa0 [zfs e7822a951d039e3c29b527b8646eeb4fa7057713]
[ 136.056647]  dmu_buf_hold_array_by_dnode+0x217/0x690 [zfs e7822a951d039e3c29b527b8646eeb4fa7057713]
[ 136.056668]  dmu_write_uio_dnode+0x62/0x190 [zfs e7822a951d039e3c29b527b8646eeb4fa7057713]
[ 136.056687]  ? dnode_set_blksz+0x145/0x310 [zfs e7822a951d039e3c29b527b8646eeb4fa7057713]
[ 136.056709]  dmu_write_uio_dbuf+0x4e/0x70 [zfs e7822a951d039e3c29b527b8646eeb4fa7057713]
[ 136.056728]  zfs_write+0x4bb/0xc40 [zfs e7822a951d039e3c29b527b8646eeb4fa7057713]
[ 136.056759]  zpl_iter_write+0x11f/0x1a0 [zfs e7822a951d039e3c29b527b8646eeb4fa7057713]
[ 136.056787]  vfs_write+0x236/0x3f0
[ 136.056789]  __x64_sys_pwrite64+0x98/0xd0
[ 136.056790]  do_syscall_64+0x5c/0x90
[ 136.056793]  ? __x64_sys_pwrite64+0xa8/0xd0
[ 136.056793]  ? syscall_exit_to_user_mode+0x1b/0x40
[ 136.056794]  ? do_syscall_64+0x6b/0x90
[ 136.056795]  ? exc_page_fault+0x74/0x170
[ 136.056795]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 136.056798] RIP: 0033:0x7fc8ef3880f7
[ 136.056799] Code: 08 89 3c 24 48 89 4c 24 18 e8 85 dc f8 ff 4c 8b 54 24 18 48 8b 54 24 10 41 89 c0 48 8b 74 24 08 8b 3c 24 b8 12 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 04 24 e8 d5 dc f8 ff 48 8b
[ 136.056799] RSP: 002b:00007fc8d8f1d4c0 EFLAGS: 00000293 ORIG_RAX: 0000000000000012
[ 136.056801] RAX: ffffffffffffffda RBX: 00003b5409d82400 RCX: 00007fc8ef3880f7
[ 136.056802] RDX: 0000000000001000 RSI: 00003b5409d82400 RDI: 00000000000000de
[ 136.056802] RBP: 00007fc8d8f1d520 R08: 0000000000000000 R09: 00005560a3fecdc0
[ 136.056802] R10: 0000000000052494 R11: 0000000000000293 R12: 00003b5409d82400
[ 136.056803] R13: 0000000000001000 R14: 00000000000000de R15: 0000000000052494

IsaacVaughn · Nov 05 '22 13:11

I am continuing to get "accessing past end of object" errors.

PANIC: zfs: accessing past end of object 607/6970f (size=337408 access=337044+4096)

It is always the same object name, same size, and same access.
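For what it's worth, the numbers in the panic line are internally consistent: the write's end offset lands past the recorded object size, which appears to be exactly what the check behind zfs_panic_recover trips on. A quick arithmetic check (plain shell, nothing here is ZFS-specific):

# access = offset + length; compare against the reported size of 337408
$ echo $((337044 + 4096))
341140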

IsaacVaughn · Nov 05 '22 17:11

I have confirmed this doesn't happen when loading my pool off a rescue USB with ZFS 0.8.x.

I will look into getting 2.0.x installed to see if the issue occurs there as well.

IsaacVaughn · Nov 05 '22 18:11

I was able to work around the issue by wiping out my browser's entries in ~/.config; apparently something there had gotten corrupted and was causing the problem. However, I still think this is a major issue: a misbehaving, unprivileged program should not be able to lock up the entire kernel via ZFS. (The initial panic caused a number of internal ZFS tasks to hang, deadlocking the entire kernel.)
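For anyone else hitting this and wanting to find the affected file before deleting config directories wholesale: the IDs in the panic line (607/6970f) look like hex objset/object numbers, so the object can in principle be tracked down with zdb. A sketch, assuming the pool is named mypool (a placeholder) and that your zdb's -d listing prints dataset IDs, as recent OpenZFS releases do:

# convert the panic's objset/object IDs from hex to decimal
$ printf '%d %d\n' 0x607 0x6970f
1543 431887

# find the dataset whose objset ID is 1543, then dump the object;
# a verbose object dump normally includes the file's path if it is still linked
$ zdb -d mypool | grep -w 'ID 1543'
$ zdb -ddddd mypool/SOME/DATASET 431887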

IsaacVaughn · Nov 05 '22 19:11

Can confirm the issue using a recent zfs git build:

$ zpool version
zfs-2.1.99-1842_gd520f64342
zfs-kmod-2.1.99-1

This started to happen after I changed the recordsize and compression for an encrypted dataset on an NVMe pool with ashift=13. Basically, zfs set recordsize=16k pool/dataset and zfs set compression=zstd-fast1 pool/dataset broke it (see the commands below); now the system repeatedly hangs.
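For reference, the two property changes as a block (pool/dataset is a placeholder). Worth noting: recordsize and compression only apply to blocks written after the change, so files written before it keep their old block size, and a dataset rewritten in place afterwards can end up with mixed block sizes:

$ zfs set recordsize=16k pool/dataset
$ zfs set compression=zstd-fast1 pool/dataset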

Apr 02 20:34:39 kleinerhellraiser kernel: PANIC: zfs: accessing past end of object 29a/33e6a2 (size=9216 access=8724+4096)
Apr 02 20:34:39 kleinerhellraiser kernel: Showing stack for process 120428
Apr 02 20:34:39 kleinerhellraiser kernel: CPU: 7 PID: 120428 Comm: ThreadPoolForeg Tainted: P     U     OE      6.1.22-1.1-lts #1 86d10782fa66a45142e5186ce7e7168913d888e8
Apr 02 20:34:39 kleinerhellraiser kernel: Hardware name: Dell Inc. Latitude E5470/06DNG5, BIOS 1.34.3 11/20/2022
Apr 02 20:34:39 kleinerhellraiser kernel: Call Trace:
Apr 02 20:34:39 kleinerhellraiser kernel:  <TASK>
Apr 02 20:34:39 kleinerhellraiser kernel:  dump_stack_lvl+0x48/0x60
Apr 02 20:34:39 kleinerhellraiser kernel:  vcmn_err.cold+0x68/0x80 [spl fa21e1dc02e38701aa0cb7c68efc8221fb7f2f5b]
Apr 02 20:34:39 kleinerhellraiser kernel:  zfs_panic_recover+0x79/0xa0 [zfs c86844b97862a7c8a627e0f54094a5392b10cf48]
Apr 02 20:34:39 kleinerhellraiser kernel:  dmu_buf_hold_array_by_dnode+0x20f/0x680 [zfs c86844b97862a7c8a627e0f54094a5392b10cf48]
Apr 02 20:34:39 kleinerhellraiser kernel:  ? dbuf_rele_and_unlock+0xfa/0x4f0 [zfs c86844b97862a7c8a627e0f54094a5392b10cf48]
Apr 02 20:34:39 kleinerhellraiser kernel:  dmu_write_uio_dnode+0x62/0x190 [zfs c86844b97862a7c8a627e0f54094a5392b10cf48]
Apr 02 20:34:39 kleinerhellraiser kernel:  ? dnode_set_blksz+0x145/0x310 [zfs c86844b97862a7c8a627e0f54094a5392b10cf48]
Apr 02 20:34:39 kleinerhellraiser kernel:  dmu_write_uio_dbuf+0x4e/0x70 [zfs c86844b97862a7c8a627e0f54094a5392b10cf48]
Apr 02 20:34:39 kleinerhellraiser kernel:  zfs_write+0x4bc/0xcb0 [zfs c86844b97862a7c8a627e0f54094a5392b10cf48]
Apr 02 20:34:39 kleinerhellraiser kernel:  zpl_iter_write+0x125/0x1b0 [zfs c86844b97862a7c8a627e0f54094a5392b10cf48]
Apr 02 20:34:39 kleinerhellraiser kernel:  vfs_write+0x236/0x3f0
Apr 02 20:34:39 kleinerhellraiser kernel:  __x64_sys_pwrite64+0x98/0xd0
Apr 02 20:34:39 kleinerhellraiser kernel:  do_syscall_64+0x5c/0x90
Apr 02 20:34:39 kleinerhellraiser kernel:  ? do_syscall_64+0x6b/0x90
Apr 02 20:34:39 kleinerhellraiser kernel:  ? syscall_exit_to_user_mode+0x1b/0x40
Apr 02 20:34:39 kleinerhellraiser kernel:  ? do_syscall_64+0x6b/0x90
Apr 02 20:34:39 kleinerhellraiser kernel:  ? do_syscall_64+0x6b/0x90
Apr 02 20:34:39 kleinerhellraiser kernel:  ? syscall_exit_to_user_mode+0x1b/0x40
Apr 02 20:34:39 kleinerhellraiser kernel:  ? do_syscall_64+0x6b/0x90
Apr 02 20:34:39 kleinerhellraiser kernel:  ? exc_page_fault+0x74/0x170
Apr 02 20:34:39 kleinerhellraiser kernel:  entry_SYSCALL_64_after_hwframe+0x63/0xcd
Apr 02 20:34:39 kleinerhellraiser kernel: RIP: 0033:0x7fc874769737
Apr 02 20:34:39 kleinerhellraiser kernel: Code: 08 89 3c 24 48 89 4c 24 18 e8 65 01 f8 ff 41 89 c0 4c 8b 54 24 18 48 8b 54 24 10 48 8b 74 24 08 8b 3c 24 b8 12 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 04 24 e8 b5 01 f8 ff 48 8b
Apr 02 20:34:39 kleinerhellraiser kernel: RSP: 002b:00007fc85fe9d5f0 EFLAGS: 00000293 ORIG_RAX: 0000000000000012
Apr 02 20:34:39 kleinerhellraiser kernel: RAX: ffffffffffffffda RBX: 0000305c06478000 RCX: 00007fc874769737
Apr 02 20:34:39 kleinerhellraiser kernel: RDX: 0000000000001000 RSI: 0000305c06478000 RDI: 000000000000011f
Apr 02 20:34:39 kleinerhellraiser kernel: RBP: 00007fc85fe9d660 R08: 0000000000000000 R09: 000055f5a5344b98
Apr 02 20:34:39 kleinerhellraiser kernel: R10: 0000000000002214 R11: 0000000000000293 R12: 0000000000001000
Apr 02 20:34:39 kleinerhellraiser kernel: R13: 0000305c06478000 R14: 000000000000011f R15: 0000000000002214
Apr 02 20:34:39 kleinerhellraiser kernel:  </TASK>

mtippmann · Apr 02 '23 18:04