
abd_alloc_zero_scatter lockup when sg_alloc_table

Open AzAlam1 opened this issue 10 months ago • 2 comments

System information

Type Version/Name
Distribution Name
Distribution Version
Kernel Version 4.4.94
Architecture mips
OpenZFS Version 2.2.3

Describe the problem you're observing

The kernel locks up in sg_alloc_table.

Cause: nr_pages must not be larger than SG_MAX_SINGLE_ALLOC.

Suggested fix: clamp the page count so that it is never larger than SG_MAX_SINGLE_ALLOC:

    - int nr_pages = abd_chunkcnt_for_bytes(SPA_MAXBLOCKSIZE);
    + int nr_pages = MIN(abd_chunkcnt_for_bytes(SPA_MAXBLOCKSIZE), SG_MAX_SINGLE_ALLOC);
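To make the proposed clamp concrete, here is a minimal userspace sketch of the arithmetic. It is not the OpenZFS source. The 4 KiB page size and the 16-byte scatterlist entry are assumptions for a 32-bit target (the real values depend on the kernel configuration), SPA_MAXBLOCKSIZE is taken as 16 MiB, and chunkcnt_for_bytes is a hypothetical stand-in for abd_chunkcnt_for_bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Assumptions for illustration only; real values depend on kernel config. */
#define SKETCH_PAGE_SIZE           4096u                 /* assumed page size */
#define SKETCH_SG_ENTRY_SIZE       16u                   /* assumed sizeof(struct scatterlist) */
#define SKETCH_SG_MAX_SINGLE_ALLOC (SKETCH_PAGE_SIZE / SKETCH_SG_ENTRY_SIZE)
#define SKETCH_SPA_MAXBLOCKSIZE    (16u * 1024u * 1024u) /* 16 MiB */

#define SKETCH_MIN(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical stand-in for abd_chunkcnt_for_bytes(): pages needed for size bytes. */
static unsigned int
chunkcnt_for_bytes(size_t size)
{
	assert(SKETCH_PAGE_SIZE > 0);
	return (unsigned int)((size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE);
}

/* Entry count requested without the clamp: one entry per page of the largest block. */
static unsigned int
nr_pages_unclamped(void)
{
	return chunkcnt_for_bytes(SKETCH_SPA_MAXBLOCKSIZE);
}

/* Entry count with the proposed MIN() clamp applied. */
static unsigned int
nr_pages_clamped(void)
{
	return SKETCH_MIN(chunkcnt_for_bytes(SKETCH_SPA_MAXBLOCKSIZE),
	    SKETCH_SG_MAX_SINGLE_ALLOC);
}
```

Under these assumptions the unclamped request is 4096 entries and the clamped one is 256. The clamp also shrinks the range the table covers, so any caller that expects the table to span the full SPA_MAXBLOCKSIZE would probably need review as well.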

Describe how to reproduce the problem

  1. insert spl.ko
  2. insert zfs.ko

kernel log

[  221.003720] WARNING: CPU: 1 PID: 1548 at lib/scatterlist.c:287 __sg_alloc_ta)
[  221.012644] Modules linked in: zfs(O+) spl(O) fb(O) vdec(O) vo(O) ipu(O)
[  221.019751] CPU: 1 PID: 1548 Comm: insmod Tainted: G           O    4.4.94-21
[  221.028107] Stack : 806f7a0a 0000004a 00000000 80700000 00000000 00000000 807
          805eb6a0 00000001 0000060c 806f4708 80244b7c 8c377c58 00000100 8007458
          00000100 80240540 00000001 00000000 805f079c 8c377b14 8c377af8 800a624
          80244a48 800342ec 80244b7c 8c377c58 8c377b14 00000002 00000001 8c79a08
          00000000 00000000 00000000 00000000 00000000 00000000 00000000 0000000
          ...
[  221.065702] Call Trace:
[  221.068284] [<8001ea54>] show_stack+0x70/0x8c
[  221.072886] [<80235c44>] dump_stack+0x94/0xd0
[  221.077487] [<8003449c>] warn_slowpath_common+0xa0/0xd0
[  221.083005] [<80034554>] warn_slowpath_null+0x18/0x24
[  221.088336] [<80244a48>] __sg_alloc_table+0x78/0x1ac
[  221.093575] [<80244ba8>] sg_alloc_table+0x2c/0x64
[  221.099398] [<c0a2a920>] abd_init+0x348/0x4e8 [zfs]
[  221.106248] [<c08f00b4>] dmu_init+0x18/0xac [zfs]
[  221.112922] [<c099935c>] spa_init+0x19c/0x300 [zfs]
[  221.119806] [<c0a01af0>] zfs_kmod_init+0x34/0x1008 [zfs]
[  221.127098] [<c0bce364>] openzfs_init+0x70/0x164 [zfs]
[  221.133357] [<8001067c>] do_one_initcall+0x1e8/0x1fc
[  221.138605] [<800a6414>] do_init_module+0x74/0x1d0
[  221.143671] [<800a0f64>] load_module+0x1adc/0x1d80
[  221.148730] [<800a1314>] SyS_init_module+0x10c/0x164
[  221.153975] [<80023bf8>] syscall_common+0x30/0x54
[  221.158937]
[  221.160567] ---[ end trace 08475ac9df2d95d3 ]---

AzAlam1 • Apr 23 '24 06:04

The difference seems to be whether or not the architecture has support for "sg chaining". I think any fix is a little more complicated than that, but yeah, we could be a little smarter here for architectures that don't support it.

For mips, I think sg chaining was supported from 5.x. If you have the option, a kernel update might get you going.
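The difference chaining makes can be sketched with a simplified userspace model of the entry-count check. This is an illustration, not kernel code: model_sg_alloc_table and its has_sg_chain flag are hypothetical names, and the real check in lib/scatterlist.c is selected at build time by the architecture's configuration rather than by a runtime argument.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Simplified model of the entry-count check; not kernel code.
 * has_sg_chain stands in for the architecture's sg chaining support.
 */
static int
model_sg_alloc_table(unsigned int nents, unsigned int max_ents,
    bool has_sg_chain)
{
	assert(max_ents > 0);

	if (nents == 0)
		return -EINVAL;

	/* Without chaining the whole table must fit in a single allocation. */
	if (!has_sg_chain && nents > max_ents)
		return -EINVAL;

	/* With chaining the table can be split into linked chunks of max_ents. */
	return 0;
}
```

With an assumed limit of 256 entries, a 4096-entry request fails in this model when chaining is unavailable and succeeds when it is available, which matches the behavior described above.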

robn • Apr 23 '24 11:04

A similar failure occurs when changing the recordsize to 4MB:

  1. Load the zfs modules with the maximum record size raised: zfs_max_recordsize=4194304
  2. Set the recordsize: zfs set recordsize=4M pool; zfs set recordsize=4M pool/data
  3. Make changes in "pool/data".
  4. The kernel panics due to an sg_alloc_table dead loop in abd_alloc_chunks.
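One plausible reading of the dead loop in step 4, assuming abd_alloc_chunks keeps retrying sg_alloc_table until it succeeds: if the requested entry count is rejected on every attempt, the retry can never finish. The sketch below models that in userspace. The page size, the 256-entry limit, and both function names are assumptions for illustration, and max_retries exists only so the example terminates.

```c
#include <assert.h>
#include <errno.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size */
#define SKETCH_MAX_ENTS  256u  /* assumed SG_MAX_SINGLE_ALLOC */

/* Stand-in for sg_alloc_table() on a kernel without sg chaining. */
static int
model_alloc(unsigned int nents)
{
	return (nents == 0 || nents > SKETCH_MAX_ENTS) ? -EINVAL : 0;
}

/*
 * Retry until the allocation succeeds, as a retry-on-failure caller would.
 * Returns the number of failed attempts; a result equal to max_retries
 * means the loop never succeeded within the cap.
 */
static unsigned int
retries_until_success(unsigned int recordsize, unsigned int max_retries)
{
	unsigned int nents =
	    (recordsize + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
	unsigned int failures = 0;

	assert(max_retries > 0);
	while (failures < max_retries && model_alloc(nents) != 0)
		failures++;
	return failures;
}
```

With these assumed values a 4 MiB record needs 1024 entries and fails on every attempt, while a 1 MiB record needs exactly 256 and succeeds on the first try. Because the failure is deterministic, retrying cannot help; only a smaller request or chaining support changes the outcome.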

kernel log

[   52.959935] ------------[ cut here ]------------
[   52.964871] WARNING: CPU: 0 PID: 1082 at lib/scatterlist.c:287 __sg_alloc_ta)
[   52.973841] Modules linked in: zfs(O) spl(O) fb(O) vdec(O) vo(O) ipu(O)
[   52.980900] CPU: 0 PID: 1082 Comm: z_wr_iss Tainted: G O 4.4.941
[   52.989440] Stack : 806f7a0a 0000004c 00000000 80700000 00000000 00000000 807
          805eb6a0 00000000 0000043a 806f4708 000a0000 89df3ca8 00000100 8007458
          00000100 80240540 00000001 00000000 805f079c 89df3b74 89df3b58 800a624
          80244a48 800342ec 000a0000 89df3ca8 89df3b74 00000002 00200000 89e2408
          00000000 00000000 00000000 00000000 00000000 00000000 00000000 0000000
          ...
[   53.027044] Call Trace:
[   53.029630] [<8001ea54>] show_stack+0x70/0x8c
[   53.034232] [<80235c44>] dump_stack+0x94/0xd0
[   53.038834] [<8003449c>] warn_slowpath_common+0xa0/0xd0
[   53.044350] [<80034554>] warn_slowpath_null+0x18/0x24
[   53.049684] [<80244a48>] __sg_alloc_table+0x78/0x1ac
[   53.054924] [<80244ba8>] sg_alloc_table+0x2c/0x64
[   53.060553] [] abd_alloc_chunks+0x6c/0x1dc [zfs]
[   53.067655] [] abd_alloc+0xe0/0x128 [zfs]
[   53.074083] [] arc_hdr_alloc_abd+0xfc/0x104 [zfs]
[   53.081253] [] arc_write_ready+0x558/0x58c [zfs]
[   53.088353] [] zio_ready+0x68/0x35c [zfs]
[   53.094858] [] zio_execute+0x1bc/0x1cc [zfs]
[   53.100994] [] taskq_thread+0x3e0/0x4e0 [spl]
[   53.106556] [<8004f814>] kthread+0xe4/0xec
[   53.110887] [<800193ec>] ret_from_kernel_thread+0x14/0x1c
[   53.116583]
[   53.118195] ---[ end trace 1430cc01cdbd5ab2 ]---

The change should be applied to both abd_alloc_chunks and abd_alloc_zero_scatter.

AzAlam1 • May 02 '24 03:05