zfs
abd_alloc_zero_scatter lockup in sg_alloc_table
System information
Type | Version/Name |
---|---|
Distribution Name | |
Distribution Version | |
Kernel Version | 4.4.94 |
Architecture | mips |
OpenZFS Version | 2.2.3 |
Describe the problem you're observing
Kernel lockup caused by sg_alloc_table.
Cause: nr_pages must not be larger than SG_MAX_SINGLE_ALLOC.
Suggested fix: limit the page count so that it is never larger than SG_MAX_SINGLE_ALLOC:
- int nr_pages = abd_chunkcnt_for_bytes(SPA_MAXBLOCKSIZE);
+ int nr_pages = MIN(abd_chunkcnt_for_bytes(SPA_MAXBLOCKSIZE), SG_MAX_SINGLE_ALLOC);
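The arithmetic behind the suggested clamp can be sketched in userspace C. The constants are stand-ins, not values read from the reporter's kernel: a 4 KiB page, a 16 MiB SPA_MAXBLOCKSIZE, and a hypothetical 16-byte scatterlist entry, which makes the modelled SG_MAX_SINGLE_ALLOC equal to 256. Every `MODEL_`/`model_` name is hypothetical and exists only for this illustration.

```c
#include <stddef.h>

/* Stand-in values, assumed for illustration only. */
#define MODEL_PAGE_SIZE            4096u
#define MODEL_SPA_MAXBLOCKSIZE     (1u << 24)   /* 16 MiB */
#define MODEL_SG_ENTRY_SIZE        16u          /* hypothetical sizeof(struct scatterlist) */
#define MODEL_SG_MAX_SINGLE_ALLOC  (MODEL_PAGE_SIZE / MODEL_SG_ENTRY_SIZE)

#define MODEL_MIN(a, b) ((a) < (b) ? (a) : (b))

/* Number of page-sized chunks needed to hold `size` bytes (rounds up). */
static unsigned int model_chunkcnt_for_bytes(size_t size)
{
    return (unsigned int)((size + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE);
}

/* Page count without the clamp: one entry per page of the largest block. */
static unsigned int model_nr_pages_unclamped(void)
{
    return model_chunkcnt_for_bytes(MODEL_SPA_MAXBLOCKSIZE);
}

/* Page count with the clamp proposed in the report. */
static unsigned int model_nr_pages_clamped(void)
{
    return MODEL_MIN(model_chunkcnt_for_bytes(MODEL_SPA_MAXBLOCKSIZE),
        MODEL_SG_MAX_SINGLE_ALLOC);
}
```

Under these assumed values the unclamped request asks for 4096 entries while a single allocation holds 256, which appears to be the condition the warning in the log below is reporting.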
Describe how to reproduce the problem
- insert spl.ko
- insert zfs.ko
kernel log
[ 221.003720] WARNING: CPU: 1 PID: 1548 at lib/scatterlist.c:287 __sg_alloc_ta)
[ 221.012644] Modules linked in: zfs(O+) spl(O) fb(O) vdec(O) vo(O) ipu(O)
[ 221.019751] CPU: 1 PID: 1548 Comm: insmod Tainted: G O 4.4.94-21
[ 221.028107] Stack : 806f7a0a 0000004a 00000000 80700000 00000000 00000000 807
805eb6a0 00000001 0000060c 806f4708 80244b7c 8c377c58 00000100 8007458
00000100 80240540 00000001 00000000 805f079c 8c377b14 8c377af8 800a624
80244a48 800342ec 80244b7c 8c377c58 8c377b14 00000002 00000001 8c79a08
00000000 00000000 00000000 00000000 00000000 00000000 00000000 0000000
...
[ 221.065702] Call Trace:
[ 221.068284] [<8001ea54>] show_stack+0x70/0x8c
[ 221.072886] [<80235c44>] dump_stack+0x94/0xd0
[ 221.077487] [<8003449c>] warn_slowpath_common+0xa0/0xd0
[ 221.083005] [<80034554>] warn_slowpath_null+0x18/0x24
[ 221.088336] [<80244a48>] __sg_alloc_table+0x78/0x1ac
[ 221.093575] [<80244ba8>] sg_alloc_table+0x2c/0x64
[ 221.099398] [<c0a2a920>] abd_init+0x348/0x4e8 [zfs]
[ 221.106248] [<c08f00b4>] dmu_init+0x18/0xac [zfs]
[ 221.112922] [<c099935c>] spa_init+0x19c/0x300 [zfs]
[ 221.119806] [<c0a01af0>] zfs_kmod_init+0x34/0x1008 [zfs]
[ 221.127098] [<c0bce364>] openzfs_init+0x70/0x164 [zfs]
[ 221.133357] [<8001067c>] do_one_initcall+0x1e8/0x1fc
[ 221.138605] [<800a6414>] do_init_module+0x74/0x1d0
[ 221.143671] [<800a0f64>] load_module+0x1adc/0x1d80
[ 221.148730] [<800a1314>] SyS_init_module+0x10c/0x164
[ 221.153975] [<80023bf8>] syscall_common+0x30/0x54
[ 221.158937]
[ 221.160567] ---[ end trace 08475ac9df2d95d3 ]---
The difference seems to be whether or not the architecture has support for "sg chaining". I think any fix is a little more complicated than that, but yeah, we could be a little smarter here for architectures that don't support it.
For mips, I think sg chaining was supported from 5.x. If you have the option, a kernel update might get you going.
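As a rough userspace model of the distinction described here, assume that a kernel without sg chaining rejects any table that does not fit in one allocation, while a kernel with chaining accepts it. The function name, the `has_sg_chain` flag, and the 256-entry limit are hypothetical stand-ins and not kernel API.

```c
#include <errno.h>
#include <stdbool.h>

#define MODEL_MAX_ENTS 256u  /* hypothetical single-allocation limit */

/*
 * Model of the size check: without chaining, a request for more entries
 * than fit in one allocation fails with -EINVAL; with chaining it is
 * accepted because further allocations can be linked together.
 */
static int model_sg_alloc_table(unsigned int nents, bool has_sg_chain)
{
    if (nents == 0)
        return -EINVAL;
    if (!has_sg_chain && nents > MODEL_MAX_ENTS)
        return -EINVAL;
    return 0;
}
```

In this model the same 4096-entry request fails when `has_sg_chain` is false and succeeds when it is true, which matches the suggestion that a newer kernel may avoid the problem.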
Reproducing with the recordsize changed to 4MB:
- Load the kernel modules with a module-specific maximum zfs_recordsize
load zfs modules with zfs_max_recordsize=4194304
- Set recordsize
zfs set recordsize=4M pool; zfs set recordsize=4M pool/data
- Make changes in "pool/data"
- Kernel panic due to an sg_alloc_table dead loop in abd_alloc_chunks.
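How a failing allocation can become a dead loop can be sketched as follows, assuming the caller simply retries until the allocation succeeds. Because the modelled failure depends only on the requested size, retrying never helps. The attempt cap exists only so that the model terminates, and all names and the 256-entry limit are hypothetical.

```c
#include <errno.h>

#define MODEL_MAX_ENTS 256u  /* hypothetical single-allocation limit */

/* Deterministic model: the result depends only on the requested size. */
static int model_alloc(unsigned int nents)
{
    return (nents > MODEL_MAX_ENTS) ? -EINVAL : 0;
}

/*
 * Retry like a "loop until success" caller. Returns the number of
 * attempts used, or -1 if the cap was reached without success.
 */
static int model_retry_alloc(unsigned int nents, int max_attempts)
{
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (model_alloc(nents) == 0)
            return attempt;
    }
    return -1;
}
```

A 4MB record with 4 KiB pages needs 1024 entries, so `model_retry_alloc(1024, 1000)` exhausts every attempt, whereas a 128-entry request succeeds on the first try.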
kernel log
[ 52.959935] ------------[ cut here ]------------
[ 52.964871] WARNING: CPU: 0 PID: 1082 at lib/scatterlist.c:287 __sg_alloc_ta)
[ 52.973841] Modules linked in: zfs(O) spl(O) fb(O) vdec(O) vo(O) ipu(O)
[ 52.980900] CPU: 0 PID: 1082 Comm: z_wr_iss Tainted: G O 4.4.941
[ 52.989440] Stack : 806f7a0a 0000004c 00000000 80700000 00000000 00000000 807
805eb6a0 00000000 0000043a 806f4708 000a0000 89df3ca8 00000100 8007458
00000100 80240540 00000001 00000000 805f079c 89df3b74 89df3b58 800a624
80244a48 800342ec 000a0000 89df3ca8 89df3b74 00000002 00200000 89e2408
00000000 00000000 00000000 00000000 00000000 00000000 00000000 0000000
...
[ 53.027044] Call Trace:
[ 53.029630] [<8001ea54>] show_stack+0x70/0x8c
[ 53.034232] [<80235c44>] dump_stack+0x94/0xd0
[ 53.038834] [<8003449c>] warn_slowpath_common+0xa0/0xd0
[ 53.044350] [<80034554>] warn_slowpath_null+0x18/0x24
[ 53.049684] [<80244a48>] __sg_alloc_table+0x78/0x1ac
[ 53.054924] [<80244ba8>] sg_alloc_table+0x2c/0x64
[ 53.060553] [
The change should apply to all of the functions below:
- abd_alloc_chunks
- abd_alloc_zero_scatter