ZFS 0.8: kworker sucking up 100% CPU, various messages in logs mentioning ZFS, system eventually hangs.
System information
| Type | Version/Name |
|---|---|
| Distribution Name | Ubuntu Server |
| Distribution Version | 19.10 Eoan Ermine |
| Linux Kernel | 5.3.0-13-generic #14-Ubuntu SMP Tue Sep 24 02:46:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Architecture | amd64 |
| ZFS Version | 0.8.1-1ubuntu11 |
Describe the problem you're observing
After some hours of system uptime, a kworker process will suck up 100% of a core. The system load averages will climb constantly. This kworker process appears to be ZFS related (as of this writing, the current CPU hog is named kworker/u33:3+flush-zfs-19).
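For reference, one way I know of to confirm what a busy kworker is actually doing (a rough sketch; it assumes root and a kernel with /proc/<pid>/stack available, which the stock Ubuntu kernel provides) is to sample its kernel stack while it spins:

# find the kworker currently burning the most CPU and dump its kernel stack a few times (as root)
pid=$(ps -eo pid,comm,%cpu --sort=-%cpu | awk '$2 ~ /^kworker/ {print $1; exit}')
echo "inspecting kworker pid $pid"
for i in 1 2 3; do cat /proc/$pid/stack; echo '----'; sleep 1; done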
This system was originally on Ubuntu Server 18.04 with ZFS 0.8.1 that I had compiled some time before. It had been up for several months with no issues, and then this issue came up out of nowhere.
Things that have been tried:
- Rebooting several times.
- Upgrading to the Ubuntu HWE kernel
- Updating to the development release of Ubuntu 19.10 (which comes with ZFS 0.8) to rule out problems with the version I compiled.
I've tried everything I can think of, and since various ZFS-related terms come up in the kworker name and messages printed to dmesg, I am left to assume it is a problem in ZFS.
Describe how to reproduce the problem
Unknown. The system this is occurring on has been stable, with minimal changes, for about a year. The system hosts various LXC containers. The root filesystem is ZFS, on a single 500GB Crucial SSD. All data for the LXC containers is stored on an encrypted storage pool. Output of zpool status:
  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 0 days 00:02:31 with 0 errors on Mon Oct 7 22:35:32 2019
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          sda1      ONLINE       0     0     0

errors: No known data errors

  pool: storage
 state: ONLINE
  scan: scrub repaired 0B in 0 days 00:55:30 with 0 errors on Mon Oct 7 23:35:51 2019
config:

        NAME                        STATE     READ WRITE CKSUM
        storage                     ONLINE       0     0     0
          mirror-0                  ONLINE       0     0     0
            scsi-35000cca24300d8f4  ONLINE       0     0     0
            scsi-35000cca243014464  ONLINE       0     0     0
          mirror-1                  ONLINE       0     0     0
            scsi-35000cca243015924  ONLINE       0     0     0
            scsi-35000cca24400a9a4  ONLINE       0     0     0

errors: No known data errors
Include any warning/errors/backtraces from the system logs
Not sure what would be helpful for debugging, but here's some stuff that was printed to dmesg:
[47488.869573] INFO: task pip:19670 blocked for more than 120 seconds.
[47488.869627] Tainted: P O 5.3.0-13-generic #14-Ubuntu
[47488.869670] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[47488.869718] pip D 0 19670 19606 0x00000120
[47488.869722] Call Trace:
[47488.869732] __schedule+0x2b9/0x6c0
[47488.869736] ? bit_wait_io+0x50/0x50
[47488.869738] schedule+0x42/0xb0
[47488.869741] bit_wait+0x11/0x50
[47488.869743] __wait_on_bit+0x33/0xa0
[47488.869749] __inode_wait_for_writeback+0xae/0xe0
[47488.869753] ? var_wake_function+0x30/0x30
[47488.869756] inode_wait_for_writeback+0x27/0x40
[47488.869759] evict+0xbd/0x1b0
[47488.869761] iput+0x141/0x1f0
[47488.869764] do_unlinkat+0x1c5/0x2d0
[47488.869767] __x64_sys_unlinkat+0x38/0x60
[47488.869770] do_syscall_64+0x5a/0x130
[47488.869774] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[47488.869777] RIP: 0033:0x7f7a4ac39d77
[47488.869784] Code: Bad RIP value.
[47488.869786] RSP: 002b:00007fff4c816858 EFLAGS: 00000246 ORIG_RAX: 0000000000000107
[47488.869788] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f7a4ac39d77
[47488.869790] RDX: 0000000000000000 RSI: 00007f7a44c4bda0 RDI: 0000000000000005
[47488.869791] RBP: 0000555a9b59b240 R08: 0000000000000000 R09: 00007fff4c816510
[47488.869792] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f7a4aad01b0
[47488.869793] R13: 00007f7a4aac8e08 R14: 0000555a9a878240 R15: 0000555a9b9eb460
[47488.869797] INFO: task sync:20009 blocked for more than 120 seconds.
[47488.869840] Tainted: P O 5.3.0-13-generic #14-Ubuntu
[47488.869883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[47488.869933] sync D 0 20009 20008 0x00004120
[47488.869936] Call Trace:
[47488.869940] __schedule+0x2b9/0x6c0
[47488.869943] schedule+0x42/0xb0
[47488.869945] wb_wait_for_completion+0x64/0x90
[47488.869948] ? wait_woken+0x80/0x80
[47488.869950] sync_inodes_sb+0xd7/0x290
[47488.869954] sync_inodes_one_sb+0x15/0x20
[47488.869957] iterate_supers+0xa3/0x110
[47488.869959] ? page_cache_pipe_buf_steal.cold+0x1c/0x1c
[47488.869962] ksys_sync+0x42/0xb0
[47488.869964] __ia32_sys_sync+0xe/0x20
[47488.869967] do_syscall_64+0x5a/0x130
[47488.869970] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[47488.869972] RIP: 0033:0x7f56f6bd7287
[47488.869976] Code: Bad RIP value.
[47488.869977] RSP: 002b:00007fff2c486078 EFLAGS: 00000246 ORIG_RAX: 00000000000000a2
[47488.869979] RAX: ffffffffffffffda RBX: 00007fff2c4861a8 RCX: 00007f56f6bd7287
[47488.869981] RDX: 00007f56f6eae001 RSI: 0000000000000000 RDI: 00007f56f6c754b0
[47488.869982] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
[47488.869984] R10: 000055d597318010 R11: 0000000000000246 R12: 0000000000000001
[47488.869985] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[47609.702068] INFO: task pip:19670 blocked for more than 241 seconds.
[47609.702114] Tainted: P O 5.3.0-13-generic #14-Ubuntu
[47609.702150] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[47609.702191] pip D 0 19670 19606 0x00000120
[47609.702194] Call Trace:
[47609.702203] __schedule+0x2b9/0x6c0
[47609.702206] ? bit_wait_io+0x50/0x50
[47609.702209] schedule+0x42/0xb0
[47609.702210] bit_wait+0x11/0x50
[47609.702212] __wait_on_bit+0x33/0xa0
[47609.702218] __inode_wait_for_writeback+0xae/0xe0
[47609.702221] ? var_wake_function+0x30/0x30
[47609.702223] inode_wait_for_writeback+0x27/0x40
[47609.702226] evict+0xbd/0x1b0
[47609.702227] iput+0x141/0x1f0
[47609.702230] do_unlinkat+0x1c5/0x2d0
[47609.702232] __x64_sys_unlinkat+0x38/0x60
[47609.702235] do_syscall_64+0x5a/0x130
[47609.702238] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[47609.702240] RIP: 0033:0x7f7a4ac39d77
[47609.702247] Code: Bad RIP value.
[47609.702248] RSP: 002b:00007fff4c816858 EFLAGS: 00000246 ORIG_RAX: 0000000000000107
[47609.702251] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f7a4ac39d77
[47609.702252] RDX: 0000000000000000 RSI: 00007f7a44c4bda0 RDI: 0000000000000005
[47609.702253] RBP: 0000555a9b59b240 R08: 0000000000000000 R09: 00007fff4c816510
[47609.702254] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f7a4aad01b0
[47609.702255] R13: 00007f7a4aac8e08 R14: 0000555a9a878240 R15: 0000555a9b9eb460
[47609.702258] INFO: task sync:20009 blocked for more than 241 seconds.
[47609.702293] Tainted: P O 5.3.0-13-generic #14-Ubuntu
[47609.702328] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[47609.702368] sync D 0 20009 20008 0x00004120
[47609.702370] Call Trace:
[47609.702373] __schedule+0x2b9/0x6c0
[47609.702376] schedule+0x42/0xb0
[47609.702378] wb_wait_for_completion+0x64/0x90
[47609.702379] ? wait_woken+0x80/0x80
[47609.702381] sync_inodes_sb+0xd7/0x290
[47609.702384] sync_inodes_one_sb+0x15/0x20
[47609.702386] iterate_supers+0xa3/0x110
[47609.702388] ? page_cache_pipe_buf_steal.cold+0x1c/0x1c
[47609.702391] ksys_sync+0x42/0xb0
[47609.702393] __ia32_sys_sync+0xe/0x20
[47609.702395] do_syscall_64+0x5a/0x130
[47609.702398] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[47609.702399] RIP: 0033:0x7f56f6bd7287
[47609.702402] Code: Bad RIP value.
[47609.702403] RSP: 002b:00007fff2c486078 EFLAGS: 00000246 ORIG_RAX: 00000000000000a2
[47609.702405] RAX: ffffffffffffffda RBX: 00007fff2c4861a8 RCX: 00007f56f6bd7287
[47609.702406] RDX: 00007f56f6eae001 RSI: 0000000000000000 RDI: 00007f56f6c754b0
[47609.702407] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
[47609.702408] R10: 000055d597318010 R11: 0000000000000246 R12: 0000000000000001
[47609.702409] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[47730.534430] INFO: task journal-offline:32058 blocked for more than 120 seconds.
[47730.534481] Tainted: P O 5.3.0-13-generic #14-Ubuntu
[47730.534516] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[47730.534558] journal-offline D 0 32058 27128 0x00000120
[47730.534561] Call Trace:
[47730.534570] __schedule+0x2b9/0x6c0
[47730.534573] schedule+0x42/0xb0
[47730.534575] schedule_preempt_disabled+0xe/0x10
[47730.534578] __mutex_lock.isra.0+0x182/0x4f0
[47730.534581] __mutex_lock_slowpath+0x13/0x20
[47730.534583] mutex_lock+0x2e/0x40
[47730.534698] zil_commit_impl+0x148/0xd60 [zfs]
[47730.534766] ? zpl_write_common_iovec+0x100/0x100 [zfs]
[47730.534769] ? _cond_resched+0x19/0x30
[47730.534771] ? mutex_lock+0x13/0x40
[47730.534840] zil_commit+0x40/0x60 [zfs]
[47730.534908] zpl_writepages+0xec/0x180 [zfs]
[47730.534912] do_writepages+0x43/0xd0
[47730.534916] __filemap_fdatawrite_range+0xc4/0x100
[47730.534919] filemap_write_and_wait_range+0x36/0x90
[47730.534987] zpl_fsync+0x3d/0xa0 [zfs]
[47730.534991] vfs_fsync_range+0x49/0x80
[47730.534993] ? __fget_light+0x57/0x70
[47730.534995] do_fsync+0x3d/0x70
[47730.534997] __x64_sys_fsync+0x14/0x20
[47730.535001] do_syscall_64+0x5a/0x130
[47730.535003] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[47730.535006] RIP: 0033:0x7f872e0af237
[47730.535012] Code: Bad RIP value.
[47730.535014] RSP: 002b:00007f8729766cb0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
[47730.535016] RAX: ffffffffffffffda RBX: 0000000000000014 RCX: 00007f872e0af237
[47730.535017] RDX: 0000000000000000 RSI: 00007f872dcb762c RDI: 0000000000000014
[47730.535018] RBP: 00007f872dcb8da0 R08: 00007f8729767700 R09: 00007f8729767700
[47730.535019] R10: 000000000000062c R11: 0000000000000293 R12: 0000000000000002
[47730.535020] R13: 0000000000000000 R14: 00005647819af200 R15: 00007ffce0398f10
[47730.535120] INFO: task pip:19670 blocked for more than 362 seconds.
[47730.535155] Tainted: P O 5.3.0-13-generic #14-Ubuntu
[47730.535190] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[47730.535230] pip D 0 19670 19606 0x00000120
[47730.535232] Call Trace:
[47730.535235] __schedule+0x2b9/0x6c0
[47730.535238] ? bit_wait_io+0x50/0x50
[47730.535240] schedule+0x42/0xb0
[47730.535242] bit_wait+0x11/0x50
[47730.535243] __wait_on_bit+0x33/0xa0
[47730.535246] __inode_wait_for_writeback+0xae/0xe0
[47730.535249] ? var_wake_function+0x30/0x30
[47730.535252] inode_wait_for_writeback+0x27/0x40
[47730.535254] evict+0xbd/0x1b0
[47730.535256] iput+0x141/0x1f0
[47730.535258] do_unlinkat+0x1c5/0x2d0
[47730.535260] __x64_sys_unlinkat+0x38/0x60
[47730.535262] do_syscall_64+0x5a/0x130
[47730.535265] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[47730.535266] RIP: 0033:0x7f7a4ac39d77
[47730.535270] Code: Bad RIP value.
[47730.535271] RSP: 002b:00007fff4c816858 EFLAGS: 00000246 ORIG_RAX: 0000000000000107
[47730.535273] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f7a4ac39d77
[47730.535274] RDX: 0000000000000000 RSI: 00007f7a44c4bda0 RDI: 0000000000000005
[47730.535275] RBP: 0000555a9b59b240 R08: 0000000000000000 R09: 00007fff4c816510
[47730.535276] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f7a4aad01b0
[47730.535277] R13: 00007f7a4aac8e08 R14: 0000555a9a878240 R15: 0000555a9b9eb460
[47730.535280] INFO: task sync:20009 blocked for more than 362 seconds.
[47730.535314] Tainted: P O 5.3.0-13-generic #14-Ubuntu
[47730.535349] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[47730.535389] sync D 0 20009 20008 0x00004120
[47730.535391] Call Trace:
[47730.535394] __schedule+0x2b9/0x6c0
[47730.535396] schedule+0x42/0xb0
[47730.535398] wb_wait_for_completion+0x64/0x90
[47730.535400] ? wait_woken+0x80/0x80
[47730.535402] sync_inodes_sb+0xd7/0x290
[47730.535405] sync_inodes_one_sb+0x15/0x20
[47730.535407] iterate_supers+0xa3/0x110
[47730.535409] ? page_cache_pipe_buf_steal.cold+0x1c/0x1c
[47730.535411] ksys_sync+0x42/0xb0
[47730.535413] __ia32_sys_sync+0xe/0x20
[47730.535416] do_syscall_64+0x5a/0x130
[47730.535418] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[47730.535420] RIP: 0033:0x7f56f6bd7287
[47730.535423] Code: Bad RIP value.
[47730.535424] RSP: 002b:00007fff2c486078 EFLAGS: 00000246 ORIG_RAX: 00000000000000a2
[47730.535425] RAX: ffffffffffffffda RBX: 00007fff2c4861a8 RCX: 00007f56f6bd7287
[47730.535427] RDX: 00007f56f6eae001 RSI: 0000000000000000 RDI: 00007f56f6c754b0
[47730.535428] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
[47730.535429] R10: 000055d597318010 R11: 0000000000000246 R12: 0000000000000001
[47730.535429] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[47851.366840] INFO: task journal-offline:32058 blocked for more than 241 seconds.
[47851.366900] Tainted: P O 5.3.0-13-generic #14-Ubuntu
[47851.366944] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[47851.366994] journal-offline D 0 32058 27128 0x00000124
[47851.366998] Call Trace:
[47851.367007] __schedule+0x2b9/0x6c0
[47851.367011] schedule+0x42/0xb0
[47851.367014] schedule_preempt_disabled+0xe/0x10
[47851.367017] __mutex_lock.isra.0+0x182/0x4f0
[47851.367021] __mutex_lock_slowpath+0x13/0x20
[47851.367023] mutex_lock+0x2e/0x40
[47851.367145] zil_commit_impl+0x148/0xd60 [zfs]
[47851.367224] ? zpl_write_common_iovec+0x100/0x100 [zfs]
[47851.367227] ? _cond_resched+0x19/0x30
[47851.367230] ? mutex_lock+0x13/0x40
[47851.367309] zil_commit+0x40/0x60 [zfs]
[47851.367387] zpl_writepages+0xec/0x180 [zfs]
[47851.367392] do_writepages+0x43/0xd0
[47851.367396] __filemap_fdatawrite_range+0xc4/0x100
[47851.367400] filemap_write_and_wait_range+0x36/0x90
[47851.367478] zpl_fsync+0x3d/0xa0 [zfs]
[47851.367483] vfs_fsync_range+0x49/0x80
[47851.367486] ? __fget_light+0x57/0x70
[47851.367489] do_fsync+0x3d/0x70
[47851.367491] __x64_sys_fsync+0x14/0x20
[47851.367495] do_syscall_64+0x5a/0x130
[47851.367498] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[47851.367501] RIP: 0033:0x7f872e0af237
[47851.367508] Code: Bad RIP value.
[47851.367510] RSP: 002b:00007f8729766cb0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
[47851.367512] RAX: ffffffffffffffda RBX: 0000000000000014 RCX: 00007f872e0af237
[47851.367514] RDX: 0000000000000000 RSI: 00007f872dcb762c RDI: 0000000000000014
[47851.367515] RBP: 00007f872dcb8da0 R08: 00007f8729767700 R09: 00007f8729767700
[47851.367516] R10: 000000000000062c R11: 0000000000000293 R12: 0000000000000002
[47851.367518] R13: 0000000000000000 R14: 00005647819af200 R15: 00007ffce0398f10
[47851.367618] INFO: task pip:19670 blocked for more than 483 seconds.
[47851.367660] Tainted: P O 5.3.0-13-generic #14-Ubuntu
[47851.367704] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[47851.367753] pip D 0 19670 19606 0x00000120
[47851.367756] Call Trace:
[47851.367760] __schedule+0x2b9/0x6c0
[47851.367764] ? bit_wait_io+0x50/0x50
[47851.367766] schedule+0x42/0xb0
[47851.367769] bit_wait+0x11/0x50
[47851.367771] __wait_on_bit+0x33/0xa0
[47851.367775] __inode_wait_for_writeback+0xae/0xe0
[47851.367779] ? var_wake_function+0x30/0x30
[47851.367782] inode_wait_for_writeback+0x27/0x40
[47851.367785] evict+0xbd/0x1b0
[47851.367787] iput+0x141/0x1f0
[47851.367790] do_unlinkat+0x1c5/0x2d0
[47851.367793] __x64_sys_unlinkat+0x38/0x60
[47851.367795] do_syscall_64+0x5a/0x130
[47851.367799] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[47851.367801] RIP: 0033:0x7f7a4ac39d77
[47851.367804] Code: Bad RIP value.
[47851.367806] RSP: 002b:00007fff4c816858 EFLAGS: 00000246 ORIG_RAX: 0000000000000107
[47851.367808] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f7a4ac39d77
[47851.367809] RDX: 0000000000000000 RSI: 00007f7a44c4bda0 RDI: 0000000000000005
[47851.367811] RBP: 0000555a9b59b240 R08: 0000000000000000 R09: 00007fff4c816510
[47851.367812] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f7a4aad01b0
[47851.367814] R13: 00007f7a4aac8e08 R14: 0000555a9a878240 R15: 0000555a9b9eb460
[47851.367817] INFO: task sync:20009 blocked for more than 483 seconds.
[47851.367859] Tainted: P O 5.3.0-13-generic #14-Ubuntu
[47851.367902] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[47851.367952] sync D 0 20009 20008 0x00004120
[47851.367954] Call Trace:
[47851.367958] __schedule+0x2b9/0x6c0
[47851.367961] schedule+0x42/0xb0
[47851.367963] wb_wait_for_completion+0x64/0x90
[47851.367966] ? wait_woken+0x80/0x80
[47851.367968] sync_inodes_sb+0xd7/0x290
[47851.367972] sync_inodes_one_sb+0x15/0x20
[47851.367975] iterate_supers+0xa3/0x110
[47851.367977] ? page_cache_pipe_buf_steal.cold+0x1c/0x1c
[47851.367980] ksys_sync+0x42/0xb0
[47851.367983] __ia32_sys_sync+0xe/0x20
[47851.367985] do_syscall_64+0x5a/0x130
[47851.367989] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[47851.367991] RIP: 0033:0x7f56f6bd7287
[47851.367994] Code: Bad RIP value.
[47851.367995] RSP: 002b:00007fff2c486078 EFLAGS: 00000246 ORIG_RAX: 00000000000000a2
[47851.367998] RAX: ffffffffffffffda RBX: 00007fff2c4861a8 RCX: 00007f56f6bd7287
[47851.367999] RDX: 00007f56f6eae001 RSI: 0000000000000000 RDI: 00007f56f6c754b0
[47851.368000] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
[47851.368002] R10: 000055d597318010 R11: 0000000000000246 R12: 0000000000000001
[47851.368003] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
I can grab any other debugging information that would be helpful, just let me know what commands to run to grab it.
EDIT: Oh, and I suspect it may have SOMETHING to do with one particular LXC container. The container is stored in a ZFS dataset. Inside the container, the only thing running is Nextcloud, installed from Snapcraft.
Check SMART info for all disks in case one of them is about to fail, and also watch out for SCSI/disk errors in dmesg; maybe this is an error occurring after some disk hiccup. Also make sure there is enough free space on both pools.
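For example, something along these lines (a minimal sketch; the device list will differ per system):

for d in /dev/sd?; do
    smartctl -H "$d"                                          # overall health verdict
    smartctl -A "$d" | grep -Ei 'realloc|pending|uncorrect'   # attributes that usually signal a dying disk
done
zpool list                                                    # confirm both pools still have free space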
Both pools are nowhere near full. SMART data on all disks is fine. If the problem was caused by the disks, I would expect to see high iowait. However, there is no iowait.
I would suggest trying v0.8.2, since a kworker/rollback deadlock was fixed in that release. Your posted stacks don't confirm it's the same issue, but from the description it sounds like it could be. See #9203 and related issues for details on the deadlock.
Alright, I'll give that a try. I see #7038 was referenced there, and it looks to me like this may be the same issue as that.
I've updated the system to 0.8.2. I think I've pinpointed what triggers the problem (snapfuse in an LXC container spinning, doing lots of I/O, and sucking up an entire core), since that seems to have popped up around the same time, but I'll hold off on fixing it for a few days to see whether 0.8.2 fixes the problem.
Alright. Prior to updating to 0.8.2, it took 12-24 hours for this issue to crop up, and it's been 2 days with no sign of it, so I'm inclined to believe that this is now fixed.
Thanks!
I've run into this with 0.8.2.
Actually, yep, looks like last night it occurred again. Based on my sample size of 1, 0.8.2 seems to have improved it, since it took 4 days to occur rather than less than one day, but it has occurred again.
The system is currently still responsive. Before it completely locks up and forces me to reboot, and before I just disable the VM that I am fairly sure is causing the problem, is there any debugging info I can gather to assist in determining the root cause of the problem?
Perhaps attach zpool status and zfs get all output?
  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 0 days 00:01:18 with 0 errors on Fri Oct 11 01:09:10 2019
config:

        NAME           STATE     READ WRITE CKSUM
        rpool          ONLINE       0     0     0
          mirror-0     ONLINE       0     0     0
            nvme0n1p4  ONLINE       0     0     0
            nvme1n1p4  ONLINE       0     0     0

errors: No known data errors
lz4 compression is enabled, defaults otherwise.
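If the full zfs get all dump is too noisy, a compact alternative (as far as I know) is to show only the locally-set, non-default properties:

zfs get -r -s local all rpool    # properties explicitly set on rpool and its child datasets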
This has just happened again:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
120819 root 20 0 0 0 0 R 100.0 0.0 9:46.79 kworker/u256:2+flush-zfs-1
While attempting to compile libcap.
[188621.756418] INFO: task emerge:9987 blocked for more than 614 seconds.
[188621.756419] Tainted: P OE 5.2.16-gentoo #4
[188621.756419] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[188621.756420] emerge D 0 9987 1 0x00004004
[188621.756421] Call Trace:
[188621.756423] ? __schedule+0x250/0x660
[188621.756426] ? trace_hardirqs_on+0x31/0xd0
[188621.756427] schedule+0x39/0xa0
[188621.756428] wb_wait_for_completion+0x5e/0x90
[188621.756430] ? wait_woken+0x70/0x70
[188621.756431] sync_inodes_sb+0xd8/0x280
[188621.756433] sync_filesystem+0x5e/0x90
[188621.756434] __se_sys_syncfs+0x38/0x70
[188621.756435] do_syscall_64+0x5a/0x130
[188621.756436] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[188621.756437] RIP: 0033:0x7f6f9f5e1d97
[188621.756439] Code: Bad RIP value.
[188621.756439] RSP: 002b:00007ffd7c463e38 EFLAGS: 00000246 ORIG_RAX: 0000000000000132
[188621.756440] RAX: ffffffffffffffda RBX: 00007ffd7c463e40 RCX: 00007f6f9f5e1d97
[188621.756441] RDX: 00007f6f9f4b0b80 RSI: 0000000073ffa500 RDI: 000000000000000d
[188621.756442] RBP: 00007ffd7c463e40 R08: 00007f6f9eb73cc8 R09: 000055643a62ef68
[188621.756442] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffd7c463f88
[188621.756443] R13: 0000000000000000 R14: 00007ffd7c463ef0 R15: 00007ffd7c463f68
I'm tempted to go back to using git master. When I was on zfs-9999 just prior to the 0.8.2 release I didn't see this (I was using a git build because 0.8.x didn't build against 5.2.x kernels at the time), and I did note that when I switched to stable 0.8.2 my pool had unsupported "future" features, so I rebuilt it to have a clean 0.8.2 pool.
Might it be worth comparing some system configurations?
  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 0 days 00:02:31 with 0 errors on Mon Oct 7 22:35:32 2019
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          sda1      ONLINE       0     0     0

errors: No known data errors

  pool: storage
 state: ONLINE
  scan: scrub repaired 0B in 0 days 00:55:30 with 0 errors on Mon Oct 7 23:35:51 2019
config:

        NAME                        STATE     READ WRITE CKSUM
        storage                     ONLINE       0     0     0
          mirror-0                  ONLINE       0     0     0
            scsi-35000cca24300d8f4  ONLINE       0     0     0
            scsi-35000cca243014464  ONLINE       0     0     0
          mirror-1                  ONLINE       0     0     0
            scsi-35000cca243015924  ONLINE       0     0     0
            scsi-35000cca24400a9a4  ONLINE       0     0     0

errors: No known data errors
The disks are connected to a generic LSI SAS2 HBA. The rpool is a single Crucial MX500 SSD connected to SAS port 0 on the HBA. The entire storage pool is encrypted. The disks in the storage pool are Hitachi Ultrastar 7K4000 SAS HDDs. They are connected to ports 4-7.
The system is a Dell PowerEdge R415, with 2x AMD Opteron(tm) Processor 4365 EE processors, and 80 GB of RAM (4x4GB + 4x16GB).
I've narrowed down what I believe to be the trigger: a broken snap package inside an LXC container (the container lives on the storage pool, and snapfuse is sucking up an entire core while reading from disk at a constant roughly 6 MB/sec). However, I have avoided fixing it, both to verify whether the issue itself was fixed (it's not) and to assist in diagnosing the root cause.
The next time this system completely locks up and I have to hard-reboot it, I'm fixing the problem that I believe is triggering this, so if there are any commands I can run or logs I can fetch that would help the ZFS developers determine the root cause, I need to know what those are sooner rather than later.
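In case it helps, my best guess (not an official list) at what is usually wanted here is blocked-task stacks plus ZFS's internal debug log, roughly:

echo w > /proc/sysrq-trigger                        # dump stacks of all blocked (D state) tasks into dmesg
dmesg > blocked-tasks.txt
cat /proc/spl/kstat/zfs/dbgmsg > zfs-dbgmsg.txt     # ZFS internal debug log (more verbose with zfs_dbgmsg_enable=1)
cat /proc/spl/taskq > spl-taskq.txt                 # state of the SPL task queues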
What brought you to the snap package? I ask because I have exactly one snap on my entire system, and as it happens it is active when the issue occurs. At this point I'm sure it's coincidence, buuut...
$ zpool status
  pool: misc
 state: ONLINE
  scan: scrub repaired 0B in 0 days 00:01:06 with 0 errors on Fri Oct 11 01:09:02 2019
config:

        NAME                                                               STATE     READ WRITE CKSUM
        misc                                                               ONLINE       0     0     0
          raidz1-0                                                         ONLINE       0     0     0
            ata-WDC_WD1001FALS-00J7B1_WD-WMATV2740378                      ONLINE       0     0     0
            ata-WDC_WD1001FALS-00J7B1_WD-WMATV2432116                      ONLINE       0     0     0
            ata-WDC_WD1001FALS-00J7B1_WD-WMATV2432663                      ONLINE       0     0     0
        logs
          mirror-1                                                         ONLINE       0     0     0
            ata-LITEONIT_LCS-256M6S_2.5_7mm_256GB-part1TW0XFJWX5508539D1600  ONLINE     0     0     0
            ata-KINGSTON_SH103S3240G_50026B72390010E4-part1                ONLINE       0     0     0
        cache
          ata-LITEONIT_LCS-256M6S_2.5_7mm_256GB-part2TW0XFJWX5508539D1600    ONLINE     0     0     0
          ata-KINGSTON_SH103S3240G_50026B72390010E4-part2                  ONLINE       0     0     0

errors: No known data errors

  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 0 days 00:01:18 with 0 errors on Fri Oct 11 01:09:10 2019
config:

        NAME           STATE     READ WRITE CKSUM
        rpool          ONLINE       0     0     0
          mirror-0     ONLINE       0     0     0
            nvme1n1p4  ONLINE       0     0     0
            nvme0n1p4  ONLINE       0     0     0

errors: No known data errors
System is a Threadripper 1950X, 64GB RAM, ASRock mobo. Samsung NVME disks in rpool, ten year old WD Black and a couple of SSDs in misc. Kernel 5.2.16.
It's rpool that's active and hangs when the issue occurs.
The snapfuse process associated with that snap package (Nextcloud) has been sucking up an entire core and doing the constant I/O since an auto-update of the snap package in question occurred. The date of this auto-update lines up very closely with the date the issue started appearing.
Could you describe what snapfuse is/does? There doesn't seem to be much information about it on the net... I can't find a manpage or a git repo...
Snap packages are distributed as squashfs images. Normally those are just mounted as loop devices using the kernel's native squashfs support, but in my case snap is running inside an LXC container, and you can't use loop devices inside LXC. As a workaround for this, snap comes with snapfuse, a userspace implementation of squashfs on top of FUSE, which is supported inside LXC containers.
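As a rough illustration (mount output paraphrased from memory, and the snap revision number is made up):

# on a regular host, snapd mounts each .snap squashfs image through a loop device:
$ mount | grep nextcloud
/var/lib/snapd/snaps/nextcloud_12345.snap on /snap/nextcloud/12345 type squashfs (ro,nodev,relatime)

# inside the LXC container there are no loop devices, so the same mount is handled by snapfuse via FUSE:
$ mount | grep nextcloud
nextcloud_12345.snap on /snap/nextcloud/12345 type fuse.snapfuse (ro,nodev,relatime,user_id=0,group_id=0)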
Hmm, fuse you say...
franz-5.3.3-x86_64.AppImage on /tmp/.mount_franz-gpQL6L type fuse.franz-5.3.3-x86_64.AppImage (ro,nosuid,nodev,relatime,user_id=1000,group_id=1000)
It looks to me like the only things actually hanging here are sync commands. Is it the same for you? I think this is the cause of my load averages spiking like they do.
# ps axu | grep " D "
165536 302 0.0 0.0 6172 732 ? D 02:00 0:00 sync
165536 307 0.0 0.0 6172 736 ? D 10:15 0:00 sync
165536 647 0.0 0.0 6172 736 ? D Oct14 0:00 sync
165536 745 0.0 0.0 6172 628 ? D 00:20 0:00 sync
165536 760 0.0 0.0 6172 736 ? D Oct14 0:00 sync
[snip]
165536 32052 0.0 0.0 6172 736 ? D Oct14 0:00 sync
165536 32084 0.0 0.0 6172 624 ? D Oct13 0:00 sync
165536 32104 0.0 0.0 6172 732 ? D Oct13 0:00 sync
165536 32273 0.0 0.0 6172 732 ? D Oct14 0:00 sync
165536 32307 0.0 0.0 6172 624 ? D 03:35 0:00 sync
165536 32320 0.0 0.0 6172 728 ? D Oct13 0:00 sync
165536 32337 0.0 0.0 6172 628 ? D Oct14 0:00 sync
165536 32364 0.0 0.0 6172 628 ? D Oct13 0:00 sync
165536 32365 0.0 0.0 6172 732 ? D 05:20 0:00 sync
165536 32435 0.0 0.0 6172 732 ? D Oct14 0:00 sync
165536 32474 0.0 0.0 6172 628 ? D Oct14 0:00 sync
165536 32477 0.0 0.0 6172 628 ? D Oct13 0:00 sync
165536 32603 0.0 0.0 6172 628 ? D Oct14 0:00 sync
165536 32647 0.0 0.0 6172 624 ? D 08:30 0:00 sync
165536 32710 0.0 0.0 6172 628 ? D 05:10 0:00 sync
I'm going to replace sync with a copy of true in the container in question to see if this fixes my load average issues and makes this more manageable.
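Roughly like this, inside the container (assuming a Debian/Ubuntu-based container; obviously a blunt workaround rather than a fix):

# divert the real binary so package upgrades don't silently put it back
dpkg-divert --local --rename --add /bin/sync
# make sync a no-op
ln -s /bin/true /bin/sync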
I'm unsure, I only saw emerge and gzip "hang" last time but forgot to grab a full process list before rebooting (needed system back ASAP). When it happens again I'll check.
As far as ZFS is concerned, does it matter if sync is just replaced with true or something?
I'm not sure. It's certainly not optimal but I would assume it can't really get any worse than the process just hanging in uninterruptible sleep.
Hmm, fuse you say...
franz-5.3.3-x86_64.AppImage on /tmp/.mount_franz-gpQL6L type fuse.franz-5.3.3-x86_64.AppImage (ro,nosuid,nodev,relatime,user_id=1000,group_id=1000)
mind that franz has severe memory leaks
The system load averages will climb constantly
I think this is normal: every hanging sync is one more process stuck waiting for I/O, and every such process counts toward the system load average (it does not need to burn CPU). You can easily demonstrate this with stale NFS mounts.
Instead of rebooting, please try stopping all the LXC stuff and unmounting all the FUSE mounts; if that doesn't work normally, try -f (force). Then check whether the sync processes go away, or try to kill them. Maybe the sync is hanging because of FUSE...
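i.e. something like this (the container name is just an example, and the exact FUSE type should be checked in the mount output first):

lxc stop nextcloud-container      # or lxc-stop -n nextcloud-container for classic LXC
umount -a -t fuse.snapfuse        # detach all snapfuse mounts; add -f (or -l for lazy) if a plain umount hangs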
When sync hangs in D state inside the container, you cannot stop the LXC container. Any attempt to do so will just freeze.
Anyway, I will say that 0.8.2 has at the very least made this problem slightly better. On 0.8.1 it was consistently occurring on a daily basis; on 0.8.2 it went about 4 days the first time, I'm at almost 5 days since then, and it's still working fine. I'll update the issue when it next occurs.
Maybe related?
https://superuser.com/questions/1439108/lxc-start-stop-hangs-and-filesystem-sync-hangs
Nope.
By definition, when a task is in D state (uninterruptible sleep), absolutely nothing (short of a hard reboot) will end it. Because restarting an LXC container requires killing all processes, and you cannot kill a process in D state, you cannot restart an LXC container with any child processes in D state.
This has hit me again.
Anyone experiencing this on kernels <5.0.0?
I'm experiencing this on Linux storagelols 5.0.0-16-generic #17~18.04.1-Ubuntu SMP, but I have no real way to debug it.
We were hit by the problem on 4.19. It looks like the problem accumulates deferred threads after some time; some threads can still continue. The relevant thread shows the following after a sysrq dump:
[6756902.750815] CPU: 12 PID: 37541 Comm: kworker/u49:4 Tainted: P W O 4.19.86-gentoo #1
[6756902.750816] Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 3.0a 07/31/2013
[6756902.750817] Workqueue: writeback wb_workfn (flush-zfs-30)
[6756902.750818] RIP: 0010:dbuf_verify+0x158/0x990 [zfs]
[6756902.750819] Code: 4d 85 e4 74 15 49 3b 5c 24 20 0f 85 86 01 00 00 4d 8b 64 24 28 4d 85 e4 75 eb 80 7b 50 00 75 0a 48 83 3b 00 0f 84 36 02 00 00 <4c> 8b 43 48 4d 85 c0 74 6f 48 8b 43 30 49 3b 45 68 0f 84 0e 03 00
[6756902.750819] RSP: 0018:ffffb11b55b57728 EFLAGS: 00000246
[6756902.750820] RAX: 0000000000000000 RBX: ffff9f3d298d93f8 RCX: 0000000000000000
[6756902.750821] RDX: 0000000000000001 RSI: ffffffffc0c688f8 RDI: ffff9f3aa7ea6448
[6756902.750821] RBP: ffffb11b55b57750 R08: 0000000001bec000 R09: 0000000001bec000
[6756902.750822] R10: ffff9f09229a86c0 R11: 00000000000011ce R12: 0000000000000000
[6756902.750822] R13: ffff9f3aa7ea5fa0 R14: 00000000000006fb R15: ffff9f3aa7ea5fa0
[6756902.750823] FS:  0000000000000000(0000) GS:ffff9f217f980000(0000) knlGS:0000000000000000
[6756902.750823] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[6756902.750824] CR2: 00007fc31495601e CR3: 0000003e7240c005 CR4: 00000000001626e0
[6756902.750824] Call Trace:
[6756902.750824]  dbuf_hold_impl_arg+0x12e/0xb70 [zfs]
[6756902.750825]  ? spl_kmem_alloc+0xf0/0x170 [spl]
[6756902.750825]  dbuf_hold_impl+0x95/0xc0 [zfs]
[6756902.750826]  dbuf_hold+0x33/0x60 [zfs]
[6756902.750826]  dnode_hold_impl+0x17a/0x10c0 [zfs]
[6756902.750827]  ? dbuf_rele_and_unlock+0x336/0x800 [zfs]
[6756902.750827]  ? zfs_refcount_remove_many+0x215/0x290 [zfs]
[6756902.750827]  ? zfs_refcount_destroy_many+0x159/0x200 [zfs]
[6756902.750828]  dnode_hold+0x1b/0x20 [zfs]
[6756902.750828]  dmu_bonus_hold+0x36/0x90 [zfs]
[6756902.750829]  sa_buf_hold+0xe/0x10 [zfs]
[6756902.750829]  zfs_zget+0x10d/0x310 [zfs]
[6756902.750829]  zfs_get_data+0x7e/0x520 [zfs]
[6756902.750830]  zil_commit_impl+0x1235/0x1c90 [zfs]
[6756902.750830]  zil_commit+0x59/0x160 [zfs]
[6756902.750830]  zpl_writepages+0xec/0x180 [zfs]
[6756902.750831]  do_writepages+0x41/0xd0
[6756902.750831]  ? tracing_record_taskinfo_skip+0x3f/0x50
[6756902.750832]  ? tracing_record_taskinfo+0x13/0xb0
[6756902.750833]  ? fprop_fraction_percpu+0x2f/0x80
[6756902.750834]  __writeback_single_inode+0x4d/0x390
[6756902.750834]  writeback_sb_inodes+0x23c/0x570
[6756902.750835]  wb_writeback+0xff/0x320
[6756902.750835]  wb_workfn+0xf3/0x450
[6756902.750836]  ? _raw_spin_unlock_irq+0x13/0x20
[6756902.750836]  ? finish_task_switch+0x83/0x290
[6756902.750836]  process_one_work+0x18a/0x3a0
[6756902.750837]  worker_thread+0x50/0x3b0
[6756902.750837]  kthread+0x106/0x140
[6756902.750837]  ? process_one_work+0x3a0/0x3a0
[6756902.750838]  ? kthread_park+0x80/0x80
[6756902.750838]  ret_from_fork+0x24/0x50
It looks like the kworker thread is spinning and reading metadata while not committing to the ZIL. The ZIL commit count has been slightly reduced since the event at 5/3 2:20 A.M. After a while, more and more programs hang behind this kworker thread, as one can see in the load graph.
The gap in the graphs is not correlated with this particular system. There was a reconfiguration of the monitoring database.