Lags can be noticed on some system commands that try to get information from the ZFS file system
### System information
| Type | Version/Name |
|---|---|
| Distribution Name | ALL |
| Distribution Version | N/A |
| Kernel Version | 4.19 |
| Architecture | x86_64 |
| OpenZFS Version | 2.1.5/master |
### Describe the problem you're observing
Sometimes lags can be noticed for some system commands that get information from the ZFS file system. For example, executing `cat /proc/mounts` (datasets from the pool are mounted) hangs on the read syscall:
```
strace cat /proc/mounts
...
Pool-0/bak_ds10029 /Pools/Pool-0/bak_ds10029 zfs rw,relatime,xattr,posixacl,casesensitive 0 0
Pool-0/bak_ds10031 /Pools/Pool-0/bak_ds10031 zfs rw,relatime,xattr,posixacl,casesensitive 0 0
...
) = 2309
read(3, "", 131072) = 0    <- here it hangs, sometimes for up to 30 seconds
...
```
The stack trace for the cat process shows that it hangs on getting information from ZFS:
```
cat /proc/`pidof cat`/stack
[<0>] cv_wait_common+0xeb/0x120 [spl]
[<0>] rrw_enter_read_impl+0x47/0x150 [zfs]
[<0>] zpl_show_devname+0x2f/0xf0 [zfs]
[<0>] show_vfsmnt+0x54/0x170
[<0>] seq_read+0x2bb/0x3e0
[<0>] vfs_read+0x90/0x130
[<0>] ksys_read+0x5a/0xe0
[<0>] do_syscall_64+0x56/0x140
[<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[<0>] 0xffffffffffffffff
```
Some other commands also hang in a similar way:
```
time find /Pools/Pool-0/ -maxdepth 1
real 0m32.251s
```
In the meantime, the dumped stack for that process shows:
```
cat /proc/`pidof find`/stack
[<0>] cv_wait_common+0xeb/0x120 [spl]
[<0>] rrw_enter_read_impl+0x47/0x150 [zfs]
[<0>] zfs_getattr_fast+0x48/0x220 [zfs]
[<0>] zpl_getattr+0x5d/0x100 [zfs]
[<0>] vfs_statx+0x8b/0xe0
[<0>] __se_sys_newfstatat+0x36/0x70
[<0>] do_syscall_64+0x56/0x140
[<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[<0>] 0xffffffffffffffff
```
```
df /Pools/Pool-0/ &
cat /proc/`pidof df`/stack
[<0>] cv_wait_common+0xeb/0x120 [spl]
[<0>] rrw_enter_read_impl+0x47/0x150 [zfs]
[<0>] zpl_show_devname+0x2f/0xf0 [zfs]
[<0>] show_vfsmnt+0x54/0x170
[<0>] seq_read+0x2bb/0x3e0
[<0>] vfs_read+0x90/0x130
[<0>] ksys_read+0x5a/0xe0
[<0>] do_syscall_64+0x56/0x140
[<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[<0>] 0xffffffffffffffff
```
Of course, under heavy load this wouldn't be surprising, but the strange thing is that some other commands are executed immediately (e.g. while `cat /proc/mounts` is hung):
```
time for files in /Pools/Pool-0/*; do echo $files; done
...
/Pools/Pool-0/bak_ds1008
/Pools/Pool-0/bak_ds1009
/Pools/Pool-0/ds1000
/Pools/Pool-0/ds1001
real 0m0.001s
```
I suppose the built-in bash `for...in` loop does not fetch attribute information (`zpl_getattr`) the way `find` does, but even creating a new directory on the pool (again while `cat /proc/mounts` is hung) completes immediately:
```
time mkdir /Pools/Pool-0/testdir
real 0m0.018s
```
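That hypothesis about the glob can be checked with `strace -c`: the glob expansion only lists directory entries, while `find` additionally stats each entry. A minimal sketch (paths as in the examples above; exact syscall names assume a glibc x86_64 system):
```sh
# Compare the syscalls issued by the bash glob vs. find:
strace -f -c -e trace=getdents64,newfstatat,statx \
    bash -c 'for f in /Pools/Pool-0/*; do :; done'   # expect getdents64 only
strace -c -e trace=getdents64,newfstatat,statx \
    find /Pools/Pool-0/ -maxdepth 1                  # expect a stat call per entry
```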
so it looks like getting information from the ZFS file system is in some cases blocked (maybe by send/receive).
Could you advise how to debug this behavior further?
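One thing that might help pin this down: dump the stacks of all uninterruptible tasks during a hang; a task sitting in `rrw_enter_write()` (e.g. via `zfs_suspend_fs()`) rather than `rrw_enter_read_impl()` would be the likely lock holder. A minimal sketch, assuming root and sysrq/`/proc/<pid>/stack` support enabled:
```sh
# Option 1: let the kernel dump all blocked (D-state) task stacks to dmesg.
echo w > /proc/sysrq-trigger
dmesg | tail -n 300

# Option 2: walk /proc directly while the hang is in progress.
for pid in /proc/[0-9]*; do
    state=$(awk '/^State:/ {print $2}' "$pid/status" 2>/dev/null)
    if [ "$state" = "D" ]; then
        echo "== $pid ($(cat "$pid/comm"))"
        cat "$pid/stack"
    fi
done
```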
### Describe how to reproduce the problem
One zpool containing:
- 40x source datasets
- 40x destination datasets
- 40x backup tasks from source datasets (send) to destination datasets (receive) on the same node

Backup tasks configuration:
- source retention: 1h every 10min, 6h every 1h
- destination retention: 1w every 6h
Test procedure (a minimal setup sketch follows below):
1. mount all datasets via SMB on Windows Server 2022
2. copy some files to one dataset
3. copy these files from this dataset back to the same dataset (Samba offload)
4. copy these files from this dataset to other datasets (Samba offload)
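For reference, a rough sketch of how such a layout could be set up by hand, assuming a pool named `Pool-0`; dataset and snapshot names here are hypothetical placeholders (the real setup uses ZnapZend with the retention schedules above):
```sh
# Create 40 source datasets; destinations are created by the first receive.
for i in $(seq -w 0 39); do
    zfs create "Pool-0/ds10$i"
done

# Each backup task boils down to a local send/receive on the same node:
for i in $(seq -w 0 39); do
    zfs snapshot "Pool-0/ds10$i@snap1"
    zfs send "Pool-0/ds10$i@snap1" | zfs receive "Pool-0/bak_ds10$i" &
done
wait
```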
### Include any warning/errors/backtraces from the system logs
`zpl_show_devname()` acquires the teardown lock for the mountpoint; is something else also holding that lock when you're trying to `cat /proc/mounts`?
If `cat /proc/mounts` hangs, the stack dump shows the same trace each time:
```
cat /proc/`pidof cat`/stack
[<0>] cv_wait_common+0xeb/0x120 [spl]
[<0>] rrw_enter_read_impl+0x47/0x150 [zfs]
[<0>] zpl_show_devname+0x2f/0xf0 [zfs]
[<0>] show_vfsmnt+0x54/0x170
[<0>] seq_read+0x2bb/0x3e0
[<0>] vfs_read+0x90/0x130
[<0>] ksys_read+0x5a/0xe0
[<0>] do_syscall_64+0x56/0x140
[<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[<0>] 0xffffffffffffffff
```
Right, the question is what other process is holding that lock, preventing cat from being able to get it.
In this system, ZnapZend performs backups of the datasets' data with frequent retention, using send/receive and snapshots to the same node, so I suppose something in the send/receive or snapshot functionality is holding the lock.
By the way, do you mean `ZPL_ENTER(zfsvfs)` in `zpl_show_devname`? If so, I noticed that `ZPL_ENTER(zfsvfs)` is also used in some other functions, e.g. `zpl_snapdir_iterate` and `zpl_xattr_list`, so maybe the lock is being held somewhere there?
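If it is the teardown lock, then while `cat /proc/mounts` is hung, something like the sketch below might catch the writer; the function names are the usual write-side suspects (e.g. `zfs_suspend_fs()`/`zfs_resume_fs()`, called from the receive and rollback paths), not a confirmed diagnosis:
```sh
# Look for tasks holding or waiting on the teardown lock as a writer.
grep -l -e zfs_suspend_fs -e zfs_resume_fs -e rrw_enter_write \
    /proc/[0-9]*/stack 2>/dev/null |
while read -r f; do
    pid=${f%/stack}; pid=${pid#/proc/}
    echo "== pid $pid ($(cat "/proc/$pid/comm"))"
    cat "$f"
done
```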
I am also seeing process lags that occasionally generate timeouts in `zpl_show_devname` on a Rocky Linux 8 x86_64 system running ZFS 2.1.6. This is happening while running concurrent `zfs recv` processes managed by syncoid, e.g.,
```
[root@cascade1 ~]# grep -B2 -A15 "Call Trace:" /var/log/messages
Nov 1 16:55:23 cascade1.ldas.cit kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 1 16:55:23 cascade1.ldas.cit kernel: gmond D ffff9f287a2505e0 0 7525 1 0x00000080
Nov 1 16:55:23 cascade1.ldas.cit kernel: Call Trace:
Nov 1 16:55:23 cascade1.ldas.cit kernel: [ ] schedule+0x29/0x70
Nov 1 16:55:23 cascade1.ldas.cit kernel: [ ] cv_wait_common+0x125/0x150 [spl]
Nov 1 16:55:23 cascade1.ldas.cit kernel: [ ] ? wake_up_atomic_t+0x30/0x30
Nov 1 16:55:23 cascade1.ldas.cit kernel: [ ] __cv_wait+0x15/0x20 [spl]
Nov 1 16:55:23 cascade1.ldas.cit kernel: [ ] rrw_enter_read_impl+0x53/0x170 [zfs]
Nov 1 16:55:23 cascade1.ldas.cit kernel: [ ] rrm_enter_read+0x3f/0x50 [zfs]
Nov 1 16:55:23 cascade1.ldas.cit kernel: [ ] zpl_show_devname+0x34/0x100 [zfs]
Nov 1 16:55:23 cascade1.ldas.cit kernel: [ ] show_vfsmnt+0x4e/0x180
Nov 1 16:55:23 cascade1.ldas.cit kernel: [ ] m_show+0x19/0x20
Nov 1 16:55:23 cascade1.ldas.cit kernel: [ ] seq_read+0x2e4/0x450
Nov 1 16:55:23 cascade1.ldas.cit kernel: [ ] vfs_read+0x9f/0x170
Nov 1 16:55:23 cascade1.ldas.cit kernel: [ ] SyS_read+0x55/0xd0
Nov 1 16:55:23 cascade1.ldas.cit kernel: [ ] system_call_fastpath+0x25/0x2a
Nov 1 16:58:17 cascade1.ldas.cit systemd-logind: Removed session 28771.
Nov 1 16:59:15 cascade1.ldas.cit systemd-logind: New session 28773 of user root.
--
Nov 2 07:41:23 cascade1.ldas.cit kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 2 07:41:23 cascade1.ldas.cit kernel: automount D ffff9fe9d0a31660 0 299424 1 0x00000080
Nov 2 07:41:23 cascade1.ldas.cit kernel: Call Trace:
Nov 2 07:41:23 cascade1.ldas.cit kernel: [ ] schedule+0x29/0x70
Nov 2 07:41:23 cascade1.ldas.cit kernel: [ ] cv_wait_common+0x125/0x150 [spl]
Nov 2 07:41:23 cascade1.ldas.cit kernel: [ ] ? wake_up_atomic_t+0x30/0x30
Nov 2 07:41:23 cascade1.ldas.cit kernel: [ ] __cv_wait+0x15/0x20 [spl]
Nov 2 07:41:23 cascade1.ldas.cit kernel: [ ] rrw_enter_read_impl+0x53/0x170 [zfs]
Nov 2 07:41:23 cascade1.ldas.cit kernel: [ ] rrm_enter_read+0x3f/0x50 [zfs]
Nov 2 07:41:23 cascade1.ldas.cit kernel: [ ] zpl_show_devname+0x34/0x100 [zfs]
Nov 2 07:41:23 cascade1.ldas.cit kernel: [ ] show_vfsmnt+0x4e/0x180
Nov 2 07:41:23 cascade1.ldas.cit kernel: [ ] m_show+0x19/0x20
Nov 2 07:41:23 cascade1.ldas.cit kernel: [ ] seq_read+0x2e4/0x450
Nov 2 07:41:23 cascade1.ldas.cit kernel: [ ] vfs_read+0x9f/0x170
Nov 2 07:41:23 cascade1.ldas.cit kernel: [ ] SyS_read+0x55/0xd0
Nov 2 07:41:23 cascade1.ldas.cit kernel: [ ] system_call_fastpath+0x25/0x2a
Nov 2 07:43:23 cascade1.ldas.cit kernel: INFO: task automount:299424 blocked for more than 120 seconds.
Nov 2 07:43:23 cascade1.ldas.cit kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 2 07:43:23 cascade1.ldas.cit kernel: automount D ffff9fe9d0a31660 0 299424 1 0x00000080
Nov 2 07:43:23 cascade1.ldas.cit kernel: Call Trace:
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] schedule+0x29/0x70
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] cv_wait_common+0x125/0x150 [spl]
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] ? wake_up_atomic_t+0x30/0x30
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] __cv_wait+0x15/0x20 [spl]
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] rrw_enter_read_impl+0x53/0x170 [zfs]
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] rrm_enter_read+0x3f/0x50 [zfs]
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] zpl_show_devname+0x34/0x100 [zfs]
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] show_vfsmnt+0x4e/0x180
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] m_show+0x19/0x20
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] seq_read+0x2e4/0x450
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] vfs_read+0x9f/0x170
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] SyS_read+0x55/0xd0
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] system_call_fastpath+0x25/0x2a
Nov 2 07:43:23 cascade1.ldas.cit kernel: INFO: task automount:242155 blocked for more than 120 seconds.
Nov 2 07:43:23 cascade1.ldas.cit kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 2 07:43:23 cascade1.ldas.cit kernel: automount D ffffa00e4a271660 0 242155 1 0x00000080
Nov 2 07:43:23 cascade1.ldas.cit kernel: Call Trace:
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] schedule+0x29/0x70
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] cv_wait_common+0x125/0x150 [spl]
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] ? wake_up_atomic_t+0x30/0x30
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] __cv_wait+0x15/0x20 [spl]
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] rrw_enter_read_impl+0x53/0x170 [zfs]
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] rrm_enter_read+0x3f/0x50 [zfs]
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] zpl_show_devname+0x34/0x100 [zfs]
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] show_vfsmnt+0x4e/0x180
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] m_show+0x19/0x20
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] seq_read+0x2e4/0x450
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] vfs_read+0x9f/0x170
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] SyS_read+0x55/0xd0
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] system_call_fastpath+0x25/0x2a
Nov 2 07:43:23 cascade1.ldas.cit kernel: INFO: task automount:242156 blocked for more than 120 seconds.
Nov 2 07:43:23 cascade1.ldas.cit kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 2 07:43:23 cascade1.ldas.cit kernel: automount D ffff9ffef8949660 0 242156 1 0x00000080
Nov 2 07:43:23 cascade1.ldas.cit kernel: Call Trace:
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] schedule+0x29/0x70
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] cv_wait_common+0x125/0x150 [spl]
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] ? wake_up_atomic_t+0x30/0x30
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] __cv_wait+0x15/0x20 [spl]
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] rrw_enter_read_impl+0x53/0x170 [zfs]
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] rrm_enter_read+0x3f/0x50 [zfs]
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] zpl_show_devname+0x34/0x100 [zfs]
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] show_vfsmnt+0x4e/0x180
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] m_show+0x19/0x20
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] seq_read+0x2e4/0x450
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] vfs_read+0x9f/0x170
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] SyS_read+0x55/0xd0
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] system_call_fastpath+0x25/0x2a
Nov 2 07:43:23 cascade1.ldas.cit kernel: INFO: task automount:242157 blocked for more than 120 seconds.
Nov 2 07:43:23 cascade1.ldas.cit kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 2 07:43:23 cascade1.ldas.cit kernel: automount D ffff9ffef894a6e0 0 242157 1 0x00000080
Nov 2 07:43:23 cascade1.ldas.cit kernel: Call Trace:
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] schedule+0x29/0x70
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] cv_wait_common+0x125/0x150 [spl]
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] ? wake_up_atomic_t+0x30/0x30
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] __cv_wait+0x15/0x20 [spl]
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] rrw_enter_read_impl+0x53/0x170 [zfs]
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] rrm_enter_read+0x3f/0x50 [zfs]
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] zpl_show_devname+0x34/0x100 [zfs]
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] show_vfsmnt+0x4e/0x180
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] m_show+0x19/0x20
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] seq_read+0x2e4/0x450
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] vfs_read+0x9f/0x170
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] SyS_read+0x55/0xd0
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] system_call_fastpath+0x25/0x2a
Nov 2 07:43:23 cascade1.ldas.cit kernel: INFO: task automount:242158 blocked for more than 120 seconds.
Nov 2 07:43:23 cascade1.ldas.cit kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 2 07:43:23 cascade1.ldas.cit kernel: automount D ffff9ffef894b760 0 242158 1 0x00000080
Nov 2 07:43:23 cascade1.ldas.cit kernel: Call Trace:
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] schedule+0x29/0x70
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] cv_wait_common+0x125/0x150 [spl]
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] ? wake_up_atomic_t+0x30/0x30
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] __cv_wait+0x15/0x20 [spl]
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] rrw_enter_read_impl+0x53/0x170 [zfs]
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] rrm_enter_read+0x3f/0x50 [zfs]
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] zpl_show_devname+0x34/0x100 [zfs]
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] show_vfsmnt+0x4e/0x180
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] m_show+0x19/0x20
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] seq_read+0x2e4/0x450
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] vfs_read+0x9f/0x170
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] SyS_read+0x55/0xd0
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] system_call_fastpath+0x25/0x2a
Nov 2 07:43:23 cascade1.ldas.cit kernel: INFO: task gmond:7525 blocked for more than 120 seconds.
Nov 2 07:43:23 cascade1.ldas.cit kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 2 07:43:23 cascade1.ldas.cit kernel: gmond D ffff9f287a2505e0 0 7525 1 0x00000080
Nov 2 07:43:23 cascade1.ldas.cit kernel: Call Trace:
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] schedule+0x29/0x70
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] cv_wait_common+0x125/0x150 [spl]
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] ? wake_up_atomic_t+0x30/0x30
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] __cv_wait+0x15/0x20 [spl]
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] rrw_enter_read_impl+0x53/0x170 [zfs]
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] rrm_enter_read+0x3f/0x50 [zfs]
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] zpl_show_devname+0x34/0x100 [zfs]
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] show_vfsmnt+0x4e/0x180
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] m_show+0x19/0x20
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] seq_read+0x2e4/0x450
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] vfs_read+0x9f/0x170
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] SyS_read+0x55/0xd0
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] system_call_fastpath+0x25/0x2a
Nov 2 07:43:23 cascade1.ldas.cit kernel: INFO: task df:55081 blocked for more than 120 seconds.
Nov 2 07:43:23 cascade1.ldas.cit kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 2 07:43:23 cascade1.ldas.cit kernel: df D ffffa001183547e0 0 55081 1 0x00000084
Nov 2 07:43:23 cascade1.ldas.cit kernel: Call Trace:
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] schedule+0x29/0x70
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] cv_wait_common+0x125/0x150 [spl]
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] ? wake_up_atomic_t+0x30/0x30
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] __cv_wait+0x15/0x20 [spl]
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] rrw_enter_read_impl+0x53/0x170 [zfs]
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] rrm_enter_read+0x3f/0x50 [zfs]
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] zpl_show_devname+0x34/0x100 [zfs]
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] show_mountinfo+0x1d6/0x2d0
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] m_show+0x19/0x20
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] seq_read+0x2e4/0x450
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] vfs_read+0x9f/0x170
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] SyS_read+0x55/0xd0
Nov 2 07:43:23 cascade1.ldas.cit kernel: [ ] system_call_fastpath+0x25/0x2a
Nov 2 07:45:00 cascade1.ldas.cit systemd-logind: Removed session 28910.
Nov 2 07:47:23 cascade1.ldas.cit kernel: INFO: task gmond:7525 blocked for more than 120 seconds.
Nov 2 07:47:23 cascade1.ldas.cit kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 2 07:47:23 cascade1.ldas.cit kernel: gmond D ffff9f287a2505e0 0 7525 1 0x00000080
Nov 2 07:47:23 cascade1.ldas.cit kernel: Call Trace:
Nov 2 07:47:23 cascade1.ldas.cit kernel: [ ] schedule+0x29/0x70
Nov 2 07:47:23 cascade1.ldas.cit kernel: [ ] cv_wait_common+0x125/0x150 [spl]
Nov 2 07:47:23 cascade1.ldas.cit kernel: [ ] ? wake_up_atomic_t+0x30/0x30
Nov 2 07:47:23 cascade1.ldas.cit kernel: [ ] __cv_wait+0x15/0x20 [spl]
Nov 2 07:47:23 cascade1.ldas.cit kernel: [ ] rrw_enter_read_impl+0x53/0x170 [zfs]
Nov 2 07:47:23 cascade1.ldas.cit kernel: [ ] rrm_enter_read+0x3f/0x50 [zfs]
Nov 2 07:47:23 cascade1.ldas.cit kernel: [ ] zpl_show_devname+0x34/0x100 [zfs]
Nov 2 07:47:23 cascade1.ldas.cit kernel: [ ] show_vfsmnt+0x4e/0x180
Nov 2 07:47:23 cascade1.ldas.cit kernel: [ ] m_show+0x19/0x20
Nov 2 07:47:23 cascade1.ldas.cit kernel: [ ] seq_read+0x2e4/0x450
Nov 2 07:47:23 cascade1.ldas.cit kernel: [ ] vfs_read+0x9f/0x170
Nov 2 07:47:23 cascade1.ldas.cit kernel: [ ] SyS_read+0x55/0xd0
Nov 2 07:47:23 cascade1.ldas.cit kernel: [ ] system_call_fastpath+0x25/0x2a
Nov 2 07:50:01 cascade1.ldas.cit systemd: Started Session 28916 of user root.
Nov 2 08:00:01 cascade1.ldas.cit systemd: Started Session 28917 of user root.
```
I would like to continue this thread: if an additional scrub/resilver is added to the test described in the first post, sometimes even greater lags can be noticed.
I started collecting stats every 5 seconds, and under heavy load a single txg is held open for a very long time:
```
2023 May 17 03:55:35 [user.notice] get_txg_id: 2438088, rw_count: 406077118
2023 May 17 03:55:39 [user.notice] get_txg_id: 2438089, rw_count: 406078036
2023 May 17 03:55:45 [user.notice] get_txg_id: 2438089, rw_count: 406081327
2023 May 17 03:55:49 [user.notice] get_txg_id: 2438089, rw_count: 406083191
2023 May 17 03:55:55 [user.notice] get_txg_id: 2438089, rw_count: 406083375
2023 May 17 03:55:59 [user.notice] get_txg_id: 2438089, rw_count: 406083392
2023 May 17 03:56:06 [user.notice] get_txg_id: 2438089, rw_count: 406083392
2023 May 17 03:56:10 [user.notice] get_txg_id: 2438089, rw_count: 406083392
...
2023 May 17 03:57:46 [user.notice] get_txg_id: 2438089, rw_count: 406083392
2023 May 17 03:57:50 [user.notice] get_txg_id: 2438089, rw_count: 406083392
2023 May 17 03:57:56 [user.notice] get_txg_id: 2438090, rw_count: 406083473
```
`get_txg_id` and `rw_count` are collected using:
```sh
get_txg_id=$(tail -n 1 /proc/spl/kstat/zfs/<pool_name>/txgs | awk '{print $1}')
rw_count=$(grep -E 'reads|writes' /proc/spl/kstat/zfs/<pool_name>/objset* | awk '{sum += $NF} END {print sum}')
```
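For completeness, the whole collection loop is roughly the following sketch (the pool name is a placeholder; `logger` produces the `[user.notice]` entries shown above):
```sh
pool=Pool-0   # placeholder pool name
while sleep 5; do
    get_txg_id=$(tail -n 1 "/proc/spl/kstat/zfs/$pool/txgs" | awk '{print $1}')
    rw_count=$(grep -E 'reads|writes' /proc/spl/kstat/zfs/$pool/objset* |
               awk '{sum += $NF} END {print sum}')
    logger "get_txg_id: $get_txg_id, rw_count: $rw_count"
done
```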
During this time, the execution of `cat /proc/mounts` or `touch /Pools/Pool-0/testfile` hangs:
```
cat /proc/`pidof touch`/stack
[<0>] dmu_tx_wait+0x363/0x370 [zfs]
[<0>] dmu_tx_assign+0x9c/0x480 [zfs]
[<0>] zfs_setattr+0x68d/0x22c0 [zfs]
[<0>] zpl_setattr+0x108/0x180 [zfs]
[<0>] notify_change+0x302/0x410
[<0>] utimes_common+0xd7/0x1b0
[<0>] do_utimes+0xd8/0x150
[<0>] __se_sys_utimensat+0x84/0xd0
[<0>] do_syscall_64+0x56/0x140
[<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[<0>] 0xffffffffffffffff
```
Of course, under heavy load this wouldn't be surprising, but it's odd that some other commands in the system are executed immediately (e.g. while `cat /proc/mounts` or `touch /Pools/Pool-0/testfile` hangs). ZFS itself is doing "something" internally, during which even its I/O stats are not updated.
Can you advise how to check what ZFS is doing during such a lag?
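One approach that might work here (a sketch, assuming pool `Pool-0` and `perf` installed): the `txgs` kstat itself breaks down where each txg spent its time, and sampling the sync thread can show what it is busy with.
```sh
# Per-txg timing: otime/qtime/wtime/stime = open/quiesce/wait/sync times (ns).
# A txg stuck with a huge stime points at the sync phase.
watch -n 1 'tail -n 5 /proc/spl/kstat/zfs/Pool-0/txgs'

# Sample what the txg_sync thread is doing during the stall:
perf top -g -p "$(pgrep -d, txg_sync)"
```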
I am able to regularly trigger this on an RL9.2 system with ZFS 2.1.12 if a new syncoid job runs `zfs rollback`. Once this blocks read access to `/proc/self/mountstats`, it indirectly blocks new SSH and console login sessions. For example, here is a random background process timing out:
```
Jun 20 17:42:40 zfs10 kernel: INFO: task gmond:17697 blocked for more than 122 seconds.
Jun 20 17:42:40 zfs10 kernel: Tainted: P OE -------- --- 5.14.0-284.11.1.el9_2.x86_64 #1
Jun 20 17:42:40 zfs10 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 20 17:42:40 zfs10 kernel: task:gmond state:D stack: 0 pid:17697 ppid: 1 flags:0x00000002
Jun 20 17:42:40 zfs10 kernel: Call Trace:
Jun 20 17:42:40 zfs10 kernel:
Jun 20 17:42:40 zfs10 kernel: __schedule+0x248/0x620
Jun 20 17:42:40 zfs10 kernel: ? vsnprintf+0x267/0x610
Jun 20 17:42:40 zfs10 kernel: schedule+0x5a/0xc0
Jun 20 17:42:40 zfs10 kernel: cv_wait_common+0xfa/0x130 [spl]
Jun 20 17:42:40 zfs10 kernel: ? cpuacct_percpu_seq_show+0x10/0x10
Jun 20 17:42:40 zfs10 kernel: rrw_enter_read_impl+0x4c/0x170 [zfs]
Jun 20 17:42:40 zfs10 kernel: zpl_show_devname+0x2f/0x100 [zfs]
Jun 20 17:42:40 zfs10 kernel: show_vfsmnt+0x54/0x1e0
Jun 20 17:42:40 zfs10 kernel: seq_read_iter+0x2c6/0x4b0
Jun 20 17:42:40 zfs10 kernel: new_sync_read+0xfc/0x180
Jun 20 17:42:40 zfs10 kernel: vfs_read+0x13c/0x190
Jun 20 17:42:40 zfs10 kernel: ksys_read+0x5f/0xe0
Jun 20 17:42:40 zfs10 kernel: do_syscall_64+0x5c/0x90
Jun 20 17:42:40 zfs10 kernel: ? syscall_exit_to_user_mode+0x12/0x30
Jun 20 17:42:40 zfs10 kernel: ? do_syscall_64+0x69/0x90
Jun 20 17:42:40 zfs10 kernel: ? vfs_read+0x13c/0x190
Jun 20 17:42:40 zfs10 kernel: ? syscall_exit_work+0x11a/0x150
Jun 20 17:42:40 zfs10 kernel: ? syscall_exit_to_user_mode+0x12/0x30
Jun 20 17:42:40 zfs10 kernel: ? do_syscall_64+0x69/0x90
Jun 20 17:42:40 zfs10 kernel: ? sysvec_apic_timer_interrupt+0x3c/0x90
Jun 20 17:42:40 zfs10 kernel: entry_SYSCALL_64_after_hwframe+0x63/0xcd
Jun 20 17:42:40 zfs10 kernel: RIP: 0033:0x7f558ab3eb2c
Jun 20 17:42:40 zfs10 kernel: RSP: 002b:00007ffcab562330 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
Jun 20 17:42:40 zfs10 kernel: RAX: ffffffffffffffda RBX: 0000560c8d4f22a0 RCX: 00007f558ab3eb2c
Jun 20 17:42:40 zfs10 kernel: RDX: 0000000000000400 RSI: 0000560c8d560a50 RDI: 000000000000000b
Jun 20 17:42:40 zfs10 kernel: RBP: 00007f558abf75e0 R08: 0000000000000000 R09: 0000000000000000
Jun 20 17:42:40 zfs10 kernel: R10: 0000000000001000 R11: 0000000000000246 R12: 0000560c8d4f22a0
Jun 20 17:42:40 zfs10 kernel: R13: 0000000000000d68 R14: 00007f558abf69e0 R15: 0000000000000d68
Jun 20 17:42:40 zfs10 kernel:
```
Note: if I have a pre-existing terminal session open before the lockup, I can kill the `zfs rollback`, wait ~10 minutes, and then the system is back to normal. The zpool appears to be otherwise healthy while this is occurring, e.g., other `zfs receive` jobs continue to receive data.
Note: while I am waiting for an aborted `zfs` command to finish exiting, i.e., while its `/proc/<pid>/stack` is empty, `/bin/top` reports up to 99% CPU time for several minutes.
Something is very strange: even before I kill the `zfs rollback` process, it doesn't have a stack trace but is accumulating CPU time according to the RL9.2 kernel???
```
[root@zfs10 ~]# ps -f -p 1105522
UID          PID     PPID  C STIME TTY          TIME CMD
root     1105522  1103700 99 13:56 ?        00:01:30 zfs rollback -R home5/shahrearkhan.faisal@autosnap_2023-06-20_18:09
[root@zfs10 ~]# cat /proc/1105522/stack
[root@zfs10 ~]# ps -f -p 1105522
UID          PID     PPID  C STIME TTY          TIME CMD
root     1105522  1103700 98 13:56 ?        00:01:38 zfs rollback -R home5/shahrearkhan.faisal@autosnap_2023-06-20_18:09
```