[Bug] - multiple hung tasks calling fadvise on kernel 6.12
Describe the bug
Our application runs on EKS 1.33 with the latest AL2023 x86_64 nodes. The workload attaches several EBS volumes on the fly, mounts them, runs some logic, and then unmounts and detaches the volumes. Recently, the EKS team upgraded their node images to the 6.12 kernel branch of AL2023, and we started noticing a lot more containers simply hanging for a few hours. Running dmesg on the node itself usually shows something similar to the attached log. In the attached log there are 3 volumes, all ext4, and 4 hung tasks, all waiting on the fadvise mutex: fluent-bit (flb-pipeline), a Rust-based security agent (tokio-runtime-w), and our application (entrypoint). This happens rather quickly; as can be seen, the first hung-task entry appears at the 240-second mark (~4 minutes after the node is started).
Our application and the security agent both call fadvise with POSIX_FADV_DONTNEED for the same reason: they both open files and read them exactly once, and the goal is to avoid filling the page cache with data that will never be read again.
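For illustration, here is roughly the pattern both programs follow (a minimal C sketch of the read-once-then-drop idea, not the actual code of either program):

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Minimal sketch: read a file exactly once, then ask the kernel to drop
 * its pages from the page cache so a one-shot scan does not pollute it. */
int read_once_and_drop(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    char buf[1 << 16];
    ssize_t n;
    while ((n = read(fd, buf, sizeof(buf))) > 0)
        ;  /* process the data exactly once */

    /* offset=0, len=0 means "the whole file". This is the syscall that
     * shows up as generic_fadvise() -> __lru_add_drain_all() in the traces. */
    int err = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    if (err != 0)
        fprintf(stderr, "posix_fadvise: %s\n", strerror(err));

    close(fd);
    return n < 0 ? -1 : 0;
}
```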
I'd like to mention that this behavior did NOT occur on the 6.1 kernel version of the EKS nodes and started showing up very quickly after the upgrade, which is why the kernel is my initial suspect.
Thanks in advance.
Expected behavior
No hung tasks due to fadvise calls.
dmesg logs
[ 90.984422] nvme nvme2: pci function 0000:00:1f.0
[ 90.985027] nvme 0000:00:1f.0: enabling device (0000 -> 0002)
[ 90.987825] ACPI: \_SB_.LNKC: Enabled at IRQ 11
[ 90.996176] nvme nvme2: 2/0/0 default/read/poll queues
[ 91.131043] nvme2n1: p1 p14 p15
[ 94.727028] EXT4-fs (nvme1n1p1): 61 orphan inodes deleted
[ 94.727722] EXT4-fs (nvme1n1p1): recovery complete
[ 94.738069] EXT4-fs (nvme1n1p1): mounted filesystem a7e321c6-fe24-4b08-b922-b296032b6eda r/w with ordered data mode. Quota mode: none.
[ 94.928915] EXT4-fs (nvme2n1p1): 56 orphan inodes deleted
[ 94.928984] EXT4-fs (nvme1n1p1): unmounting filesystem a7e321c6-fe24-4b08-b922-b296032b6eda.
[ 94.929609] EXT4-fs (nvme2n1p1): recovery complete
[ 94.942794] FAT-fs (nvme1n1p15): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[ 94.944075] EXT4-fs (nvme2n1p1): mounted filesystem bb952a53-3b3a-48b7-8915-4c8254aa6ace r/w with ordered data mode. Quota mode: none.
[ 95.098641] EXT4-fs (nvme2n1p1): unmounting filesystem bb952a53-3b3a-48b7-8915-4c8254aa6ace.
[ 95.112313] FAT-fs (nvme2n1p15): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[ 95.140284] EXT4-fs (nvme1n1p1): mounted filesystem a7e321c6-fe24-4b08-b922-b296032b6eda r/w with ordered data mode. Quota mode: none.
[ 95.152469] FAT-fs (nvme1n1p15): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[ 95.191207] EXT4-fs (nvme1n1p1): unmounting filesystem a7e321c6-fe24-4b08-b922-b296032b6eda.
[ 95.201936] FAT-fs (nvme1n1p15): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[ 95.230268] EXT4-fs (nvme1n1p1): mounted filesystem a7e321c6-fe24-4b08-b922-b296032b6eda r/w with ordered data mode. Quota mode: none.
[ 95.240912] FAT-fs (nvme1n1p15): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[ 95.306483] EXT4-fs (nvme2n1p1): mounted filesystem bb952a53-3b3a-48b7-8915-4c8254aa6ace r/w with ordered data mode. Quota mode: none.
[ 95.317214] FAT-fs (nvme2n1p15): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[ 95.352621] EXT4-fs (nvme2n1p1): unmounting filesystem bb952a53-3b3a-48b7-8915-4c8254aa6ace.
[ 95.363757] FAT-fs (nvme2n1p15): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[ 95.390572] EXT4-fs (nvme2n1p1): mounted filesystem bb952a53-3b3a-48b7-8915-4c8254aa6ace r/w with ordered data mode. Quota mode: none.
[ 95.454134] FAT-fs (nvme2n1p15): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[ 148.603898] sh (3589): drop_caches: 2
[ 208.768362] sh (3589): drop_caches: 2
[ 243.582122] INFO: task flb-pipeline:3818 blocked for more than 122 seconds.
[ 243.583001] Tainted: P OE 6.12.37-61.105.amzn2023.x86_64 #1
[ 243.583905] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 243.584854] task:flb-pipeline state:D stack:0 pid:3818 tgid:3787 ppid:3157 flags:0x00000002
[ 243.586823] Call Trace:
[ 243.587170] <TASK>
[ 243.587477] __schedule+0x267/0x530
[ 243.587947] schedule+0x23/0xa0
[ 243.588358] schedule_preempt_disabled+0x11/0x20
[ 243.588940] __mutex_lock.constprop.0+0x31d/0x650
[ 243.589529] __lru_add_drain_all+0x32/0x1f0
[ 243.590061] generic_fadvise+0x267/0x2b0
[ 243.590562] ksys_fadvise64_64+0x4c/0xa0
[ 243.591057] __x64_sys_fadvise64+0x18/0x20
[ 243.591571] do_syscall_64+0x5b/0x170
[ 243.592079] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 243.592709] RIP: 0033:0x7fe995cf869e
[ 243.593177] RSP: 002b:00007fe9951fa728 EFLAGS: 00000246 ORIG_RAX: 00000000000000dd
[ 243.594104] RAX: ffffffffffffffda RBX: 00007fe9942f2540 RCX: 00007fe995cf869e
[ 243.594972] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000024
[ 243.595849] RBP: 0000000000007fff R08: 0000000000000000 R09: 0000000000000000
[ 243.596717] R10: 0000000000000004 R11: 0000000000000246 R12: 00007fe994242900
[ 243.597585] R13: 00007fe995698000 R14: 0000000000000001 R15: 00007fe99562a000
[ 243.598458] </TASK>
[ 243.598770] INFO: task tokio-runtime-w:4534 blocked for more than 122 seconds.
[ 243.599647] Tainted: P OE 6.12.37-61.105.amzn2023.x86_64 #1
[ 243.600541] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 243.601475] task:tokio-runtime-w state:D stack:0 pid:4534 tgid:4495 ppid:3677 flags:0x00000002
[ 243.602613] Call Trace:
[ 243.602940] <TASK>
[ 243.603230] __schedule+0x267/0x530
[ 243.603684] schedule+0x23/0xa0
[ 243.604097] schedule_preempt_disabled+0x11/0x20
[ 243.604672] __mutex_lock.constprop.0+0x31d/0x650
[ 243.605263] __lru_add_drain_all+0x32/0x1f0
[ 243.605791] generic_fadvise+0x267/0x2b0
[ 243.606323] ksys_fadvise64_64+0x4c/0xa0
[ 243.606817] __x64_sys_fadvise64+0x18/0x20
[ 243.607331] do_syscall_64+0x5b/0x170
[ 243.607798] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 243.608425] RIP: 0033:0x7fc0e1f4c9da
[ 243.608889] RSP: 002b:00007fc0dbdfd998 EFLAGS: 00000246 ORIG_RAX: 00000000000000dd
[ 243.609803] RAX: ffffffffffffffda RBX: 00007fc0dbdfdb10 RCX: 00007fc0e1f4c9da
[ 243.610676] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000202
[ 243.611544] RBP: 00007fc0dbdfd9a0 R08: 00000000000000c8 R09: 0000000000000004
[ 243.612494] R10: 0000000000000004 R11: 0000000000000246 R12: 8000000000000000
[ 243.613640] R13: 0000000000000028 R14: 0000000000000001 R15: 0000000000000040
[ 243.617328] </TASK>
[ 243.617732] INFO: task tokio-runtime-w:4558 blocked for more than 122 seconds.
[ 243.618838] Tainted: P OE 6.12.37-61.105.amzn2023.x86_64 #1
[ 243.619889] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 243.622887] task:tokio-runtime-w state:D stack:0 pid:4558 tgid:4495 ppid:3677 flags:0x00000002
[ 243.624316] Call Trace:
[ 243.624735] <TASK>
[ 243.625108] __schedule+0x267/0x530
[ 243.625661] schedule+0x23/0xa0
[ 243.627058] schedule_preempt_disabled+0x11/0x20
[ 243.627791] __mutex_lock.constprop.0+0x31d/0x650
[ 243.628530] __lru_add_drain_all+0x32/0x1f0
[ 243.630824] generic_fadvise+0x267/0x2b0
[ 243.631462] ? ovl_path_realdata+0x76/0x90 [overlay]
[ 243.632447] ? ovl_real_fdget_meta+0xc0/0x1b0 [overlay]
[ 243.633277] ovl_fadvise+0x97/0xc0 [overlay]
[ 243.633988] ksys_fadvise64_64+0x4c/0xa0
[ 243.634658] __x64_sys_fadvise64+0x18/0x20
[ 243.635312] do_syscall_64+0x5b/0x170
[ 243.635906] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 243.636690] RIP: 0033:0x7fc0e1f4c9da
[ 243.637276] RSP: 002b:00007fc0d24ee998 EFLAGS: 00000246 ORIG_RAX: 00000000000000dd
[ 243.638429] RAX: ffffffffffffffda RBX: 00007fc0d24eeb10 RCX: 00007fc0e1f4c9da
[ 243.639491] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000214
[ 243.640557] RBP: 00007fc0d24ee9a0 R08: 00000000000000c8 R09: 0000000000000004
[ 243.641644] R10: 0000000000000004 R11: 0000000000000246 R12: 8000000000000000
[ 243.642787] R13: 0000000000000028 R14: 0000000000000004 R15: 0000000000000040
[ 243.643846] </TASK>
[ 243.644223] INFO: task tokio-runtime-w:4560 blocked for more than 122 seconds.
[ 243.645324] Tainted: P OE 6.12.37-61.105.amzn2023.x86_64 #1
[ 243.646432] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 243.647594] task:tokio-runtime-w state:D stack:0 pid:4560 tgid:4495 ppid:3677 flags:0x00000002
[ 243.648987] Call Trace:
[ 243.649395] <TASK>
[ 243.649759] __schedule+0x267/0x530
[ 243.650341] schedule+0x23/0xa0
[ 243.650848] schedule_preempt_disabled+0x11/0x20
[ 243.651567] __mutex_lock.constprop.0+0x31d/0x650
[ 243.652356] __lru_add_drain_all+0x32/0x1f0
[ 243.653029] generic_fadvise+0x267/0x2b0
[ 243.653623] ? ovl_path_realdata+0x76/0x90 [overlay]
[ 243.654411] ? ovl_real_fdget_meta+0xc0/0x1b0 [overlay]
[ 243.655222] ovl_fadvise+0x97/0xc0 [overlay]
[ 243.655904] ksys_fadvise64_64+0x4c/0xa0
[ 243.656529] __x64_sys_fadvise64+0x18/0x20
[ 243.657165] do_syscall_64+0x5b/0x170
[ 243.657742] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 243.658514] RIP: 0033:0x7fc0e1f4c9da
[ 243.659097] RSP: 002b:00007fc0d20ec998 EFLAGS: 00000246 ORIG_RAX: 00000000000000dd
[ 243.660211] RAX: ffffffffffffffda RBX: 00007fc0d20ecb10 RCX: 00007fc0e1f4c9da
[ 243.661284] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000207
[ 243.662407] RBP: 00007fc0d20ec9a0 R08: 00000000000000c8 R09: 0000000000000006
[ 243.663463] R10: 0000000000000004 R11: 0000000000000246 R12: 8000000000000000
[ 243.664533] R13: 0000000000000028 R14: 0000000000000003 R15: 0000000000000040
[ 243.665623] </TASK>
[ 268.840461] sh (3589): drop_caches: 2
[ 328.982835] sh (3589): drop_caches: 2
[ 366.462099] INFO: task flb-pipeline:3818 blocked for more than 245 seconds.
[ 366.462968] Tainted: P OE 6.12.37-61.105.amzn2023.x86_64 #1
[ 366.463856] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 366.464786] task:flb-pipeline state:D stack:0 pid:3818 tgid:3787 ppid:3157 flags:0x00000002
[ 366.465886] Call Trace:
[ 366.466217] <TASK>
[ 366.466505] __schedule+0x267/0x530
[ 366.466949] schedule+0x23/0xa0
[ 366.467490] schedule_preempt_disabled+0x11/0x20
[ 366.468054] __mutex_lock.constprop.0+0x31d/0x650
[ 366.468633] __lru_add_drain_all+0x32/0x1f0
[ 366.469145] generic_fadvise+0x267/0x2b0
[ 366.469631] ksys_fadvise64_64+0x4c/0xa0
[ 366.470131] __x64_sys_fadvise64+0x18/0x20
[ 366.470633] do_syscall_64+0x5b/0x170
[ 366.471111] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 366.471733] RIP: 0033:0x7fe995cf869e
[ 366.472246] RSP: 002b:00007fe9951fa728 EFLAGS: 00000246 ORIG_RAX: 00000000000000dd
[ 366.473139] RAX: ffffffffffffffda RBX: 00007fe9942f2540 RCX: 00007fe995cf869e
[ 366.473989] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000024
[ 366.474846] RBP: 0000000000007fff R08: 0000000000000000 R09: 0000000000000000
[ 366.475701] R10: 0000000000000004 R11: 0000000000000246 R12: 00007fe994242900
[ 366.476552] R13: 00007fe995698000 R14: 0000000000000001 R15: 00007fe99562a000
[ 366.477405] </TASK>
[ 366.477711] INFO: task tokio-runtime-w:4534 blocked for more than 245 seconds.
[ 366.478576] Tainted: P OE 6.12.37-61.105.amzn2023.x86_64 #1
[ 366.479451] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 366.480368] task:tokio-runtime-w state:D stack:0 pid:4534 tgid:4495 ppid:3677 flags:0x00000002
[ 366.481462] Call Trace:
[ 366.481782] <TASK>
[ 366.482113] __schedule+0x267/0x530
[ 366.482553] schedule+0x23/0xa0
[ 366.482955] schedule_preempt_disabled+0x11/0x20
[ 366.483518] __mutex_lock.constprop.0+0x31d/0x650
[ 366.484094] __lru_add_drain_all+0x32/0x1f0
[ 366.484605] generic_fadvise+0x267/0x2b0
[ 366.485096] ksys_fadvise64_64+0x4c/0xa0
[ 366.485584] __x64_sys_fadvise64+0x18/0x20
[ 366.486095] do_syscall_64+0x5b/0x170
[ 366.486554] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 366.487171] RIP: 0033:0x7fc0e1f4c9da
[ 366.487624] RSP: 002b:00007fc0dbdfd998 EFLAGS: 00000246 ORIG_RAX: 00000000000000dd
[ 366.488530] RAX: ffffffffffffffda RBX: 00007fc0dbdfdb10 RCX: 00007fc0e1f4c9da
[ 366.489382] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000202
[ 366.490236] RBP: 00007fc0dbdfd9a0 R08: 00000000000000c8 R09: 0000000000000004
[ 366.491088] R10: 0000000000000004 R11: 0000000000000246 R12: 8000000000000000
[ 366.491968] R13: 0000000000000028 R14: 0000000000000001 R15: 0000000000000040
[ 366.492821] </TASK>
[ 366.493117] INFO: task tokio-runtime-w:4558 blocked for more than 245 seconds.
[ 366.493979] Tainted: P OE 6.12.37-61.105.amzn2023.x86_64 #1
[ 366.494862] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 366.495789] task:tokio-runtime-w state:D stack:0 pid:4558 tgid:4495 ppid:3677 flags:0x00000002
[ 366.496878] Call Trace:
[ 366.497198] <TASK>
[ 366.497480] __schedule+0x267/0x530
[ 366.497921] schedule+0x23/0xa0
[ 366.498328] schedule_preempt_disabled+0x11/0x20
[ 366.498901] __mutex_lock.constprop.0+0x31d/0x650
[ 366.499481] __lru_add_drain_all+0x32/0x1f0
[ 366.499995] generic_fadvise+0x267/0x2b0
[ 366.500482] ? ovl_path_realdata+0x76/0x90 [overlay]
[ 366.501097] ? ovl_real_fdget_meta+0xc0/0x1b0 [overlay]
[ 366.501737] ovl_fadvise+0x97/0xc0 [overlay]
[ 366.502313] ksys_fadvise64_64+0x4c/0xa0
[ 366.502799] __x64_sys_fadvise64+0x18/0x20
[ 366.503304] do_syscall_64+0x5b/0x170
[ 366.503775] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 366.504390] RIP: 0033:0x7fc0e1f4c9da
[ 366.504842] RSP: 002b:00007fc0d24ee998 EFLAGS: 00000246 ORIG_RAX: 00000000000000dd
[ 366.505734] RAX: ffffffffffffffda RBX: 00007fc0d24eeb10 RCX: 00007fc0e1f4c9da
[ 366.506602] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000214
[ 366.507455] RBP: 00007fc0d24ee9a0 R08: 00000000000000c8 R09: 0000000000000004
[ 366.508308] R10: 0000000000000004 R11: 0000000000000246 R12: 8000000000000000
[ 366.509163] R13: 0000000000000028 R14: 0000000000000004 R15: 0000000000000040
[ 366.510020] </TASK>
[ 366.510314] INFO: task tokio-runtime-w:4560 blocked for more than 245 seconds.
[ 366.511175] Tainted: P OE 6.12.37-61.105.amzn2023.x86_64 #1
[ 366.512087] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 366.513015] task:tokio-runtime-w state:D stack:0 pid:4560 tgid:4495 ppid:3677 flags:0x00000002
[ 366.514107] Call Trace:
[ 366.514427] <TASK>
[ 366.514708] __schedule+0x267/0x530
[ 366.515151] schedule+0x23/0xa0
[ 366.515555] schedule_preempt_disabled+0x11/0x20
[ 366.516124] __mutex_lock.constprop.0+0x31d/0x650
[ 366.516705] __lru_add_drain_all+0x32/0x1f0
[ 366.517221] generic_fadvise+0x267/0x2b0
[ 366.517710] ? ovl_path_realdata+0x76/0x90 [overlay]
[ 366.518324] ? ovl_real_fdget_meta+0xc0/0x1b0 [overlay]
[ 366.518958] ovl_fadvise+0x97/0xc0 [overlay]
[ 366.519497] ksys_fadvise64_64+0x4c/0xa0
[ 366.519984] __x64_sys_fadvise64+0x18/0x20
[ 366.520491] do_syscall_64+0x5b/0x170
[ 366.520954] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 366.521569] RIP: 0033:0x7fc0e1f4c9da
[ 366.522098] RSP: 002b:00007fc0d20ec998 EFLAGS: 00000246 ORIG_RAX: 00000000000000dd
[ 366.522997] RAX: ffffffffffffffda RBX: 00007fc0d20ecb10 RCX: 00007fc0e1f4c9da
[ 366.523844] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000207
[ 366.524693] RBP: 00007fc0d20ec9a0 R08: 00000000000000c8 R09: 0000000000000006
[ 366.525547] R10: 0000000000000004 R11: 0000000000000246 R12: 8000000000000000
[ 366.526403] R13: 0000000000000028 R14: 0000000000000003 R15: 0000000000000040
[ 366.527257] </TASK>
[ 366.527555] INFO: task entrypoint:5601 blocked for more than 122 seconds.
[ 366.528361] Tainted: P OE 6.12.37-61.105.amzn2023.x86_64 #1
[ 366.529240] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 366.530168] task:entrypoint state:D stack:0 pid:5601 tgid:5565 ppid:4954 flags:0x00000002
[ 366.531258] Call Trace:
[ 366.531578] <TASK>
[ 366.531862] __schedule+0x267/0x530
[ 366.532335] schedule+0x23/0xa0
[ 366.532737] schedule_preempt_disabled+0x11/0x20
[ 366.533306] __mutex_lock.constprop.0+0x31d/0x650
[ 366.533886] ? __pfx_lru_add+0x10/0x10
[ 366.534358] __lru_add_drain_all+0x32/0x1f0
[ 366.534872] generic_fadvise+0x267/0x2b0
[ 366.535359] ksys_fadvise64_64+0x88/0xa0
[ 366.535852] __x64_sys_fadvise64+0x18/0x20
[ 366.536358] do_syscall_64+0x5b/0x170
[ 366.536817] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 366.537429] RIP: 0033:0x4ca53ae
[ 366.537832] RSP: 002b:000000c0138255a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000dd
[ 366.538731] RAX: ffffffffffffffda RBX: 0000000000000059 RCX: 0000000004ca53ae
[ 366.539583] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000059
[ 366.540435] RBP: 000000c0138255e8 R08: 0000000000000000 R09: 0000000000000000
[ 366.541286] R10: 0000000000000004 R11: 0000000000000202 R12: 000000c0138257f0
[ 366.542168] R13: 000000c00f9b0840 R14: 000000c001ab9c00 R15: 0000000000000000
[ 366.543019] </TASK>
[ 366.543315] INFO: task entrypoint:5621 blocked for more than 122 seconds.
[ 366.544126] Tainted: P OE 6.12.37-61.105.amzn2023.x86_64 #1
[ 366.545007] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 366.545934] task:entrypoint state:D stack:0 pid:5621 tgid:5565 ppid:4954 flags:0x00000002
[ 366.547049] Call Trace:
[ 366.547371] <TASK>
[ 366.547654] __schedule+0x267/0x530
[ 366.548096] schedule+0x23/0xa0
[ 366.548501] schedule_preempt_disabled+0x11/0x20
[ 366.549071] __mutex_lock.constprop.0+0x31d/0x650
[ 366.549649] __lru_add_drain_all+0x32/0x1f0
[ 366.550171] generic_fadvise+0x267/0x2b0
[ 366.550659] ? do_futex+0xbe/0x1d0
[ 366.551092] ksys_fadvise64_64+0x88/0xa0
[ 366.551575] __x64_sys_fadvise64+0x18/0x20
[ 366.552100] do_syscall_64+0x5b/0x170
[ 366.552560] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 366.553175] RIP: 0033:0x4ca53ae
[ 366.553577] RSP: 002b:000000c013c5d5a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000dd
[ 366.554489] RAX: ffffffffffffffda RBX: 0000000000000234 RCX: 0000000004ca53ae
[ 366.555341] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000234
[ 366.556195] RBP: 000000c013c5d5e8 R08: 0000000000000000 R09: 0000000000000000
[ 366.557049] R10: 0000000000000004 R11: 0000000000000202 R12: 000000c013c5d7f0
[ 366.557901] R13: ffffffffffffffff R14: 000000c001ab8e00 R15: 0000000000000000
[ 366.558760] </TASK>
[ 366.559053] Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings
Thanks for the report; I'm forwarding this to the kernel team.
Hi @gmwiz,
Thank you for reporting this issue. Answers to the questions below will help us narrow down the root cause of the behavior you are reporting.
1. Can you share the exact AL2023 6.1 kernel version you were using before hitting this issue?
2. You mentioned that the issue started after an EKS upgrade, which usually updates the kernel along with other software packages. Do you see the same issue when updating only the kernel, without updating the other packages?
3. Are there reproduction steps that we can follow to trigger the same behavior?
4. The shared logs show errors pointing to unclean unmounting of the volumes. Are you seeing the same errors on AL2023 6.1?
5. Are your application and the security agent working on the same set of files, and are these files stored on the volumes that you previously unmounted?
The shared stack traces point to contention on a mutex in the __lru_add_drain_all function, which is responsible for walking all online CPUs and draining each CPU's per-CPU page batches. It appears that some task is holding that mutex, so other threads calling fadvise with POSIX_FADV_DONTNEED get stuck because they need to acquire the same lock. It's not clear from the stack traces which thread is actually holding it.
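For context, the serialization in question looks roughly like the sketch below. This is a simplified illustration of the pattern as we understand it around 6.12, not the literal kernel source; the per-CPU work variable and the pending-batches predicate are placeholder names for this sketch:

```c
/*
 * Simplified sketch of the serialization pattern in mm/swap.c (not the literal
 * source): every caller funnels through one global mutex and then waits for
 * per-CPU drain work to complete, so a single stalled pass blocks every other
 * fadvise(POSIX_FADV_DONTNEED) caller on the system.
 */
static void lru_add_drain_all_sketch(void)
{
	static DEFINE_MUTEX(lock);          /* one mutex for the whole system */
	static struct cpumask has_work;
	int cpu;

	mutex_lock(&lock);                  /* <-- where the hung tasks sit */
	cpumask_clear(&has_work);

	for_each_online_cpu(cpu) {
		/* lru_drain_work and cpu_has_pending_lru_batches() are
		 * placeholder names for this sketch. */
		struct work_struct *work = &per_cpu(lru_drain_work, cpu);

		if (cpu_has_pending_lru_batches(cpu)) {
			queue_work_on(cpu, mm_percpu_wq, work);
			cpumask_set_cpu(cpu, &has_work);
		}
	}

	for_each_cpu(cpu, &has_work)
		flush_work(&per_cpu(lru_drain_work, cpu)); /* holder can stall here */

	mutex_unlock(&lock);
}
```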
Hi @gmwiz,
A kind reminder that we are awaiting your reply to move forward with this issue.
Hi @abuehaze14, I'm sorry, I missed your reply entirely! Let me clarify: I did not upgrade the EKS Kubernetes version, just the worker node AMI images. We constantly upgrade to the latest version of the EKS AMI. The last version that did not have this issue is https://github.com/awslabs/amazon-eks-ami/releases/tag/v20250807, which contains kernel version 6.1.144-170.251.amzn2023. Yes, the uncleanly unmounted volumes are something we have always had in this cluster, given these volumes are created from automated snapshots of live machines. The security agent inspects all files on the node, including files from the unclean ext4 volumes, while the application itself only goes over the files from the external volumes. So yes, they both go over files from the attached external volumes. That is indeed exactly what I figured as well. I did get a chance to pull a full memory dump from the affected node before killing it, using LiME. Is there a way I could share it with you for further inspection, perhaps via official AWS support?
Here's some more information I gathered while analyzing the memory dump I acquired. In two different consecutive dumps, it looks like the mutex has no owner but is solely marked with MUTEX_FLAG_WAITERS (i.e. mutex.owner == 1). This is the output I got from volshell:
(layer_name) >>> db(0xffffb71f45c0)
0xffffb71f45c0 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0xffffb71f45d0 80 3d 48 42 48 98 ff ff d0 bd 25 49 48 98 ff ff .=HBH.....%IH...
0xffffb71f45e0 22 66 6f 6c 69 6f 3d 25 70 20 70 66 6e 3d 30 78 "folio=%p.pfn=0x
0xffffb71f45f0 25 6c 78 22 2c 20 52 45 43 2d 3e 66 6f 6c 69 6f %lx",.REC->folio
0xffffb71f4600 2c 20 52 45 43 2d 3e 70 66 6e 00 00 00 00 00 00 ,.REC->pfn......
0xffffb71f4610 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0xffffb71f4620 22 66 6f 6c 69 6f 3d 25 70 20 70 66 6e 3d 30 78 "folio=%p.pfn=0x
0xffffb71f4630 25 6c 78 20 6c 72 75 3d 25 64 20 66 6c 61 67 73 %lx.lru=%d.flags
I can tell this is the correct memory location, since when inspecting this area statically I see the exact same surrounding strings.
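For reference, this is roughly how I decoded that owner word. It's a small standalone sketch based on my understanding of the kernel's mutex implementation (low bits of mutex.owner are flag bits, the rest is the owning task_struct pointer); the flag names and values are taken from my reading of kernel/locking/mutex.h, so treat them as my assumption rather than anything authoritative:

```c
#include <stdint.h>
#include <stdio.h>

/* My understanding of the mutex owner-word encoding: the low 3 bits are
 * flags, the remaining bits are the owner's task_struct pointer
 * (0 when the mutex is unlocked). */
#define MUTEX_FLAG_WAITERS  0x01UL
#define MUTEX_FLAG_HANDOFF  0x02UL
#define MUTEX_FLAG_PICKUP   0x04UL
#define MUTEX_FLAGS         0x07UL

static void decode_mutex_owner(uint64_t owner_word)
{
    uint64_t task  = owner_word & ~MUTEX_FLAGS;
    uint64_t flags = owner_word & MUTEX_FLAGS;

    printf("owner task_struct = %#llx%s\n",
           (unsigned long long)task, task ? "" : " (no owner)");
    printf("WAITERS=%d HANDOFF=%d PICKUP=%d\n",
           !!(flags & MUTEX_FLAG_WAITERS),
           !!(flags & MUTEX_FLAG_HANDOFF),
           !!(flags & MUTEX_FLAG_PICKUP));
}

int main(void)
{
    decode_mutex_owner(0x1);  /* the value seen in the dump above */
    return 0;
}
```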
The following tasks are in uninterruptible sleep (task.state == 2):
PID: 3818, TGID: 3787, Name: flb-pipeline, State: 2
PID: 124147, TGID: 124132, Name: entrypoint, State: 2
PID: 124152, TGID: 124132, Name: entrypoint, State: 2
PID: 124156, TGID: 124132, Name: entrypoint, State: 2
PID: 124317, TGID: 124132, Name: entrypoint, State: 2
PID: 124318, TGID: 124132, Name: entrypoint, State: 2
And when extracting the pscall stack for these TGIDs, I get the following:
TID Comm Position Address Value Name Type Module
3787 fluent-bit 6 0x984842083db0 0xffffb5beeda7 __schedule t kernel
3787 fluent-bit 8 0x984842083dc0 0xffffb51b9575 enqueue_hrtimer t kernel
3787 fluent-bit 19 0x984842083e18 0xffffb5bef0a3 schedule T kernel
3787 fluent-bit 21 0x984842083e28 0xffffb5bf4f6c do_nanosleep t kernel
3787 fluent-bit 29 0x984842083e68 0xffffb51baf73 hrtimer_nanosleep T kernel
3787 fluent-bit 35 0x984842083e98 0xffffb51b98c0 hrtimer_wakeup t kernel
3787 fluent-bit 44 0x984842083ee0 0xffffb51c48b0 common_nsleep t kernel
3787 fluent-bit 45 0x984842083ee8 0xffffb51c7079 __x64_sys_clock_nanosleep T kernel
3787 fluent-bit 53 0x984842083f28 0xffffb5be596b do_syscall_64 T kernel
3787 fluent-bit 58 0x984842083f50 0xffffb5c0012b entry_SYSCALL_64_after_hwframe T kernel
3818 flb-pipeline 6 0x984842033d48 0xffffb5beeda7 __schedule t kernel
3818 flb-pipeline 19 0x984842033db0 0xffffb5bef0a3 schedule T kernel
3818 flb-pipeline 21 0x984842033dc0 0xffffb5bef331 schedule_preempt_disabled T kernel
3818 flb-pipeline 22 0x984842033dc8 0xffffb5bf05cd __mutex_lock.constprop.0 t kernel
3818 flb-pipeline 36 0x984842033e38 0xffffb5328d52 __lru_add_drain_all t kernel
3818 flb-pipeline 43 0x984842033e70 0xffffb53229c7 generic_fadvise T kernel
3818 flb-pipeline 58 0x984842033ee8 0xffffb5322a9c ksys_fadvise64_64 T kernel
3818 flb-pipeline 65 0x984842033f20 0xffffb5322b78 __x64_sys_fadvise64 T kernel
3818 flb-pipeline 66 0x984842033f28 0xffffb5be596b do_syscall_64 T kernel
3818 flb-pipeline 71 0x984842033f50 0xffffb5c0012b entry_SYSCALL_64_after_hwframe T kernel
3819 flb-logger 6 0x9848420e7d08 0xffffb5beeda7 __schedule t kernel
3819 flb-logger 19 0x9848420e7d70 0xffffb5bef0a3 schedule T kernel
3819 flb-logger 21 0x9848420e7d80 0xffffb5bf4e98 schedule_hrtimeout_range_clock T kernel
3819 flb-logger 22 0x9848420e7d88 0xffffb54968b9 ep_send_events t kernel
3819 flb-logger 37 0x9848420e7e00 0xffffb5497598 ep_poll t kernel
3819 flb-logger 47 0x9848420e7e50 0xffffb54965e0 ep_autoremove_wake_function t kernel
3819 flb-logger 57 0x9848420e7ea0 0xffffb54976b8 do_epoll_wait t kernel
3819 flb-logger 64 0x9848420e7ed8 0xffffb5497f40 __x64_sys_epoll_wait T kernel
3819 flb-logger 67 0x9848420e7ef0 0xffffb542d3d8 ksys_read T kernel
3819 flb-logger 74 0x9848420e7f28 0xffffb5be596b do_syscall_64 T kernel
3819 flb-logger 79 0x9848420e7f50 0xffffb5c0012b entry_SYSCALL_64_after_hwframe T kernel
3826 flb-out-azure_b 6 0x98484208bd08 0xffffb5beeda7 __schedule t kernel
3826 flb-out-azure_b 19 0x98484208bd70 0xffffb5bef0a3 schedule T kernel
3826 flb-out-azure_b 21 0x98484208bd80 0xffffb5bf4e98 schedule_hrtimeout_range_clock T kernel
3826 flb-out-azure_b 22 0x98484208bd88 0xffffb54968b9 ep_send_events t kernel
3826 flb-out-azure_b 37 0x98484208be00 0xffffb5497598 ep_poll t kernel
3826 flb-out-azure_b 47 0x98484208be50 0xffffb54965e0 ep_autoremove_wake_function t kernel
3826 flb-out-azure_b 57 0x98484208bea0 0xffffb54976b8 do_epoll_wait t kernel
3826 flb-out-azure_b 64 0x98484208bed8 0xffffb5497f40 __x64_sys_epoll_wait T kernel
3826 flb-out-azure_b 74 0x98484208bf28 0xffffb5be596b do_syscall_64 T kernel
3826 flb-out-azure_b 79 0x98484208bf50 0xffffb5c0012b entry_SYSCALL_64_after_hwframe T kernel
3827 monkey: server 6 0x98484207bd08 0xffffb5beeda7 __schedule t kernel
3827 monkey: server 19 0x98484207bd70 0xffffb5bef0a3 schedule T kernel
3827 monkey: server 21 0x98484207bd80 0xffffb5bf4e98 schedule_hrtimeout_range_clock T kernel
3827 monkey: server 27 0x98484207bdb0 0xffffb5beedaf __schedule t kernel
3827 monkey: server 29 0x98484207bdc0 0xffffb51b9575 enqueue_hrtimer t kernel
3827 monkey: server 37 0x98484207be00 0xffffb5497598 ep_poll t kernel
3827 monkey: server 47 0x98484207be50 0xffffb54965e0 ep_autoremove_wake_function t kernel
3827 monkey: server 57 0x98484207bea0 0xffffb54976b8 do_epoll_wait t kernel
3827 monkey: server 64 0x98484207bed8 0xffffb5497f40 __x64_sys_epoll_wait T kernel
3827 monkey: server 66 0x98484207bee8 0xffffb5051f9b switch_fpu_return T kernel
3827 monkey: server 74 0x98484207bf28 0xffffb5be596b do_syscall_64 T kernel
3827 monkey: server 79 0x98484207bf50 0xffffb5c0012b entry_SYSCALL_64_after_hwframe T kernel
3830 monkey: clock 6 0x9848420ffdb0 0xffffb5beeda7 __schedule t kernel
3830 monkey: clock 8 0x9848420ffdc0 0xffffb51b9575 enqueue_hrtimer t kernel
3830 monkey: clock 19 0x9848420ffe18 0xffffb5bef0a3 schedule T kernel
3830 monkey: clock 21 0x9848420ffe28 0xffffb5bf4f6c do_nanosleep t kernel
3830 monkey: clock 29 0x9848420ffe68 0xffffb51baf73 hrtimer_nanosleep T kernel
3830 monkey: clock 35 0x9848420ffe98 0xffffb51b98c0 hrtimer_wakeup t kernel
3830 monkey: clock 44 0x9848420ffee0 0xffffb51c48b0 common_nsleep t kernel
3830 monkey: clock 45 0x9848420ffee8 0xffffb51c7079 __x64_sys_clock_nanosleep T kernel
3830 monkey: clock 53 0x9848420fff28 0xffffb5be596b do_syscall_64 T kernel
3830 monkey: clock 58 0x9848420fff50 0xffffb5c0012b entry_SYSCALL_64_after_hwframe T kernel
3831 monkey: wrk/0 6 0x984842107d08 0xffffb5beeda7 __schedule t kernel
3831 monkey: wrk/0 19 0x984842107d70 0xffffb5bef0a3 schedule T kernel
3831 monkey: wrk/0 21 0x984842107d80 0xffffb5bf4e98 schedule_hrtimeout_range_clock T kernel
3831 monkey: wrk/0 37 0x984842107e00 0xffffb5497598 ep_poll t kernel
3831 monkey: wrk/0 47 0x984842107e50 0xffffb54965e0 ep_autoremove_wake_function t kernel
3831 monkey: wrk/0 57 0x984842107ea0 0xffffb54976b8 do_epoll_wait t kernel
3831 monkey: wrk/0 64 0x984842107ed8 0xffffb5497f40 __x64_sys_epoll_wait T kernel
3831 monkey: wrk/0 74 0x984842107f28 0xffffb5be596b do_syscall_64 T kernel
3831 monkey: wrk/0 79 0x984842107f50 0xffffb5c0012b entry_SYSCALL_64_after_hwframe T kernel
124132 entrypoint 6 0x9848491f3cb0 0xffffb5beeda7 __schedule t kernel
124132 entrypoint 19 0x9848491f3d18 0xffffb5bef0a3 schedule T kernel
124132 entrypoint 21 0x9848491f3d28 0xffffb51d9505 futex_wait_queue T kernel
124132 entrypoint 26 0x9848491f3d50 0xffffb51d9bc8 __futex_wait T kernel
124132 entrypoint 35 0x9848491f3d98 0xffffb51d8ec0 futex_wake_mark T kernel
124132 entrypoint 50 0x9848491f3e10 0xffffb51d9cb4 futex_wait T kernel
124132 entrypoint 55 0x9848491f3e38 0xffffb51d90f5 futex_wake T kernel
124132 entrypoint 68 0x9848491f3ea0 0xffffb51d5f08 do_futex T kernel
124132 entrypoint 70 0x9848491f3eb0 0xffffb51d63e3 __x64_sys_futex T kernel
124132 entrypoint 85 0x9848491f3f28 0xffffb5be596b do_syscall_64 T kernel
124132 entrypoint 90 0x9848491f3f50 0xffffb5c0012b entry_SYSCALL_64_after_hwframe T kernel
124145 entrypoint 6 0x98484905bcb0 0xffffb5beeda7 __schedule t kernel
124145 entrypoint 8 0x98484905bcc0 0xffffb51b9575 enqueue_hrtimer t kernel
124145 entrypoint 19 0x98484905bd18 0xffffb5bef0a3 schedule T kernel
124145 entrypoint 21 0x98484905bd28 0xffffb51d9505 futex_wait_queue T kernel
124145 entrypoint 26 0x98484905bd50 0xffffb51d9bc8 __futex_wait T kernel
124145 entrypoint 35 0x98484905bd98 0xffffb51d8ec0 futex_wake_mark T kernel
124145 entrypoint 50 0x98484905be10 0xffffb51d9cb4 futex_wait T kernel
124145 entrypoint 57 0x98484905be48 0xffffb51b98c0 hrtimer_wakeup t kernel
124145 entrypoint 68 0x98484905bea0 0xffffb51d5f08 do_futex T kernel
124145 entrypoint 70 0x98484905beb0 0xffffb51d63e3 __x64_sys_futex T kernel
124145 entrypoint 85 0x98484905bf28 0xffffb5be596b do_syscall_64 T kernel
124145 entrypoint 90 0x98484905bf50 0xffffb5c0012b entry_SYSCALL_64_after_hwframe T kernel
124146 entrypoint 6 0x984849213cb0 0xffffb5beeda7 __schedule t kernel
124146 entrypoint 19 0x984849213d18 0xffffb5bef0a3 schedule T kernel
124146 entrypoint 21 0x984849213d28 0xffffb51d9505 futex_wait_queue T kernel
124146 entrypoint 26 0x984849213d50 0xffffb51d9bc8 __futex_wait T kernel
124146 entrypoint 35 0x984849213d98 0xffffb51d8ec0 futex_wake_mark T kernel
124146 entrypoint 50 0x984849213e10 0xffffb51d9cb4 futex_wait T kernel
124146 entrypoint 55 0x984849213e38 0xffffb51d90f5 futex_wake T kernel
124146 entrypoint 68 0x984849213ea0 0xffffb51d5f08 do_futex T kernel
124146 entrypoint 70 0x984849213eb0 0xffffb51d63e3 __x64_sys_futex T kernel
124146 entrypoint 85 0x984849213f28 0xffffb5be596b do_syscall_64 T kernel
124146 entrypoint 90 0x984849213f50 0xffffb5c0012b entry_SYSCALL_64_after_hwframe T kernel
124147 entrypoint 6 0x98484924bd48 0xffffb5beeda7 __schedule t kernel
124147 entrypoint 19 0x98484924bdb0 0xffffb5bef0a3 schedule T kernel
124147 entrypoint 21 0x98484924bdc0 0xffffb5bef331 schedule_preempt_disabled T kernel
124147 entrypoint 22 0x98484924bdc8 0xffffb5bf05cd __mutex_lock.constprop.0 t kernel
124147 entrypoint 36 0x98484924be38 0xffffb5328d52 __lru_add_drain_all t kernel
124147 entrypoint 43 0x98484924be70 0xffffb53229c7 generic_fadvise T kernel
124147 entrypoint 58 0x98484924bee8 0xffffb5322ad8 ksys_fadvise64_64 T kernel
124147 entrypoint 65 0x98484924bf20 0xffffb5322b78 __x64_sys_fadvise64 T kernel
124147 entrypoint 66 0x98484924bf28 0xffffb5be596b do_syscall_64 T kernel
124147 entrypoint 71 0x98484924bf50 0xffffb5c0012b entry_SYSCALL_64_after_hwframe T kernel
124148 entrypoint 6 0x9848491ebcb0 0xffffb5beeda7 __schedule t kernel
124148 entrypoint 19 0x9848491ebd18 0xffffb5bef0a3 schedule T kernel
124148 entrypoint 21 0x9848491ebd28 0xffffb51d9505 futex_wait_queue T kernel
124148 entrypoint 26 0x9848491ebd50 0xffffb51d9bc8 __futex_wait T kernel
124148 entrypoint 35 0x9848491ebd98 0xffffb51d8ec0 futex_wake_mark T kernel
124148 entrypoint 50 0x9848491ebe10 0xffffb51d9cb4 futex_wait T kernel
124148 entrypoint 68 0x9848491ebea0 0xffffb51d5f08 do_futex T kernel
124148 entrypoint 70 0x9848491ebeb0 0xffffb51d63e3 __x64_sys_futex T kernel
124148 entrypoint 85 0x9848491ebf28 0xffffb5be596b do_syscall_64 T kernel
124148 entrypoint 90 0x9848491ebf50 0xffffb5c0012b entry_SYSCALL_64_after_hwframe T kernel
124149 entrypoint 6 0x984849253cb0 0xffffb5beeda7 __schedule t kernel
124149 entrypoint 19 0x984849253d18 0xffffb5bef0a3 schedule T kernel
124149 entrypoint 21 0x984849253d28 0xffffb51d9505 futex_wait_queue T kernel
124149 entrypoint 26 0x984849253d50 0xffffb51d9bc8 __futex_wait T kernel
124149 entrypoint 35 0x984849253d98 0xffffb51d8ec0 futex_wake_mark T kernel
124149 entrypoint 50 0x984849253e10 0xffffb51d9cb4 futex_wait T kernel
124149 entrypoint 53 0x984849253e28 0xffffb5261d97 trace_call_bpf T kernel
124149 entrypoint 68 0x984849253ea0 0xffffb51d5f08 do_futex T kernel
124149 entrypoint 70 0x984849253eb0 0xffffb51d63e3 __x64_sys_futex T kernel
124149 entrypoint 85 0x984849253f28 0xffffb5be596b do_syscall_64 T kernel
124149 entrypoint 90 0x984849253f50 0xffffb5c0012b entry_SYSCALL_64_after_hwframe T kernel
124150 entrypoint 6 0x984849143cb0 0xffffb5beeda7 __schedule t kernel
124150 entrypoint 19 0x984849143d18 0xffffb5bef0a3 schedule T kernel
124150 entrypoint 21 0x984849143d28 0xffffb51d9505 futex_wait_queue T kernel
124150 entrypoint 26 0x984849143d50 0xffffb51d9bc8 __futex_wait T kernel
124150 entrypoint 35 0x984849143d98 0xffffb51d8ec0 futex_wake_mark T kernel
124150 entrypoint 50 0x984849143e10 0xffffb51d9cb4 futex_wait T kernel
124150 entrypoint 56 0x984849143e40 0xffffb51b98c0 hrtimer_wakeup t kernel
124150 entrypoint 68 0x984849143ea0 0xffffb51d5f08 do_futex T kernel
124150 entrypoint 70 0x984849143eb0 0xffffb51d63e3 __x64_sys_futex T kernel
124150 entrypoint 85 0x984849143f28 0xffffb5be596b do_syscall_64 T kernel
124150 entrypoint 90 0x984849143f50 0xffffb5c0012b entry_SYSCALL_64_after_hwframe T kernel
124151 entrypoint 6 0x9848491cbcb0 0xffffb5beeda7 __schedule t kernel
124151 entrypoint 19 0x9848491cbd18 0xffffb5bef0a3 schedule T kernel
124151 entrypoint 21 0x9848491cbd28 0xffffb51d9505 futex_wait_queue T kernel
124151 entrypoint 26 0x9848491cbd50 0xffffb51d9bc8 __futex_wait T kernel
124151 entrypoint 35 0x9848491cbd98 0xffffb51d8ec0 futex_wake_mark T kernel
124151 entrypoint 50 0x9848491cbe10 0xffffb51d9cb4 futex_wait T kernel
124151 entrypoint 53 0x9848491cbe28 0xffffb5261d97 trace_call_bpf T kernel
124151 entrypoint 68 0x9848491cbea0 0xffffb51d5f08 do_futex T kernel
124151 entrypoint 70 0x9848491cbeb0 0xffffb51d63e3 __x64_sys_futex T kernel
124151 entrypoint 85 0x9848491cbf28 0xffffb5be596b do_syscall_64 T kernel
124151 entrypoint 90 0x9848491cbf50 0xffffb5c0012b entry_SYSCALL_64_after_hwframe T kernel
124152 entrypoint 6 0x9848491c3d48 0xffffb5beeda7 __schedule t kernel
124152 entrypoint 19 0x9848491c3db0 0xffffb5bef0a3 schedule T kernel
124152 entrypoint 21 0x9848491c3dc0 0xffffb5bef331 schedule_preempt_disabled T kernel
124152 entrypoint 22 0x9848491c3dc8 0xffffb5bf05cd __mutex_lock.constprop.0 t kernel
124152 entrypoint 36 0x9848491c3e38 0xffffb5328d52 __lru_add_drain_all t kernel
124152 entrypoint 43 0x9848491c3e70 0xffffb53229c7 generic_fadvise T kernel
124152 entrypoint 49 0x9848491c3ea0 0xffffb51d5e8e do_futex T kernel
124152 entrypoint 58 0x9848491c3ee8 0xffffb5322ad8 ksys_fadvise64_64 T kernel
124152 entrypoint 65 0x9848491c3f20 0xffffb5322b78 __x64_sys_fadvise64 T kernel
124152 entrypoint 66 0x9848491c3f28 0xffffb5be596b do_syscall_64 T kernel
124152 entrypoint 71 0x9848491c3f50 0xffffb5c0012b entry_SYSCALL_64_after_hwframe T kernel
124153 entrypoint 6 0x98484921bcb0 0xffffb5beeda7 __schedule t kernel
124153 entrypoint 19 0x98484921bd18 0xffffb5bef0a3 schedule T kernel
124153 entrypoint 21 0x98484921bd28 0xffffb51d9505 futex_wait_queue T kernel
124153 entrypoint 26 0x98484921bd50 0xffffb51d9bc8 __futex_wait T kernel
124153 entrypoint 35 0x98484921bd98 0xffffb51d8ec0 futex_wake_mark T kernel
124153 entrypoint 50 0x98484921be10 0xffffb51d9cb4 futex_wait T kernel
124153 entrypoint 55 0x98484921be38 0xffffb5bef0a3 schedule T kernel
124153 entrypoint 57 0x98484921be48 0xffffb5bf4f74 do_nanosleep t kernel
124153 entrypoint 58 0x98484921be50 0xffffb531471d rseq_get_rseq_cs t kernel
124153 entrypoint 68 0x98484921bea0 0xffffb51d5f08 do_futex T kernel
124153 entrypoint 70 0x98484921beb0 0xffffb51d63e3 __x64_sys_futex T kernel
124153 entrypoint 85 0x98484921bf28 0xffffb5be596b do_syscall_64 T kernel
124153 entrypoint 90 0x98484921bf50 0xffffb5c0012b entry_SYSCALL_64_after_hwframe T kernel
124156 entrypoint 6 0x98484925bd48 0xffffb5beeda7 __schedule t kernel
124156 entrypoint 19 0x98484925bdb0 0xffffb5bef0a3 schedule T kernel
124156 entrypoint 21 0x98484925bdc0 0xffffb5bef331 schedule_preempt_disabled T kernel
124156 entrypoint 22 0x98484925bdc8 0xffffb5bf05cd __mutex_lock.constprop.0 t kernel
124156 entrypoint 27 0x98484925bdf0 0xffffb53299f0 lru_add t kernel
124156 entrypoint 36 0x98484925be38 0xffffb5328d52 __lru_add_drain_all t kernel
124156 entrypoint 43 0x98484925be70 0xffffb53229c7 generic_fadvise T kernel
124156 entrypoint 49 0x98484925bea0 0xffffb51d5e8e do_futex T kernel
124156 entrypoint 58 0x98484925bee8 0xffffb5322ad8 ksys_fadvise64_64 T kernel
124156 entrypoint 65 0x98484925bf20 0xffffb5322b78 __x64_sys_fadvise64 T kernel
124156 entrypoint 66 0x98484925bf28 0xffffb5be596b do_syscall_64 T kernel
124156 entrypoint 71 0x98484925bf50 0xffffb5c0012b entry_SYSCALL_64_after_hwframe T kernel
124313 entrypoint 6 0x98484938bcb0 0xffffb5beeda7 __schedule t kernel
124313 entrypoint 19 0x98484938bd18 0xffffb5bef0a3 schedule T kernel
124313 entrypoint 21 0x98484938bd28 0xffffb51d9505 futex_wait_queue T kernel
124313 entrypoint 26 0x98484938bd50 0xffffb51d9bc8 __futex_wait T kernel
124313 entrypoint 35 0x98484938bd98 0xffffb51d8ec0 futex_wake_mark T kernel
124313 entrypoint 50 0x98484938be10 0xffffb51d9cb4 futex_wait T kernel
124313 entrypoint 55 0x98484938be38 0xffffb51d90f5 futex_wake T kernel
124313 entrypoint 58 0x98484938be50 0xffffb531471d rseq_get_rseq_cs t kernel
124313 entrypoint 68 0x98484938bea0 0xffffb51d5f08 do_futex T kernel
124313 entrypoint 70 0x98484938beb0 0xffffb51d63e3 __x64_sys_futex T kernel
124313 entrypoint 85 0x98484938bf28 0xffffb5be596b do_syscall_64 T kernel
124313 entrypoint 90 0x98484938bf50 0xffffb5c0012b entry_SYSCALL_64_after_hwframe T kernel
124314 entrypoint 6 0x984849383cb0 0xffffb5beeda7 __schedule t kernel
124314 entrypoint 8 0x984849383cc0 0xffffb5454d9a touch_atime T kernel
124314 entrypoint 19 0x984849383d18 0xffffb5bef0a3 schedule T kernel
124314 entrypoint 21 0x984849383d28 0xffffb51d9505 futex_wait_queue T kernel
124314 entrypoint 26 0x984849383d50 0xffffb51d9bc8 __futex_wait T kernel
124314 entrypoint 35 0x984849383d98 0xffffb51d8ec0 futex_wake_mark T kernel
124314 entrypoint 50 0x984849383e10 0xffffb51d9cb4 futex_wait T kernel
124314 entrypoint 58 0x984849383e50 0xffffb531471d rseq_get_rseq_cs t kernel
124314 entrypoint 59 0x984849383e58 0xffffb542c76c vfs_read T kernel
124314 entrypoint 68 0x984849383ea0 0xffffb51d5f08 do_futex T kernel
124314 entrypoint 70 0x984849383eb0 0xffffb51d63e3 __x64_sys_futex T kernel
124314 entrypoint 85 0x984849383f28 0xffffb5be596b do_syscall_64 T kernel
124314 entrypoint 90 0x984849383f50 0xffffb5c0012b entry_SYSCALL_64_after_hwframe T kernel
124315 entrypoint 0 0x98484968bcb0 0xffffb5beeda7 __schedule t kernel
124315 entrypoint 13 0x98484968bd18 0xffffb5bef0a3 schedule T kernel
124315 entrypoint 15 0x98484968bd28 0xffffb51d9505 futex_wait_queue T kernel
124315 entrypoint 20 0x98484968bd50 0xffffb51d9bc8 __futex_wait T kernel
124315 entrypoint 29 0x98484968bd98 0xffffb51d8ec0 futex_wake_mark T kernel
124315 entrypoint 44 0x98484968be10 0xffffb51d9cb4 futex_wait T kernel
124315 entrypoint 48 0x98484968be30 0xffffb5405315 __memcg_slab_free_hook T kernel
124315 entrypoint 62 0x98484968bea0 0xffffb51d5f08 do_futex T kernel
124315 entrypoint 64 0x98484968beb0 0xffffb51d63e3 __x64_sys_futex T kernel
124315 entrypoint 79 0x98484968bf28 0xffffb5be596b do_syscall_64 T kernel
124315 entrypoint 84 0x98484968bf50 0xffffb5c0012b entry_SYSCALL_64_after_hwframe T kernel
124316 entrypoint 2 0x9848493fbc90 0xffffb515fc35 psi_task_switch T kernel
124316 entrypoint 12 0x9848493fbce0 0xffffb5beeda7 __schedule t kernel
124316 entrypoint 14 0x9848493fbcf0 0xffffb51b9575 enqueue_hrtimer t kernel
124316 entrypoint 25 0x9848493fbd48 0xffffb5bef0a3 schedule T kernel
124316 entrypoint 27 0x9848493fbd58 0xffffb5bf4dcc schedule_hrtimeout_range_clock T kernel
124316 entrypoint 33 0x9848493fbd88 0xffffb51b98c0 hrtimer_wakeup t kernel
124316 entrypoint 43 0x9848493fbdd8 0xffffb5497598 ep_poll t kernel
124316 entrypoint 53 0x9848493fbe28 0xffffb54965e0 ep_autoremove_wake_function t kernel
124316 entrypoint 63 0x9848493fbe78 0xffffb54976b8 do_epoll_wait t kernel
124316 entrypoint 70 0x9848493fbeb0 0xffffb549780c do_compat_epoll_pwait.part.0 t kernel
124316 entrypoint 72 0x9848493fbec0 0xffffb5497bff __x64_sys_epoll_pwait T kernel
124316 entrypoint 85 0x9848493fbf28 0xffffb5be596b do_syscall_64 T kernel
124316 entrypoint 90 0x9848493fbf50 0xffffb5c0012b entry_SYSCALL_64_after_hwframe T kernel
124317 entrypoint 6 0x984849663d48 0xffffb5beeda7 __schedule t kernel
124317 entrypoint 19 0x984849663db0 0xffffb5bef0a3 schedule T kernel
124317 entrypoint 21 0x984849663dc0 0xffffb5bef331 schedule_preempt_disabled T kernel
124317 entrypoint 22 0x984849663dc8 0xffffb5bf05cd __mutex_lock.constprop.0 t kernel
124317 entrypoint 36 0x984849663e38 0xffffb5328d52 __lru_add_drain_all t kernel
124317 entrypoint 43 0x984849663e70 0xffffb53229c7 generic_fadvise T kernel
124317 entrypoint 49 0x984849663ea0 0xffffb51d5e8e do_futex T kernel
124317 entrypoint 58 0x984849663ee8 0xffffb5322ad8 ksys_fadvise64_64 T kernel
124317 entrypoint 65 0x984849663f20 0xffffb5322b78 __x64_sys_fadvise64 T kernel
124317 entrypoint 66 0x984849663f28 0xffffb5be596b do_syscall_64 T kernel
124317 entrypoint 71 0x984849663f50 0xffffb5c0012b entry_SYSCALL_64_after_hwframe T kernel
124318 entrypoint 6 0x9848496ffd48 0xffffb5beeda7 __schedule t kernel
124318 entrypoint 19 0x9848496ffdb0 0xffffb5bef0a3 schedule T kernel
124318 entrypoint 21 0x9848496ffdc0 0xffffb5bef331 schedule_preempt_disabled T kernel
124318 entrypoint 22 0x9848496ffdc8 0xffffb5bf05cd __mutex_lock.constprop.0 t kernel
124318 entrypoint 36 0x9848496ffe38 0xffffb5328d52 __lru_add_drain_all t kernel
124318 entrypoint 43 0x9848496ffe70 0xffffb53229c7 generic_fadvise T kernel
124318 entrypoint 49 0x9848496ffea0 0xffffb51d5e8e do_futex T kernel
124318 entrypoint 58 0x9848496ffee8 0xffffb5322ad8 ksys_fadvise64_64 T kernel
124318 entrypoint 65 0x9848496fff20 0xffffb5322b78 __x64_sys_fadvise64 T kernel
124318 entrypoint 66 0x9848496fff28 0xffffb5be596b do_syscall_64 T kernel
124318 entrypoint 71 0x9848496fff50 0xffffb5c0012b entry_SYSCALL_64_after_hwframe T kernel
124319 entrypoint 6 0x984849707cb0 0xffffb5beeda7 __schedule t kernel
124319 entrypoint 19 0x984849707d18 0xffffb5bef0a3 schedule T kernel
124319 entrypoint 21 0x984849707d28 0xffffb51d9505 futex_wait_queue T kernel
124319 entrypoint 26 0x984849707d50 0xffffb51d9bc8 __futex_wait T kernel
124319 entrypoint 35 0x984849707d98 0xffffb51d8ec0 futex_wake_mark T kernel
124319 entrypoint 50 0x984849707e10 0xffffb51d9cb4 futex_wait T kernel
124319 entrypoint 57 0x984849707e48 0xffffb5bf653a _raw_spin_unlock T kernel
124319 entrypoint 58 0x984849707e50 0xffffb531471d rseq_get_rseq_cs t kernel
124319 entrypoint 68 0x984849707ea0 0xffffb51d5f08 do_futex T kernel
124319 entrypoint 70 0x984849707eb0 0xffffb51d63e3 __x64_sys_futex T kernel
124319 entrypoint 85 0x984849707f28 0xffffb5be596b do_syscall_64 T kernel
124319 entrypoint 90 0x984849707f50 0xffffb5c0012b entry_SYSCALL_64_after_hwframe T kernel
It's a lot of raw data and doesn't necessarily mean anything of significance, but I thought I'd share it with you; perhaps you'll have an idea why the tasks are still hung even though the mutex seems to have no owner. I do understand that this might simply be a highly contended mutex, but again, our code does not modify the underlying filesystem, which I don't believe should trigger a page-cache flush to disk.
@abuehaze14 Have you had a chance to look into my last comments? Are they at all helpful?
Hi @gmwiz,
I apologize for the delay on this issue. I tried to look for commits that could contribute to the reported contention on the lru_drain mutex lock, but so far I couldn't find a specific culprit.
It would be good if you could attach the kernel crash dump to a support ticket. If there are steps we can take to reproduce this issue on our side, please share them with us.
Hi @gmwiz,
Have you managed to open a support case for this issue? Is there any chance that you can share your reproduction steps so we can debug this further on our side?
FWIW, I was experiencing a similar problem, and https://lore.kernel.org/all/[email protected]/ (backported in 6.12.54) fixed it.
Hi @gemorin, thank you for the update. @gmwiz, please let us know if you are still facing the issue on our latest AL2023 6.12 kernel, which should be kernel-6.12.58-82.121.amzn2023 as of today.
Hi @abuehaze14, thank you very much for following up. I ended up not opening a support ticket, since the issue happened very rarely and has not reproduced in roughly the past two months. It seems like it might have been fixed upstream. So far so good.
Thanks again, and I believe we can close this issue for now.
Hi @gmwiz, Thank you for the update! We are glad that the issue has been fixed for you.