
[Bug] - multiple hung tasks calling fadvise on kernel 6.12

Open gmwiz opened this issue 5 months ago • 7 comments

Describe the bug

Our application runs on EKS 1.33 with the latest AL2023 x86_64 nodes. It runs a workload that attaches several EBS volumes on the fly, mounts them, runs some logic, and then unmounts and detaches each volume. Recently, the EKS team upgraded their node images to the 6.12 kernel branch of AL2023, and since then we have been noticing many more containers simply hanging for a few hours. Running dmesg on the node itself usually shows something similar to the attached dmesg log. In the attached log there are 3 volumes, all ext4, and 4 hung tasks, all waiting on the fadvise mutex: fluentbit (flb-pipeline), a Rust-based security agent (tokio-runtime-w), and our application (entrypoint). This happens rather fast; as can be seen, the first hung-task entry appears around the 240-second mark (~4 minutes after the node is started).

Our application and the security agent both call fadvise with POSIX_FADV_DONTNEED for the same reason: they both open files and read them exactly once, and the goal is to avoid filling the page cache with data that will never be read again.
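
For clarity, this is the pattern both programs follow when touching these files; a minimal sketch (the buffer size is arbitrary, and neither application's actual code is shown here):

```c
#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>
#include <unistd.h>

/* Read a file exactly once, then tell the kernel its page-cache pages
 * are no longer needed so a one-shot scan does not evict hotter data. */
static void read_once_and_drop(int fd)
{
    char buf[1 << 16];
    while (read(fd, buf, sizeof(buf)) > 0)
        ;   /* process the data here */

    /* offset 0 and len 0 apply the advice to the whole file */
    posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
}
```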

I'd like to mention that this behavior did NOT occur with the kernel 6.1 version of the EKS nodes and started to show up very quickly after the upgrade, which is why the new kernel is my initial suspect.

Thanks in advance.

Expected behavior

No hung tasks due to fadvise calls.

dmesg logs

[   90.984422] nvme nvme2: pci function 0000:00:1f.0
[   90.985027] nvme 0000:00:1f.0: enabling device (0000 -> 0002)
[   90.987825] ACPI: \_SB_.LNKC: Enabled at IRQ 11
[   90.996176] nvme nvme2: 2/0/0 default/read/poll queues
[   91.131043]  nvme2n1: p1 p14 p15
[   94.727028] EXT4-fs (nvme1n1p1): 61 orphan inodes deleted
[   94.727722] EXT4-fs (nvme1n1p1): recovery complete
[   94.738069] EXT4-fs (nvme1n1p1): mounted filesystem a7e321c6-fe24-4b08-b922-b296032b6eda r/w with ordered data mode. Quota mode: none.
[   94.928915] EXT4-fs (nvme2n1p1): 56 orphan inodes deleted
[   94.928984] EXT4-fs (nvme1n1p1): unmounting filesystem a7e321c6-fe24-4b08-b922-b296032b6eda.
[   94.929609] EXT4-fs (nvme2n1p1): recovery complete
[   94.942794] FAT-fs (nvme1n1p15): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[   94.944075] EXT4-fs (nvme2n1p1): mounted filesystem bb952a53-3b3a-48b7-8915-4c8254aa6ace r/w with ordered data mode. Quota mode: none.
[   95.098641] EXT4-fs (nvme2n1p1): unmounting filesystem bb952a53-3b3a-48b7-8915-4c8254aa6ace.
[   95.112313] FAT-fs (nvme2n1p15): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[   95.140284] EXT4-fs (nvme1n1p1): mounted filesystem a7e321c6-fe24-4b08-b922-b296032b6eda r/w with ordered data mode. Quota mode: none.
[   95.152469] FAT-fs (nvme1n1p15): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[   95.191207] EXT4-fs (nvme1n1p1): unmounting filesystem a7e321c6-fe24-4b08-b922-b296032b6eda.
[   95.201936] FAT-fs (nvme1n1p15): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[   95.230268] EXT4-fs (nvme1n1p1): mounted filesystem a7e321c6-fe24-4b08-b922-b296032b6eda r/w with ordered data mode. Quota mode: none.
[   95.240912] FAT-fs (nvme1n1p15): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[   95.306483] EXT4-fs (nvme2n1p1): mounted filesystem bb952a53-3b3a-48b7-8915-4c8254aa6ace r/w with ordered data mode. Quota mode: none.
[   95.317214] FAT-fs (nvme2n1p15): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[   95.352621] EXT4-fs (nvme2n1p1): unmounting filesystem bb952a53-3b3a-48b7-8915-4c8254aa6ace.
[   95.363757] FAT-fs (nvme2n1p15): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[   95.390572] EXT4-fs (nvme2n1p1): mounted filesystem bb952a53-3b3a-48b7-8915-4c8254aa6ace r/w with ordered data mode. Quota mode: none.
[   95.454134] FAT-fs (nvme2n1p15): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[  148.603898] sh (3589): drop_caches: 2
[  208.768362] sh (3589): drop_caches: 2
[  243.582122] INFO: task flb-pipeline:3818 blocked for more than 122 seconds.
[  243.583001]       Tainted: P           OE      6.12.37-61.105.amzn2023.x86_64 #1
[  243.583905] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.584854] task:flb-pipeline    state:D stack:0     pid:3818  tgid:3787  ppid:3157   flags:0x00000002
[  243.586823] Call Trace:
[  243.587170]  <TASK>
[  243.587477]  __schedule+0x267/0x530
[  243.587947]  schedule+0x23/0xa0
[  243.588358]  schedule_preempt_disabled+0x11/0x20
[  243.588940]  __mutex_lock.constprop.0+0x31d/0x650
[  243.589529]  __lru_add_drain_all+0x32/0x1f0
[  243.590061]  generic_fadvise+0x267/0x2b0
[  243.590562]  ksys_fadvise64_64+0x4c/0xa0
[  243.591057]  __x64_sys_fadvise64+0x18/0x20
[  243.591571]  do_syscall_64+0x5b/0x170
[  243.592079]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  243.592709] RIP: 0033:0x7fe995cf869e
[  243.593177] RSP: 002b:00007fe9951fa728 EFLAGS: 00000246 ORIG_RAX: 00000000000000dd
[  243.594104] RAX: ffffffffffffffda RBX: 00007fe9942f2540 RCX: 00007fe995cf869e
[  243.594972] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000024
[  243.595849] RBP: 0000000000007fff R08: 0000000000000000 R09: 0000000000000000
[  243.596717] R10: 0000000000000004 R11: 0000000000000246 R12: 00007fe994242900
[  243.597585] R13: 00007fe995698000 R14: 0000000000000001 R15: 00007fe99562a000
[  243.598458]  </TASK>
[  243.598770] INFO: task tokio-runtime-w:4534 blocked for more than 122 seconds.
[  243.599647]       Tainted: P           OE      6.12.37-61.105.amzn2023.x86_64 #1
[  243.600541] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.601475] task:tokio-runtime-w state:D stack:0     pid:4534  tgid:4495  ppid:3677   flags:0x00000002
[  243.602613] Call Trace:
[  243.602940]  <TASK>
[  243.603230]  __schedule+0x267/0x530
[  243.603684]  schedule+0x23/0xa0
[  243.604097]  schedule_preempt_disabled+0x11/0x20
[  243.604672]  __mutex_lock.constprop.0+0x31d/0x650
[  243.605263]  __lru_add_drain_all+0x32/0x1f0
[  243.605791]  generic_fadvise+0x267/0x2b0
[  243.606323]  ksys_fadvise64_64+0x4c/0xa0
[  243.606817]  __x64_sys_fadvise64+0x18/0x20
[  243.607331]  do_syscall_64+0x5b/0x170
[  243.607798]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  243.608425] RIP: 0033:0x7fc0e1f4c9da
[  243.608889] RSP: 002b:00007fc0dbdfd998 EFLAGS: 00000246 ORIG_RAX: 00000000000000dd
[  243.609803] RAX: ffffffffffffffda RBX: 00007fc0dbdfdb10 RCX: 00007fc0e1f4c9da
[  243.610676] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000202
[  243.611544] RBP: 00007fc0dbdfd9a0 R08: 00000000000000c8 R09: 0000000000000004
[  243.612494] R10: 0000000000000004 R11: 0000000000000246 R12: 8000000000000000
[  243.613640] R13: 0000000000000028 R14: 0000000000000001 R15: 0000000000000040
[  243.617328]  </TASK>
[  243.617732] INFO: task tokio-runtime-w:4558 blocked for more than 122 seconds.
[  243.618838]       Tainted: P           OE      6.12.37-61.105.amzn2023.x86_64 #1
[  243.619889] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.622887] task:tokio-runtime-w state:D stack:0     pid:4558  tgid:4495  ppid:3677   flags:0x00000002
[  243.624316] Call Trace:
[  243.624735]  <TASK>
[  243.625108]  __schedule+0x267/0x530
[  243.625661]  schedule+0x23/0xa0
[  243.627058]  schedule_preempt_disabled+0x11/0x20
[  243.627791]  __mutex_lock.constprop.0+0x31d/0x650
[  243.628530]  __lru_add_drain_all+0x32/0x1f0
[  243.630824]  generic_fadvise+0x267/0x2b0
[  243.631462]  ? ovl_path_realdata+0x76/0x90 [overlay]
[  243.632447]  ? ovl_real_fdget_meta+0xc0/0x1b0 [overlay]
[  243.633277]  ovl_fadvise+0x97/0xc0 [overlay]
[  243.633988]  ksys_fadvise64_64+0x4c/0xa0
[  243.634658]  __x64_sys_fadvise64+0x18/0x20
[  243.635312]  do_syscall_64+0x5b/0x170
[  243.635906]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  243.636690] RIP: 0033:0x7fc0e1f4c9da
[  243.637276] RSP: 002b:00007fc0d24ee998 EFLAGS: 00000246 ORIG_RAX: 00000000000000dd
[  243.638429] RAX: ffffffffffffffda RBX: 00007fc0d24eeb10 RCX: 00007fc0e1f4c9da
[  243.639491] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000214
[  243.640557] RBP: 00007fc0d24ee9a0 R08: 00000000000000c8 R09: 0000000000000004
[  243.641644] R10: 0000000000000004 R11: 0000000000000246 R12: 8000000000000000
[  243.642787] R13: 0000000000000028 R14: 0000000000000004 R15: 0000000000000040
[  243.643846]  </TASK>
[  243.644223] INFO: task tokio-runtime-w:4560 blocked for more than 122 seconds.
[  243.645324]       Tainted: P           OE      6.12.37-61.105.amzn2023.x86_64 #1
[  243.646432] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.647594] task:tokio-runtime-w state:D stack:0     pid:4560  tgid:4495  ppid:3677   flags:0x00000002
[  243.648987] Call Trace:
[  243.649395]  <TASK>
[  243.649759]  __schedule+0x267/0x530
[  243.650341]  schedule+0x23/0xa0
[  243.650848]  schedule_preempt_disabled+0x11/0x20
[  243.651567]  __mutex_lock.constprop.0+0x31d/0x650
[  243.652356]  __lru_add_drain_all+0x32/0x1f0
[  243.653029]  generic_fadvise+0x267/0x2b0
[  243.653623]  ? ovl_path_realdata+0x76/0x90 [overlay]
[  243.654411]  ? ovl_real_fdget_meta+0xc0/0x1b0 [overlay]
[  243.655222]  ovl_fadvise+0x97/0xc0 [overlay]
[  243.655904]  ksys_fadvise64_64+0x4c/0xa0
[  243.656529]  __x64_sys_fadvise64+0x18/0x20
[  243.657165]  do_syscall_64+0x5b/0x170
[  243.657742]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  243.658514] RIP: 0033:0x7fc0e1f4c9da
[  243.659097] RSP: 002b:00007fc0d20ec998 EFLAGS: 00000246 ORIG_RAX: 00000000000000dd
[  243.660211] RAX: ffffffffffffffda RBX: 00007fc0d20ecb10 RCX: 00007fc0e1f4c9da
[  243.661284] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000207
[  243.662407] RBP: 00007fc0d20ec9a0 R08: 00000000000000c8 R09: 0000000000000006
[  243.663463] R10: 0000000000000004 R11: 0000000000000246 R12: 8000000000000000
[  243.664533] R13: 0000000000000028 R14: 0000000000000003 R15: 0000000000000040
[  243.665623]  </TASK>
[  268.840461] sh (3589): drop_caches: 2
[  328.982835] sh (3589): drop_caches: 2
[  366.462099] INFO: task flb-pipeline:3818 blocked for more than 245 seconds.
[  366.462968]       Tainted: P           OE      6.12.37-61.105.amzn2023.x86_64 #1
[  366.463856] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  366.464786] task:flb-pipeline    state:D stack:0     pid:3818  tgid:3787  ppid:3157   flags:0x00000002
[  366.465886] Call Trace:
[  366.466217]  <TASK>
[  366.466505]  __schedule+0x267/0x530
[  366.466949]  schedule+0x23/0xa0
[  366.467490]  schedule_preempt_disabled+0x11/0x20
[  366.468054]  __mutex_lock.constprop.0+0x31d/0x650
[  366.468633]  __lru_add_drain_all+0x32/0x1f0
[  366.469145]  generic_fadvise+0x267/0x2b0
[  366.469631]  ksys_fadvise64_64+0x4c/0xa0
[  366.470131]  __x64_sys_fadvise64+0x18/0x20
[  366.470633]  do_syscall_64+0x5b/0x170
[  366.471111]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  366.471733] RIP: 0033:0x7fe995cf869e
[  366.472246] RSP: 002b:00007fe9951fa728 EFLAGS: 00000246 ORIG_RAX: 00000000000000dd
[  366.473139] RAX: ffffffffffffffda RBX: 00007fe9942f2540 RCX: 00007fe995cf869e
[  366.473989] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000024
[  366.474846] RBP: 0000000000007fff R08: 0000000000000000 R09: 0000000000000000
[  366.475701] R10: 0000000000000004 R11: 0000000000000246 R12: 00007fe994242900
[  366.476552] R13: 00007fe995698000 R14: 0000000000000001 R15: 00007fe99562a000
[  366.477405]  </TASK>
[  366.477711] INFO: task tokio-runtime-w:4534 blocked for more than 245 seconds.
[  366.478576]       Tainted: P           OE      6.12.37-61.105.amzn2023.x86_64 #1
[  366.479451] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  366.480368] task:tokio-runtime-w state:D stack:0     pid:4534  tgid:4495  ppid:3677   flags:0x00000002
[  366.481462] Call Trace:
[  366.481782]  <TASK>
[  366.482113]  __schedule+0x267/0x530
[  366.482553]  schedule+0x23/0xa0
[  366.482955]  schedule_preempt_disabled+0x11/0x20
[  366.483518]  __mutex_lock.constprop.0+0x31d/0x650
[  366.484094]  __lru_add_drain_all+0x32/0x1f0
[  366.484605]  generic_fadvise+0x267/0x2b0
[  366.485096]  ksys_fadvise64_64+0x4c/0xa0
[  366.485584]  __x64_sys_fadvise64+0x18/0x20
[  366.486095]  do_syscall_64+0x5b/0x170
[  366.486554]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  366.487171] RIP: 0033:0x7fc0e1f4c9da
[  366.487624] RSP: 002b:00007fc0dbdfd998 EFLAGS: 00000246 ORIG_RAX: 00000000000000dd
[  366.488530] RAX: ffffffffffffffda RBX: 00007fc0dbdfdb10 RCX: 00007fc0e1f4c9da
[  366.489382] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000202
[  366.490236] RBP: 00007fc0dbdfd9a0 R08: 00000000000000c8 R09: 0000000000000004
[  366.491088] R10: 0000000000000004 R11: 0000000000000246 R12: 8000000000000000
[  366.491968] R13: 0000000000000028 R14: 0000000000000001 R15: 0000000000000040
[  366.492821]  </TASK>
[  366.493117] INFO: task tokio-runtime-w:4558 blocked for more than 245 seconds.
[  366.493979]       Tainted: P           OE      6.12.37-61.105.amzn2023.x86_64 #1
[  366.494862] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  366.495789] task:tokio-runtime-w state:D stack:0     pid:4558  tgid:4495  ppid:3677   flags:0x00000002
[  366.496878] Call Trace:
[  366.497198]  <TASK>
[  366.497480]  __schedule+0x267/0x530
[  366.497921]  schedule+0x23/0xa0
[  366.498328]  schedule_preempt_disabled+0x11/0x20
[  366.498901]  __mutex_lock.constprop.0+0x31d/0x650
[  366.499481]  __lru_add_drain_all+0x32/0x1f0
[  366.499995]  generic_fadvise+0x267/0x2b0
[  366.500482]  ? ovl_path_realdata+0x76/0x90 [overlay]
[  366.501097]  ? ovl_real_fdget_meta+0xc0/0x1b0 [overlay]
[  366.501737]  ovl_fadvise+0x97/0xc0 [overlay]
[  366.502313]  ksys_fadvise64_64+0x4c/0xa0
[  366.502799]  __x64_sys_fadvise64+0x18/0x20
[  366.503304]  do_syscall_64+0x5b/0x170
[  366.503775]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  366.504390] RIP: 0033:0x7fc0e1f4c9da
[  366.504842] RSP: 002b:00007fc0d24ee998 EFLAGS: 00000246 ORIG_RAX: 00000000000000dd
[  366.505734] RAX: ffffffffffffffda RBX: 00007fc0d24eeb10 RCX: 00007fc0e1f4c9da
[  366.506602] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000214
[  366.507455] RBP: 00007fc0d24ee9a0 R08: 00000000000000c8 R09: 0000000000000004
[  366.508308] R10: 0000000000000004 R11: 0000000000000246 R12: 8000000000000000
[  366.509163] R13: 0000000000000028 R14: 0000000000000004 R15: 0000000000000040
[  366.510020]  </TASK>
[  366.510314] INFO: task tokio-runtime-w:4560 blocked for more than 245 seconds.
[  366.511175]       Tainted: P           OE      6.12.37-61.105.amzn2023.x86_64 #1
[  366.512087] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  366.513015] task:tokio-runtime-w state:D stack:0     pid:4560  tgid:4495  ppid:3677   flags:0x00000002
[  366.514107] Call Trace:
[  366.514427]  <TASK>
[  366.514708]  __schedule+0x267/0x530
[  366.515151]  schedule+0x23/0xa0
[  366.515555]  schedule_preempt_disabled+0x11/0x20
[  366.516124]  __mutex_lock.constprop.0+0x31d/0x650
[  366.516705]  __lru_add_drain_all+0x32/0x1f0
[  366.517221]  generic_fadvise+0x267/0x2b0
[  366.517710]  ? ovl_path_realdata+0x76/0x90 [overlay]
[  366.518324]  ? ovl_real_fdget_meta+0xc0/0x1b0 [overlay]
[  366.518958]  ovl_fadvise+0x97/0xc0 [overlay]
[  366.519497]  ksys_fadvise64_64+0x4c/0xa0
[  366.519984]  __x64_sys_fadvise64+0x18/0x20
[  366.520491]  do_syscall_64+0x5b/0x170
[  366.520954]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  366.521569] RIP: 0033:0x7fc0e1f4c9da
[  366.522098] RSP: 002b:00007fc0d20ec998 EFLAGS: 00000246 ORIG_RAX: 00000000000000dd
[  366.522997] RAX: ffffffffffffffda RBX: 00007fc0d20ecb10 RCX: 00007fc0e1f4c9da
[  366.523844] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000207
[  366.524693] RBP: 00007fc0d20ec9a0 R08: 00000000000000c8 R09: 0000000000000006
[  366.525547] R10: 0000000000000004 R11: 0000000000000246 R12: 8000000000000000
[  366.526403] R13: 0000000000000028 R14: 0000000000000003 R15: 0000000000000040
[  366.527257]  </TASK>
[  366.527555] INFO: task entrypoint:5601 blocked for more than 122 seconds.
[  366.528361]       Tainted: P           OE      6.12.37-61.105.amzn2023.x86_64 #1
[  366.529240] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  366.530168] task:entrypoint      state:D stack:0     pid:5601  tgid:5565  ppid:4954   flags:0x00000002
[  366.531258] Call Trace:
[  366.531578]  <TASK>
[  366.531862]  __schedule+0x267/0x530
[  366.532335]  schedule+0x23/0xa0
[  366.532737]  schedule_preempt_disabled+0x11/0x20
[  366.533306]  __mutex_lock.constprop.0+0x31d/0x650
[  366.533886]  ? __pfx_lru_add+0x10/0x10
[  366.534358]  __lru_add_drain_all+0x32/0x1f0
[  366.534872]  generic_fadvise+0x267/0x2b0
[  366.535359]  ksys_fadvise64_64+0x88/0xa0
[  366.535852]  __x64_sys_fadvise64+0x18/0x20
[  366.536358]  do_syscall_64+0x5b/0x170
[  366.536817]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  366.537429] RIP: 0033:0x4ca53ae
[  366.537832] RSP: 002b:000000c0138255a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000dd
[  366.538731] RAX: ffffffffffffffda RBX: 0000000000000059 RCX: 0000000004ca53ae
[  366.539583] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000059
[  366.540435] RBP: 000000c0138255e8 R08: 0000000000000000 R09: 0000000000000000
[  366.541286] R10: 0000000000000004 R11: 0000000000000202 R12: 000000c0138257f0
[  366.542168] R13: 000000c00f9b0840 R14: 000000c001ab9c00 R15: 0000000000000000
[  366.543019]  </TASK>
[  366.543315] INFO: task entrypoint:5621 blocked for more than 122 seconds.
[  366.544126]       Tainted: P           OE      6.12.37-61.105.amzn2023.x86_64 #1
[  366.545007] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  366.545934] task:entrypoint      state:D stack:0     pid:5621  tgid:5565  ppid:4954   flags:0x00000002
[  366.547049] Call Trace:
[  366.547371]  <TASK>
[  366.547654]  __schedule+0x267/0x530
[  366.548096]  schedule+0x23/0xa0
[  366.548501]  schedule_preempt_disabled+0x11/0x20
[  366.549071]  __mutex_lock.constprop.0+0x31d/0x650
[  366.549649]  __lru_add_drain_all+0x32/0x1f0
[  366.550171]  generic_fadvise+0x267/0x2b0
[  366.550659]  ? do_futex+0xbe/0x1d0
[  366.551092]  ksys_fadvise64_64+0x88/0xa0
[  366.551575]  __x64_sys_fadvise64+0x18/0x20
[  366.552100]  do_syscall_64+0x5b/0x170
[  366.552560]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  366.553175] RIP: 0033:0x4ca53ae
[  366.553577] RSP: 002b:000000c013c5d5a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000dd
[  366.554489] RAX: ffffffffffffffda RBX: 0000000000000234 RCX: 0000000004ca53ae
[  366.555341] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000234
[  366.556195] RBP: 000000c013c5d5e8 R08: 0000000000000000 R09: 0000000000000000
[  366.557049] R10: 0000000000000004 R11: 0000000000000202 R12: 000000c013c5d7f0
[  366.557901] R13: ffffffffffffffff R14: 000000c001ab8e00 R15: 0000000000000000
[  366.558760]  </TASK>
[  366.559053] Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings

gmwiz avatar Aug 14 '25 11:08 gmwiz

Thanks for the report, I'm forwarding this to the kernel team.

awsthk avatar Aug 14 '25 12:08 awsthk

Hi @gmwiz,

Thank you for reporting this issue. The answers to the questions below will help us narrow down the root cause of the behavior you are reporting.

1. Can you elaborate on which AL2023 6.1 kernel version you were using before hitting this issue?
2. You also mentioned that the issue started after an EKS upgrade, which usually updates the kernel along with other software packages. Are you seeing the same issue when updating the kernel only, without updating the other packages?
3. Are there any reproduction steps we can follow to trigger the same behavior?
4. The shared logs contain errors pointing to unclean unmounting of the volumes. Are you seeing the same errors on AL2023 6.1?
5. Can you elaborate on whether your application and the security agent are working on the same set of files? Are these files stored on the volumes that you previously unmounted?

The shared stack traces point to contention on a mutex in the __lru_add_drain_all function, which is responsible for going through all online CPUs and draining the per-CPU page batches for each of them. It seems that some task is holding that mutex, so other threads calling fadvise with POSIX_FADV_DONTNEED get stuck, as that path apparently requires the same lock. It's not clear from the stack traces which thread is actually holding the lock.
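
For what it's worth, this kind of serialization is easy to exercise from user space, since every thread issuing POSIX_FADV_DONTNEED concurrently ends up on that same drain path. Below is a rough stress sketch along those lines; the thread count, iteration count, and file arguments are made up for illustration, and it is not a confirmed reproducer for this report:

```c
/*
 * Rough stress sketch: several threads repeatedly reading files and then
 * issuing POSIX_FADV_DONTNEED, so they all funnel through the kernel's
 * global LRU drain path. Thread and iteration counts are arbitrary.
 */
#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define NTHREADS   16
#define ITERATIONS 100000

static void *hammer(void *arg)
{
    const char *path = arg;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    char buf[1 << 16];
    for (int i = 0; i < ITERATIONS; i++) {
        /* read the whole file once ... */
        while (read(fd, buf, sizeof(buf)) > 0)
            ;
        lseek(fd, 0, SEEK_SET);
        /* ... then ask the kernel to drop its page-cache pages */
        posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    }
    close(fd);
    return NULL;
}

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s FILE [FILE...]\n", argv[0]);
        return 1;
    }
    pthread_t tids[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&tids[i], NULL, hammer, argv[1 + i % (argc - 1)]);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tids[i], NULL);
    return 0;
}
```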

abuehaze14 avatar Aug 20 '25 14:08 abuehaze14

Hi @gmwiz,

Kind reminder that we are awaiting your reply to move forward with this issue.

abuehaze14 avatar Sep 01 '25 10:09 abuehaze14

Hi @abuehaze14, I'm sorry I missed your reply entirely! Let me clarify: I did not upgrade the EKS Kubernetes version, just the worker node AMIs. We constantly upgrade to the latest EKS AMI release; the last version that did not have this issue is https://github.com/awslabs/amazon-eks-ami/releases/tag/v20250807, which contains kernel version 6.1.144-170.251.amzn2023. Yes, the uncleanly unmounted volumes are something we have always had in this cluster, given that these volumes are created from automated snapshots of live machines. The security agent inspects all files on the node, including files from the unclean ext4 volumes, while our application only goes over files from the external volumes. So yes, they both read files from the attached external volumes. The mutex contention is exactly what I figured myself as well. I did get a chance to pull a full memory dump from the affected node before killing it, using LiME. Is there a way I could share it with you for further inspection? Perhaps via official AWS support?

gmwiz avatar Sep 01 '25 11:09 gmwiz

Here's some more information I gathered while analyzing the memory dumps I acquired. In two consecutive dumps, the mutex appears to have no owner and is solely marked with MUTEX_FLAG_WAITERS (i.e. mutex.owner = 1). This is the output I got from volshell:

(layer_name) >>> db(0xffffb71f45c0)
0xffffb71f45c0    01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
0xffffb71f45d0    80 3d 48 42 48 98 ff ff d0 bd 25 49 48 98 ff ff    .=HBH.....%IH...
0xffffb71f45e0    22 66 6f 6c 69 6f 3d 25 70 20 70 66 6e 3d 30 78    "folio=%p.pfn=0x
0xffffb71f45f0    25 6c 78 22 2c 20 52 45 43 2d 3e 66 6f 6c 69 6f    %lx",.REC->folio
0xffffb71f4600    2c 20 52 45 43 2d 3e 70 66 6e 00 00 00 00 00 00    ,.REC->pfn......
0xffffb71f4610    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
0xffffb71f4620    22 66 6f 6c 69 6f 3d 25 70 20 70 66 6e 3d 30 78    "folio=%p.pfn=0x
0xffffb71f4630    25 6c 78 20 6c 72 75 3d 25 64 20 66 6c 61 67 73    %lx.lru=%d.flags

I can tell this is the correct memory location, since when inspecting this area statically I see the exact same surrounding strings.
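
For reference, assuming the upstream struct mutex layout (the low three bits of the owner word are flag bits and the rest is the holder's task_struct pointer), the raw value 0x0000000000000001 decodes to "no owner, MUTEX_FLAG_WAITERS set". A small decoding sketch, with the flag values as defined in kernel/locking/mutex.h:

```c
/*
 * Decode a raw `struct mutex` owner word, assuming the upstream layout:
 * the low three bits are flag bits, the rest is the holder's task_struct
 * pointer.
 */
#include <stdint.h>
#include <stdio.h>

#define MUTEX_FLAG_WAITERS 0x01UL  /* waiters are queued on this mutex */
#define MUTEX_FLAG_HANDOFF 0x02UL  /* unlock should hand off to the first waiter */
#define MUTEX_FLAG_PICKUP  0x04UL  /* the handed-off waiter may now take the lock */
#define MUTEX_FLAGS        0x07UL

static void decode_owner(uint64_t owner)
{
    uint64_t task  = owner & ~MUTEX_FLAGS;  /* task_struct pointer of the holder */
    uint64_t flags = owner &  MUTEX_FLAGS;

    printf("holder task_struct = 0x%016llx%s, flags =%s%s%s%s\n",
           (unsigned long long)task, task ? "" : " (no owner)",
           flags & MUTEX_FLAG_WAITERS ? " WAITERS" : "",
           flags & MUTEX_FLAG_HANDOFF ? " HANDOFF" : "",
           flags & MUTEX_FLAG_PICKUP  ? " PICKUP"  : "",
           flags ? "" : " none");
}

int main(void)
{
    decode_owner(0x0000000000000001ULL);  /* the value observed in the dump */
    return 0;
}
```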

The following tasks are in an uninterruptible sleep state (task.state == 2):

PID: 3818, TGID: 3787, Name: flb-pipeline, State: 2
PID: 124147, TGID: 124132, Name: entrypoint, State: 2
PID: 124152, TGID: 124132, Name: entrypoint, State: 2
PID: 124156, TGID: 124132, Name: entrypoint, State: 2
PID: 124317, TGID: 124132, Name: entrypoint, State: 2
PID: 124318, TGID: 124132, Name: entrypoint, State: 2

And when extracting the pscall stack for these TGIDs, I get the following:

TID	Comm	Position	Address	Value	Name	Type	Module

3787	fluent-bit	6	0x984842083db0	0xffffb5beeda7	__schedule	t	kernel
3787	fluent-bit	8	0x984842083dc0	0xffffb51b9575	enqueue_hrtimer	t	kernel
3787	fluent-bit	19	0x984842083e18	0xffffb5bef0a3	schedule	T	kernel
3787	fluent-bit	21	0x984842083e28	0xffffb5bf4f6c	do_nanosleep	t	kernel
3787	fluent-bit	29	0x984842083e68	0xffffb51baf73	hrtimer_nanosleep	T	kernel
3787	fluent-bit	35	0x984842083e98	0xffffb51b98c0	hrtimer_wakeup	t	kernel
3787	fluent-bit	44	0x984842083ee0	0xffffb51c48b0	common_nsleep	t	kernel
3787	fluent-bit	45	0x984842083ee8	0xffffb51c7079	__x64_sys_clock_nanosleep	T	kernel
3787	fluent-bit	53	0x984842083f28	0xffffb5be596b	do_syscall_64	T	kernel
3787	fluent-bit	58	0x984842083f50	0xffffb5c0012b	entry_SYSCALL_64_after_hwframe	T	kernel
3818	flb-pipeline	6	0x984842033d48	0xffffb5beeda7	__schedule	t	kernel
3818	flb-pipeline	19	0x984842033db0	0xffffb5bef0a3	schedule	T	kernel
3818	flb-pipeline	21	0x984842033dc0	0xffffb5bef331	schedule_preempt_disabled	T	kernel
3818	flb-pipeline	22	0x984842033dc8	0xffffb5bf05cd	__mutex_lock.constprop.0	t	kernel
3818	flb-pipeline	36	0x984842033e38	0xffffb5328d52	__lru_add_drain_all	t	kernel
3818	flb-pipeline	43	0x984842033e70	0xffffb53229c7	generic_fadvise	T	kernel
3818	flb-pipeline	58	0x984842033ee8	0xffffb5322a9c	ksys_fadvise64_64	T	kernel
3818	flb-pipeline	65	0x984842033f20	0xffffb5322b78	__x64_sys_fadvise64	T	kernel
3818	flb-pipeline	66	0x984842033f28	0xffffb5be596b	do_syscall_64	T	kernel
3818	flb-pipeline	71	0x984842033f50	0xffffb5c0012b	entry_SYSCALL_64_after_hwframe	T	kernel
3819	flb-logger	6	0x9848420e7d08	0xffffb5beeda7	__schedule	t	kernel
3819	flb-logger	19	0x9848420e7d70	0xffffb5bef0a3	schedule	T	kernel
3819	flb-logger	21	0x9848420e7d80	0xffffb5bf4e98	schedule_hrtimeout_range_clock	T	kernel
3819	flb-logger	22	0x9848420e7d88	0xffffb54968b9	ep_send_events	t	kernel
3819	flb-logger	37	0x9848420e7e00	0xffffb5497598	ep_poll	t	kernel
3819	flb-logger	47	0x9848420e7e50	0xffffb54965e0	ep_autoremove_wake_function	t	kernel
3819	flb-logger	57	0x9848420e7ea0	0xffffb54976b8	do_epoll_wait	t	kernel
3819	flb-logger	64	0x9848420e7ed8	0xffffb5497f40	__x64_sys_epoll_wait	T	kernel
3819	flb-logger	67	0x9848420e7ef0	0xffffb542d3d8	ksys_read	T	kernel
3819	flb-logger	74	0x9848420e7f28	0xffffb5be596b	do_syscall_64	T	kernel
3819	flb-logger	79	0x9848420e7f50	0xffffb5c0012b	entry_SYSCALL_64_after_hwframe	T	kernel
3826	flb-out-azure_b	6	0x98484208bd08	0xffffb5beeda7	__schedule	t	kernel
3826	flb-out-azure_b	19	0x98484208bd70	0xffffb5bef0a3	schedule	T	kernel
3826	flb-out-azure_b	21	0x98484208bd80	0xffffb5bf4e98	schedule_hrtimeout_range_clock	T	kernel
3826	flb-out-azure_b	22	0x98484208bd88	0xffffb54968b9	ep_send_events	t	kernel
3826	flb-out-azure_b	37	0x98484208be00	0xffffb5497598	ep_poll	t	kernel
3826	flb-out-azure_b	47	0x98484208be50	0xffffb54965e0	ep_autoremove_wake_function	t	kernel
3826	flb-out-azure_b	57	0x98484208bea0	0xffffb54976b8	do_epoll_wait	t	kernel
3826	flb-out-azure_b	64	0x98484208bed8	0xffffb5497f40	__x64_sys_epoll_wait	T	kernel
3826	flb-out-azure_b	74	0x98484208bf28	0xffffb5be596b	do_syscall_64	T	kernel
3826	flb-out-azure_b	79	0x98484208bf50	0xffffb5c0012b	entry_SYSCALL_64_after_hwframe	T	kernel
3827	monkey: server	6	0x98484207bd08	0xffffb5beeda7	__schedule	t	kernel
3827	monkey: server	19	0x98484207bd70	0xffffb5bef0a3	schedule	T	kernel
3827	monkey: server	21	0x98484207bd80	0xffffb5bf4e98	schedule_hrtimeout_range_clock	T	kernel
3827	monkey: server	27	0x98484207bdb0	0xffffb5beedaf	__schedule	t	kernel
3827	monkey: server	29	0x98484207bdc0	0xffffb51b9575	enqueue_hrtimer	t	kernel
3827	monkey: server	37	0x98484207be00	0xffffb5497598	ep_poll	t	kernel
3827	monkey: server	47	0x98484207be50	0xffffb54965e0	ep_autoremove_wake_function	t	kernel
3827	monkey: server	57	0x98484207bea0	0xffffb54976b8	do_epoll_wait	t	kernel
3827	monkey: server	64	0x98484207bed8	0xffffb5497f40	__x64_sys_epoll_wait	T	kernel
3827	monkey: server	66	0x98484207bee8	0xffffb5051f9b	switch_fpu_return	T	kernel
3827	monkey: server	74	0x98484207bf28	0xffffb5be596b	do_syscall_64	T	kernel
3827	monkey: server	79	0x98484207bf50	0xffffb5c0012b	entry_SYSCALL_64_after_hwframe	T	kernel
3830	monkey: clock	6	0x9848420ffdb0	0xffffb5beeda7	__schedule	t	kernel
3830	monkey: clock	8	0x9848420ffdc0	0xffffb51b9575	enqueue_hrtimer	t	kernel
3830	monkey: clock	19	0x9848420ffe18	0xffffb5bef0a3	schedule	T	kernel
3830	monkey: clock	21	0x9848420ffe28	0xffffb5bf4f6c	do_nanosleep	t	kernel
3830	monkey: clock	29	0x9848420ffe68	0xffffb51baf73	hrtimer_nanosleep	T	kernel
3830	monkey: clock	35	0x9848420ffe98	0xffffb51b98c0	hrtimer_wakeup	t	kernel
3830	monkey: clock	44	0x9848420ffee0	0xffffb51c48b0	common_nsleep	t	kernel
3830	monkey: clock	45	0x9848420ffee8	0xffffb51c7079	__x64_sys_clock_nanosleep	T	kernel
3830	monkey: clock	53	0x9848420fff28	0xffffb5be596b	do_syscall_64	T	kernel
3830	monkey: clock	58	0x9848420fff50	0xffffb5c0012b	entry_SYSCALL_64_after_hwframe	T	kernel
3831	monkey: wrk/0	6	0x984842107d08	0xffffb5beeda7	__schedule	t	kernel
3831	monkey: wrk/0	19	0x984842107d70	0xffffb5bef0a3	schedule	T	kernel
3831	monkey: wrk/0	21	0x984842107d80	0xffffb5bf4e98	schedule_hrtimeout_range_clock	T	kernel
3831	monkey: wrk/0	37	0x984842107e00	0xffffb5497598	ep_poll	t	kernel
3831	monkey: wrk/0	47	0x984842107e50	0xffffb54965e0	ep_autoremove_wake_function	t	kernel
3831	monkey: wrk/0	57	0x984842107ea0	0xffffb54976b8	do_epoll_wait	t	kernel
3831	monkey: wrk/0	64	0x984842107ed8	0xffffb5497f40	__x64_sys_epoll_wait	T	kernel
3831	monkey: wrk/0	74	0x984842107f28	0xffffb5be596b	do_syscall_64	T	kernel
3831	monkey: wrk/0	79	0x984842107f50	0xffffb5c0012b	entry_SYSCALL_64_after_hwframe	T	kernel
124132	entrypoint	6	0x9848491f3cb0	0xffffb5beeda7	__schedule	t	kernel
124132	entrypoint	19	0x9848491f3d18	0xffffb5bef0a3	schedule	T	kernel
124132	entrypoint	21	0x9848491f3d28	0xffffb51d9505	futex_wait_queue	T	kernel
124132	entrypoint	26	0x9848491f3d50	0xffffb51d9bc8	__futex_wait	T	kernel
124132	entrypoint	35	0x9848491f3d98	0xffffb51d8ec0	futex_wake_mark	T	kernel
124132	entrypoint	50	0x9848491f3e10	0xffffb51d9cb4	futex_wait	T	kernel
124132	entrypoint	55	0x9848491f3e38	0xffffb51d90f5	futex_wake	T	kernel
124132	entrypoint	68	0x9848491f3ea0	0xffffb51d5f08	do_futex	T	kernel
124132	entrypoint	70	0x9848491f3eb0	0xffffb51d63e3	__x64_sys_futex	T	kernel
124132	entrypoint	85	0x9848491f3f28	0xffffb5be596b	do_syscall_64	T	kernel
124132	entrypoint	90	0x9848491f3f50	0xffffb5c0012b	entry_SYSCALL_64_after_hwframe	T	kernel
124145	entrypoint	6	0x98484905bcb0	0xffffb5beeda7	__schedule	t	kernel
124145	entrypoint	8	0x98484905bcc0	0xffffb51b9575	enqueue_hrtimer	t	kernel
124145	entrypoint	19	0x98484905bd18	0xffffb5bef0a3	schedule	T	kernel
124145	entrypoint	21	0x98484905bd28	0xffffb51d9505	futex_wait_queue	T	kernel
124145	entrypoint	26	0x98484905bd50	0xffffb51d9bc8	__futex_wait	T	kernel
124145	entrypoint	35	0x98484905bd98	0xffffb51d8ec0	futex_wake_mark	T	kernel
124145	entrypoint	50	0x98484905be10	0xffffb51d9cb4	futex_wait	T	kernel
124145	entrypoint	57	0x98484905be48	0xffffb51b98c0	hrtimer_wakeup	t	kernel
124145	entrypoint	68	0x98484905bea0	0xffffb51d5f08	do_futex	T	kernel
124145	entrypoint	70	0x98484905beb0	0xffffb51d63e3	__x64_sys_futex	T	kernel
124145	entrypoint	85	0x98484905bf28	0xffffb5be596b	do_syscall_64	T	kernel
124145	entrypoint	90	0x98484905bf50	0xffffb5c0012b	entry_SYSCALL_64_after_hwframe	T	kernel
124146	entrypoint	6	0x984849213cb0	0xffffb5beeda7	__schedule	t	kernel
124146	entrypoint	19	0x984849213d18	0xffffb5bef0a3	schedule	T	kernel
124146	entrypoint	21	0x984849213d28	0xffffb51d9505	futex_wait_queue	T	kernel
124146	entrypoint	26	0x984849213d50	0xffffb51d9bc8	__futex_wait	T	kernel
124146	entrypoint	35	0x984849213d98	0xffffb51d8ec0	futex_wake_mark	T	kernel
124146	entrypoint	50	0x984849213e10	0xffffb51d9cb4	futex_wait	T	kernel
124146	entrypoint	55	0x984849213e38	0xffffb51d90f5	futex_wake	T	kernel
124146	entrypoint	68	0x984849213ea0	0xffffb51d5f08	do_futex	T	kernel
124146	entrypoint	70	0x984849213eb0	0xffffb51d63e3	__x64_sys_futex	T	kernel
124146	entrypoint	85	0x984849213f28	0xffffb5be596b	do_syscall_64	T	kernel
124146	entrypoint	90	0x984849213f50	0xffffb5c0012b	entry_SYSCALL_64_after_hwframe	T	kernel
124147	entrypoint	6	0x98484924bd48	0xffffb5beeda7	__schedule	t	kernel
124147	entrypoint	19	0x98484924bdb0	0xffffb5bef0a3	schedule	T	kernel
124147	entrypoint	21	0x98484924bdc0	0xffffb5bef331	schedule_preempt_disabled	T	kernel
124147	entrypoint	22	0x98484924bdc8	0xffffb5bf05cd	__mutex_lock.constprop.0	t	kernel
124147	entrypoint	36	0x98484924be38	0xffffb5328d52	__lru_add_drain_all	t	kernel
124147	entrypoint	43	0x98484924be70	0xffffb53229c7	generic_fadvise	T	kernel
124147	entrypoint	58	0x98484924bee8	0xffffb5322ad8	ksys_fadvise64_64	T	kernel
124147	entrypoint	65	0x98484924bf20	0xffffb5322b78	__x64_sys_fadvise64	T	kernel
124147	entrypoint	66	0x98484924bf28	0xffffb5be596b	do_syscall_64	T	kernel
124147	entrypoint	71	0x98484924bf50	0xffffb5c0012b	entry_SYSCALL_64_after_hwframe	T	kernel
124148	entrypoint	6	0x9848491ebcb0	0xffffb5beeda7	__schedule	t	kernel
124148	entrypoint	19	0x9848491ebd18	0xffffb5bef0a3	schedule	T	kernel
124148	entrypoint	21	0x9848491ebd28	0xffffb51d9505	futex_wait_queue	T	kernel
124148	entrypoint	26	0x9848491ebd50	0xffffb51d9bc8	__futex_wait	T	kernel
124148	entrypoint	35	0x9848491ebd98	0xffffb51d8ec0	futex_wake_mark	T	kernel
124148	entrypoint	50	0x9848491ebe10	0xffffb51d9cb4	futex_wait	T	kernel
124148	entrypoint	68	0x9848491ebea0	0xffffb51d5f08	do_futex	T	kernel
124148	entrypoint	70	0x9848491ebeb0	0xffffb51d63e3	__x64_sys_futex	T	kernel
124148	entrypoint	85	0x9848491ebf28	0xffffb5be596b	do_syscall_64	T	kernel
124148	entrypoint	90	0x9848491ebf50	0xffffb5c0012b	entry_SYSCALL_64_after_hwframe	T	kernel
124149	entrypoint	6	0x984849253cb0	0xffffb5beeda7	__schedule	t	kernel
124149	entrypoint	19	0x984849253d18	0xffffb5bef0a3	schedule	T	kernel
124149	entrypoint	21	0x984849253d28	0xffffb51d9505	futex_wait_queue	T	kernel
124149	entrypoint	26	0x984849253d50	0xffffb51d9bc8	__futex_wait	T	kernel
124149	entrypoint	35	0x984849253d98	0xffffb51d8ec0	futex_wake_mark	T	kernel
124149	entrypoint	50	0x984849253e10	0xffffb51d9cb4	futex_wait	T	kernel
124149	entrypoint	53	0x984849253e28	0xffffb5261d97	trace_call_bpf	T	kernel
124149	entrypoint	68	0x984849253ea0	0xffffb51d5f08	do_futex	T	kernel
124149	entrypoint	70	0x984849253eb0	0xffffb51d63e3	__x64_sys_futex	T	kernel
124149	entrypoint	85	0x984849253f28	0xffffb5be596b	do_syscall_64	T	kernel
124149	entrypoint	90	0x984849253f50	0xffffb5c0012b	entry_SYSCALL_64_after_hwframe	T	kernel
124150	entrypoint	6	0x984849143cb0	0xffffb5beeda7	__schedule	t	kernel
124150	entrypoint	19	0x984849143d18	0xffffb5bef0a3	schedule	T	kernel
124150	entrypoint	21	0x984849143d28	0xffffb51d9505	futex_wait_queue	T	kernel
124150	entrypoint	26	0x984849143d50	0xffffb51d9bc8	__futex_wait	T	kernel
124150	entrypoint	35	0x984849143d98	0xffffb51d8ec0	futex_wake_mark	T	kernel
124150	entrypoint	50	0x984849143e10	0xffffb51d9cb4	futex_wait	T	kernel
124150	entrypoint	56	0x984849143e40	0xffffb51b98c0	hrtimer_wakeup	t	kernel
124150	entrypoint	68	0x984849143ea0	0xffffb51d5f08	do_futex	T	kernel
124150	entrypoint	70	0x984849143eb0	0xffffb51d63e3	__x64_sys_futex	T	kernel
124150	entrypoint	85	0x984849143f28	0xffffb5be596b	do_syscall_64	T	kernel
124150	entrypoint	90	0x984849143f50	0xffffb5c0012b	entry_SYSCALL_64_after_hwframe	T	kernel
124151	entrypoint	6	0x9848491cbcb0	0xffffb5beeda7	__schedule	t	kernel
124151	entrypoint	19	0x9848491cbd18	0xffffb5bef0a3	schedule	T	kernel
124151	entrypoint	21	0x9848491cbd28	0xffffb51d9505	futex_wait_queue	T	kernel
124151	entrypoint	26	0x9848491cbd50	0xffffb51d9bc8	__futex_wait	T	kernel
124151	entrypoint	35	0x9848491cbd98	0xffffb51d8ec0	futex_wake_mark	T	kernel
124151	entrypoint	50	0x9848491cbe10	0xffffb51d9cb4	futex_wait	T	kernel
124151	entrypoint	53	0x9848491cbe28	0xffffb5261d97	trace_call_bpf	T	kernel
124151	entrypoint	68	0x9848491cbea0	0xffffb51d5f08	do_futex	T	kernel
124151	entrypoint	70	0x9848491cbeb0	0xffffb51d63e3	__x64_sys_futex	T	kernel
124151	entrypoint	85	0x9848491cbf28	0xffffb5be596b	do_syscall_64	T	kernel
124151	entrypoint	90	0x9848491cbf50	0xffffb5c0012b	entry_SYSCALL_64_after_hwframe	T	kernel
124152	entrypoint	6	0x9848491c3d48	0xffffb5beeda7	__schedule	t	kernel
124152	entrypoint	19	0x9848491c3db0	0xffffb5bef0a3	schedule	T	kernel
124152	entrypoint	21	0x9848491c3dc0	0xffffb5bef331	schedule_preempt_disabled	T	kernel
124152	entrypoint	22	0x9848491c3dc8	0xffffb5bf05cd	__mutex_lock.constprop.0	t	kernel
124152	entrypoint	36	0x9848491c3e38	0xffffb5328d52	__lru_add_drain_all	t	kernel
124152	entrypoint	43	0x9848491c3e70	0xffffb53229c7	generic_fadvise	T	kernel
124152	entrypoint	49	0x9848491c3ea0	0xffffb51d5e8e	do_futex	T	kernel
124152	entrypoint	58	0x9848491c3ee8	0xffffb5322ad8	ksys_fadvise64_64	T	kernel
124152	entrypoint	65	0x9848491c3f20	0xffffb5322b78	__x64_sys_fadvise64	T	kernel
124152	entrypoint	66	0x9848491c3f28	0xffffb5be596b	do_syscall_64	T	kernel
124152	entrypoint	71	0x9848491c3f50	0xffffb5c0012b	entry_SYSCALL_64_after_hwframe	T	kernel
124153	entrypoint	6	0x98484921bcb0	0xffffb5beeda7	__schedule	t	kernel
124153	entrypoint	19	0x98484921bd18	0xffffb5bef0a3	schedule	T	kernel
124153	entrypoint	21	0x98484921bd28	0xffffb51d9505	futex_wait_queue	T	kernel
124153	entrypoint	26	0x98484921bd50	0xffffb51d9bc8	__futex_wait	T	kernel
124153	entrypoint	35	0x98484921bd98	0xffffb51d8ec0	futex_wake_mark	T	kernel
124153	entrypoint	50	0x98484921be10	0xffffb51d9cb4	futex_wait	T	kernel
124153	entrypoint	55	0x98484921be38	0xffffb5bef0a3	schedule	T	kernel
124153	entrypoint	57	0x98484921be48	0xffffb5bf4f74	do_nanosleep	t	kernel
124153	entrypoint	58	0x98484921be50	0xffffb531471d	rseq_get_rseq_cs	t	kernel
124153	entrypoint	68	0x98484921bea0	0xffffb51d5f08	do_futex	T	kernel
124153	entrypoint	70	0x98484921beb0	0xffffb51d63e3	__x64_sys_futex	T	kernel
124153	entrypoint	85	0x98484921bf28	0xffffb5be596b	do_syscall_64	T	kernel
124153	entrypoint	90	0x98484921bf50	0xffffb5c0012b	entry_SYSCALL_64_after_hwframe	T	kernel
124156	entrypoint	6	0x98484925bd48	0xffffb5beeda7	__schedule	t	kernel
124156	entrypoint	19	0x98484925bdb0	0xffffb5bef0a3	schedule	T	kernel
124156	entrypoint	21	0x98484925bdc0	0xffffb5bef331	schedule_preempt_disabled	T	kernel
124156	entrypoint	22	0x98484925bdc8	0xffffb5bf05cd	__mutex_lock.constprop.0	t	kernel
124156	entrypoint	27	0x98484925bdf0	0xffffb53299f0	lru_add	t	kernel
124156	entrypoint	36	0x98484925be38	0xffffb5328d52	__lru_add_drain_all	t	kernel
124156	entrypoint	43	0x98484925be70	0xffffb53229c7	generic_fadvise	T	kernel
124156	entrypoint	49	0x98484925bea0	0xffffb51d5e8e	do_futex	T	kernel
124156	entrypoint	58	0x98484925bee8	0xffffb5322ad8	ksys_fadvise64_64	T	kernel
124156	entrypoint	65	0x98484925bf20	0xffffb5322b78	__x64_sys_fadvise64	T	kernel
124156	entrypoint	66	0x98484925bf28	0xffffb5be596b	do_syscall_64	T	kernel
124156	entrypoint	71	0x98484925bf50	0xffffb5c0012b	entry_SYSCALL_64_after_hwframe	T	kernel
124313	entrypoint	6	0x98484938bcb0	0xffffb5beeda7	__schedule	t	kernel
124313	entrypoint	19	0x98484938bd18	0xffffb5bef0a3	schedule	T	kernel
124313	entrypoint	21	0x98484938bd28	0xffffb51d9505	futex_wait_queue	T	kernel
124313	entrypoint	26	0x98484938bd50	0xffffb51d9bc8	__futex_wait	T	kernel
124313	entrypoint	35	0x98484938bd98	0xffffb51d8ec0	futex_wake_mark	T	kernel
124313	entrypoint	50	0x98484938be10	0xffffb51d9cb4	futex_wait	T	kernel
124313	entrypoint	55	0x98484938be38	0xffffb51d90f5	futex_wake	T	kernel
124313	entrypoint	58	0x98484938be50	0xffffb531471d	rseq_get_rseq_cs	t	kernel
124313	entrypoint	68	0x98484938bea0	0xffffb51d5f08	do_futex	T	kernel
124313	entrypoint	70	0x98484938beb0	0xffffb51d63e3	__x64_sys_futex	T	kernel
124313	entrypoint	85	0x98484938bf28	0xffffb5be596b	do_syscall_64	T	kernel
124313	entrypoint	90	0x98484938bf50	0xffffb5c0012b	entry_SYSCALL_64_after_hwframe	T	kernel
124314	entrypoint	6	0x984849383cb0	0xffffb5beeda7	__schedule	t	kernel
124314	entrypoint	8	0x984849383cc0	0xffffb5454d9a	touch_atime	T	kernel
124314	entrypoint	19	0x984849383d18	0xffffb5bef0a3	schedule	T	kernel
124314	entrypoint	21	0x984849383d28	0xffffb51d9505	futex_wait_queue	T	kernel
124314	entrypoint	26	0x984849383d50	0xffffb51d9bc8	__futex_wait	T	kernel
124314	entrypoint	35	0x984849383d98	0xffffb51d8ec0	futex_wake_mark	T	kernel
124314	entrypoint	50	0x984849383e10	0xffffb51d9cb4	futex_wait	T	kernel
124314	entrypoint	58	0x984849383e50	0xffffb531471d	rseq_get_rseq_cs	t	kernel
124314	entrypoint	59	0x984849383e58	0xffffb542c76c	vfs_read	T	kernel
124314	entrypoint	68	0x984849383ea0	0xffffb51d5f08	do_futex	T	kernel
124314	entrypoint	70	0x984849383eb0	0xffffb51d63e3	__x64_sys_futex	T	kernel
124314	entrypoint	85	0x984849383f28	0xffffb5be596b	do_syscall_64	T	kernel
124314	entrypoint	90	0x984849383f50	0xffffb5c0012b	entry_SYSCALL_64_after_hwframe	T	kernel
124315	entrypoint	0	0x98484968bcb0	0xffffb5beeda7	__schedule	t	kernel
124315	entrypoint	13	0x98484968bd18	0xffffb5bef0a3	schedule	T	kernel
124315	entrypoint	15	0x98484968bd28	0xffffb51d9505	futex_wait_queue	T	kernel
124315	entrypoint	20	0x98484968bd50	0xffffb51d9bc8	__futex_wait	T	kernel
124315	entrypoint	29	0x98484968bd98	0xffffb51d8ec0	futex_wake_mark	T	kernel
124315	entrypoint	44	0x98484968be10	0xffffb51d9cb4	futex_wait	T	kernel
124315	entrypoint	48	0x98484968be30	0xffffb5405315	__memcg_slab_free_hook	T	kernel
124315	entrypoint	62	0x98484968bea0	0xffffb51d5f08	do_futex	T	kernel
124315	entrypoint	64	0x98484968beb0	0xffffb51d63e3	__x64_sys_futex	T	kernel
124315	entrypoint	79	0x98484968bf28	0xffffb5be596b	do_syscall_64	T	kernel
124315	entrypoint	84	0x98484968bf50	0xffffb5c0012b	entry_SYSCALL_64_after_hwframe	T	kernel
124316	entrypoint	2	0x9848493fbc90	0xffffb515fc35	psi_task_switch	T	kernel
124316	entrypoint	12	0x9848493fbce0	0xffffb5beeda7	__schedule	t	kernel
124316	entrypoint	14	0x9848493fbcf0	0xffffb51b9575	enqueue_hrtimer	t	kernel
124316	entrypoint	25	0x9848493fbd48	0xffffb5bef0a3	schedule	T	kernel
124316	entrypoint	27	0x9848493fbd58	0xffffb5bf4dcc	schedule_hrtimeout_range_clock	T	kernel
124316	entrypoint	33	0x9848493fbd88	0xffffb51b98c0	hrtimer_wakeup	t	kernel
124316	entrypoint	43	0x9848493fbdd8	0xffffb5497598	ep_poll	t	kernel
124316	entrypoint	53	0x9848493fbe28	0xffffb54965e0	ep_autoremove_wake_function	t	kernel
124316	entrypoint	63	0x9848493fbe78	0xffffb54976b8	do_epoll_wait	t	kernel
124316	entrypoint	70	0x9848493fbeb0	0xffffb549780c	do_compat_epoll_pwait.part.0	t	kernel
124316	entrypoint	72	0x9848493fbec0	0xffffb5497bff	__x64_sys_epoll_pwait	T	kernel
124316	entrypoint	85	0x9848493fbf28	0xffffb5be596b	do_syscall_64	T	kernel
124316	entrypoint	90	0x9848493fbf50	0xffffb5c0012b	entry_SYSCALL_64_after_hwframe	T	kernel
124317	entrypoint	6	0x984849663d48	0xffffb5beeda7	__schedule	t	kernel
124317	entrypoint	19	0x984849663db0	0xffffb5bef0a3	schedule	T	kernel
124317	entrypoint	21	0x984849663dc0	0xffffb5bef331	schedule_preempt_disabled	T	kernel
124317	entrypoint	22	0x984849663dc8	0xffffb5bf05cd	__mutex_lock.constprop.0	t	kernel
124317	entrypoint	36	0x984849663e38	0xffffb5328d52	__lru_add_drain_all	t	kernel
124317	entrypoint	43	0x984849663e70	0xffffb53229c7	generic_fadvise	T	kernel
124317	entrypoint	49	0x984849663ea0	0xffffb51d5e8e	do_futex	T	kernel
124317	entrypoint	58	0x984849663ee8	0xffffb5322ad8	ksys_fadvise64_64	T	kernel
124317	entrypoint	65	0x984849663f20	0xffffb5322b78	__x64_sys_fadvise64	T	kernel
124317	entrypoint	66	0x984849663f28	0xffffb5be596b	do_syscall_64	T	kernel
124317	entrypoint	71	0x984849663f50	0xffffb5c0012b	entry_SYSCALL_64_after_hwframe	T	kernel
124318	entrypoint	6	0x9848496ffd48	0xffffb5beeda7	__schedule	t	kernel
124318	entrypoint	19	0x9848496ffdb0	0xffffb5bef0a3	schedule	T	kernel
124318	entrypoint	21	0x9848496ffdc0	0xffffb5bef331	schedule_preempt_disabled	T	kernel
124318	entrypoint	22	0x9848496ffdc8	0xffffb5bf05cd	__mutex_lock.constprop.0	t	kernel
124318	entrypoint	36	0x9848496ffe38	0xffffb5328d52	__lru_add_drain_all	t	kernel
124318	entrypoint	43	0x9848496ffe70	0xffffb53229c7	generic_fadvise	T	kernel
124318	entrypoint	49	0x9848496ffea0	0xffffb51d5e8e	do_futex	T	kernel
124318	entrypoint	58	0x9848496ffee8	0xffffb5322ad8	ksys_fadvise64_64	T	kernel
124318	entrypoint	65	0x9848496fff20	0xffffb5322b78	__x64_sys_fadvise64	T	kernel
124318	entrypoint	66	0x9848496fff28	0xffffb5be596b	do_syscall_64	T	kernel
124318	entrypoint	71	0x9848496fff50	0xffffb5c0012b	entry_SYSCALL_64_after_hwframe	T	kernel
124319	entrypoint	6	0x984849707cb0	0xffffb5beeda7	__schedule	t	kernel
124319	entrypoint	19	0x984849707d18	0xffffb5bef0a3	schedule	T	kernel
124319	entrypoint	21	0x984849707d28	0xffffb51d9505	futex_wait_queue	T	kernel
124319	entrypoint	26	0x984849707d50	0xffffb51d9bc8	__futex_wait	T	kernel
124319	entrypoint	35	0x984849707d98	0xffffb51d8ec0	futex_wake_mark	T	kernel
124319	entrypoint	50	0x984849707e10	0xffffb51d9cb4	futex_wait	T	kernel
124319	entrypoint	57	0x984849707e48	0xffffb5bf653a	_raw_spin_unlock	T	kernel
124319	entrypoint	58	0x984849707e50	0xffffb531471d	rseq_get_rseq_cs	t	kernel
124319	entrypoint	68	0x984849707ea0	0xffffb51d5f08	do_futex	T	kernel
124319	entrypoint	70	0x984849707eb0	0xffffb51d63e3	__x64_sys_futex	T	kernel
124319	entrypoint	85	0x984849707f28	0xffffb5be596b	do_syscall_64	T	kernel
124319	entrypoint	90	0x984849707f50	0xffffb5c0012b	entry_SYSCALL_64_after_hwframe	T	kernel

This is a lot of raw data and doesn't necessarily mean anything significant, but I thought I'd share it with you; perhaps you'll have an idea why the tasks are still hung even though the mutex appears to have no owner. I do understand that this might simply be a highly contended mutex, but again, our code does not modify the underlying filesystem, which I don't believe should trigger a page cache flush to disk.

gmwiz avatar Sep 01 '25 16:09 gmwiz

@abuehaze14 Have you had a chance to look into my last comments? Are they at all helpful?

gmwiz avatar Sep 23 '25 10:09 gmwiz

Hi @gmwiz,

I apologize for the delay on this issue. I looked for commits that could contribute to the reported contention on the LRU drain mutex, but so far I couldn't find a specific culprit.

It would be good if you could attach the kernel crash dump to a support ticket. If there are steps we can take to reproduce this issue on our side, please share them with us.

abuehaze14 avatar Sep 23 '25 10:09 abuehaze14

Hi @gmwiz,

Have you managed to open a support case for this issue? Is there any chance that you can share your reproduction steps so we can debug this further on our side?

abuehaze14 avatar Dec 08 '25 16:12 abuehaze14

fwiw I was experiencing a similar problem and https://lore.kernel.org/all/[email protected]/ (backported in 6.12.54) fixed it

gemorin avatar Dec 08 '25 17:12 gemorin

Hi @gemorin, thank you for the update. @gmwiz, please let us know if you are still facing the issue on our latest AL2023 6.12 kernel, which should be kernel-6.12.58-82.121.amzn2023 as of today.

abuehaze14 avatar Dec 08 '25 19:12 abuehaze14

Hi @abuehaze14, thank you very much for following up. I ended up not opening a support ticket, as the issue happened very rarely and has not reproduced in roughly the past two months. It seems like it might have been fixed upstream. So far so good.

Thanks again, and I believe we can close this issue for now.

gmwiz avatar Dec 08 '25 20:12 gmwiz

Hi @gmwiz, Thank you for the update! We are glad that the issue has been fixed for you.

abuehaze14 avatar Dec 09 '25 11:12 abuehaze14