pyroscope Kernel panic

I get a kernel panic almost every day on Talos Linux v1.8.2 (Linux 6.6.58). Talos is deployed on bare-metal nodes (Dell R6615) with NVMe SSDs. For networking, I use Broadcom 2x25G NICs (50G in an LACP bond) with MTU 9000 (jumbo frames).

I use an image built on factory.talos.dev:

customization:
    extraKernelArgs:
        - console=ttyS0,115200n8r
        - -lockdown
        - lockdown=integrity
        - cpufreq.default_governor=performance
        - amd_pstate=active
        - mitigations=off
        - iommu=off
    systemExtensions:
        officialExtensions:
            - siderolabs/amd-ucode
            - siderolabs/amdgpu-firmware
            - siderolabs/drbd

For CNI I use Cilium in eBPF mode.

[40145.614353] general protection fault, probably for non-canonical address 0x9e759c37ee555c76: 0000 [#1] SMP PTI
[40145.624361] CPU: 18 PID: 234918 Comm: conn48291 Tainted: G           O       6.6.58-talos #1
[40145.632800] Hardware name: Dell Inc. PowerEdge R6615/067N9T, BIOS 1.9.5 09/12/2024
[40145.640376] RIP: 0010:is_uprobe_at_func_entry+0x28/0x80
[40145.645609] Code: 90 90 0f 1f 44 00 00 65 48 8b 04 25 80 e3 02 00 48 83 b8 30 0b 00 00 00 74 60 48 8b 80 30 0b 00 00 48 8b 50 30 48 85 d2 74 50 <80> 3a 55 b8 01 00 00 00 74 1b 48 8b 8f 88 00 00 00 48 83 f9 33 74
[40145.664366] RSP: 0018:ffffc900007c8bc8 EFLAGS: 00010082
[40145.669599] RAX: ffff88813eafb120 RBX: ffffc900007c8c20 RCX: 00007f116e206296
[40145.676740] RDX: 9e759c37ee555c76 RSI: 0000000000000001 RDI: ffffc90111fa3f58
[40145.683880] RBP: ffffc90111fa3f58 R08: 000000000002aee0 R09: 0000000000000008
[40145.691021] R10: ffffc90111fa0000 R11: ffffc900007c8ff8 R12: 0000000000000000
[40145.698162] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[40145.705303] FS:  00007f113e959700(0000) GS:ffff88defb500000(0000) knlGS:0000000000000000
[40145.713398] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[40145.719155] CR2: 000015b40194c804 CR3: 0000000363b74003 CR4: 0000000000f70ee0
[40145.726294] PKRU: 55555554
[40145.729014] Call Trace:
[40145.731468]  <IRQ>
[40145.733502]  ? die_addr+0x36/0x90
[40145.736836]  ? exc_general_protection+0x217/0x420
[40145.741553]  ? asm_exc_general_protection+0x26/0x30
[40145.746450]  ? is_uprobe_at_func_entry+0x28/0x80
[40145.751083]  perf_callchain_user+0x20a/0x360
[40145.755365]  get_perf_callchain+0x147/0x1d0
[40145.759559]  bpf_get_stackid+0x60/0x90
[40145.763319]  bpf_prog_9aac297fb833e2f5_do_perf_event+0x434/0x53b
[40145.769333]  ? __smp_call_single_queue+0xad/0x120
[40145.774049]  bpf_overflow_handler+0x75/0x110
[40145.778330]  __perf_event_overflow+0x114/0x360
[40145.782787]  perf_swevent_hrtimer+0x134/0x150
[40145.787155]  ? __wake_up_common+0x73/0x180
[40145.791258]  ? timerqueue_del+0x2e/0x50
[40145.795107]  ? __pfx_perf_swevent_hrtimer+0x10/0x10
[40145.799996]  __hrtimer_run_queues+0x118/0x240
[40145.804365]  ? ktime_get_update_offsets_now+0x49/0x110
[40145.809511]  hrtimer_interrupt+0xf8/0x240
[40145.813531]  __sysvec_apic_timer_interrupt+0x4a/0xe0
[40145.818508]  sysvec_apic_timer_interrupt+0x6d/0x90
[40145.823310]  </IRQ>
[40145.825426]  <TASK>
[40145.827537]  asm_sysvec_apic_timer_interrupt+0x1a/0x20
[40145.832687] RIP: 0010:__kmem_cache_free+0x1cb/0x350
[40145.837576] Code: 48 85 db 0f 84 00 01 00 00 48 89 c2 48 0f ca 49 33 94 24 b8 00 00 00 48 89 10 49 8b 04 24 65 48 03 05 99 bd 37 61 48 8b 70 08 <4c> 39 68 10 0f 85 0b 01 00 00 48 8b 10 41 8b 44 24 28 48 01 d8 48
[40145.856331] RSP: 0018:ffffc90111fa3b70 EFLAGS: 00000282
[40145.861561] RAX: ffff88defb533910 RBX: ffff88813eafb120 RCX: ffffea0000000000
[40145.868698] RDX: 9e759c37ee555c76 RSI: 0000000000119862 RDI: ffff88810004e200
[40145.875836] RBP: ffffc90111fa3bc0 R08: 0000000000000086 R09: 00007f1153f9f9c0
[40145.882980] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88810004e200
[40145.890120] R13: ffffea0004fabec0 R14: 0000000000000000 R15: 0000000000000000
[40145.897266]  ? uprobe_free_utask+0x62/0x80
[40145.901378]  ? acct_collect+0x4c/0x220
[40145.905141]  uprobe_free_utask+0x62/0x80
[40145.909075]  mm_release+0x12/0xb0
[40145.912401]  do_exit+0x26b/0xaa0
[40145.915643]  __x64_sys_exit+0x1b/0x20
[40145.919317]  do_syscall_64+0x5a/0x80
[40145.922911]  entry_SYSCALL_64_after_hwframe+0x78/0xe2
[40145.927976] RIP: 0033:0x7f116e206296
[40145.931565] Code: 28 06 00 00 0f 84 ec 01 00 00 48 8b 44 24 08 f6 80 08 03 00 00 40 0f 85 7a 01 00 00 ba 3c 00 00 00 0f 1f 00 31 ff 89 d0 0f 05 <eb> f8 48 89 c8 48 c7 00 00 00 00 00 48 8d 48 f8 48 39 d0 75 ed 48
[40145.950321] RSP: 002b:00007f113e958a40 EFLAGS: 00000246 ORIG_RAX: 000000000000003c
[40145.957891] RAX: ffffffffffffffda RBX: 00007f113e859000 RCX: 00007f116e206296
[40145.965033] RDX: 000000000000003c RSI: 00007f1153f9f9c0 RDI: 0000000000000000
[40145.972177] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000056b90006
[40145.979317] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f1149f8925e
[40145.986456] R13: 00007f1149f8925f R14: 00007f113e959700 R15: 00007f113e958b00
[40145.993606]  </TASK>
[40145.995808] Modules linked in: drbd_transport_tcp(O) drbd(O) ahci i40e sp5100_tco bnxt_en amd64_edac megaraid_sas libahci nvme k10temp watchdog
[40146.008673] ---[ end trace 0000000000000000 ]---
[40146.013298] RIP: 0010:is_uprobe_at_func_entry+0x28/0x80
[40146.018531] Code: 90 90 0f 1f 44 00 00 65 48 8b 04 25 80 e3 02 00 48 83 b8 30 0b 00 00 00 74 60 48 8b 80 30 0b 00 00 48 8b 50 30 48 85 d2 74 50 <80> 3a 55 b8 01 00 00 00 74 1b 48 8b 8f 88 00 00 00 48 83 f9 33 74
[40146.037290] RSP: 0018:ffffc900007c8bc8 EFLAGS: 00010082
[40146.042521] RAX: ffff88813eafb120 RBX: ffffc900007c8c20 RCX: 00007f116e206296
[40146.049662] RDX: 9e759c37ee555c76 RSI: 0000000000000001 RDI: ffffc90111fa3f58
[40146.056805] RBP: ffffc90111fa3f58 R08: 000000000002aee0 R09: 0000000000000008
[40146.063946] R10: ffffc90111fa0000 R11: ffffc900007c8ff8 R12: 0000000000000000
[40146.071088] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[40146.078227] FS:  00007f113e959700(0000) GS:ffff88defb500000(0000) knlGS:0000000000000000
[40146.086321] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[40146.092077] CR2: 000015b40194c804 CR3: 0000000363b74003 CR4: 0000000000f70ee0
[40146.099222] PKRU: 55555554
[40146.101943] Kernel panic - not syncing: Fatal exception in interrupt
[40146.108739] Kernel Offset: disabled
[40146.112246] Rebooting in 10 seconds..
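
For readers unfamiliar with the "non-canonical address" wording in the first line of the trace: on x86-64 a virtual address is canonical only if its upper bits are all copies of the highest implemented address bit. The sketch below is a minimal illustration, assuming 48-bit virtual addresses (4-level paging); the function name `is_canonical` and the `va_bits` parameter are hypothetical and not taken from the kernel.

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """Return True if a 64-bit address is canonical for the given VA width.

    Assumption: 48-bit virtual addresses unless va_bits says otherwise.
    Canonical means bits 63..(va_bits - 1) are all zeros or all ones.
    """
    top = addr >> (va_bits - 1)                  # bits 63..47 for va_bits=48
    all_ones = (1 << (64 - va_bits + 1)) - 1     # 17 one-bits for va_bits=48
    return top == 0 or top == all_ones


# Values copied from the register dump above.
print(is_canonical(0x9E759C37EE555C76))  # RDX, the pointer being dereferenced
print(is_canonical(0xFFFF88813EAFB120))  # RAX, a kernel address
print(is_canonical(0x00007F116E206296))  # RCX, a user-space address
```

Under that assumption only the RDX value fails the check, which is consistent with the fault being raised on the instruction that dereferences RDX.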

Nov 12 '24 09:11 maxpain

This looks like a kernel/eBPF subsystem bug. Do you use the pyroscope.ebpf component in Alloy as the profiler? Or is it coroot profiling? Which version? Which configuration?

bpf_prog_9aac297fb833e2f5_do_perf_event suggests it could be either pyroscope or coroot.

Will you be able to share your kernel+modules so we could try reproducing it?

Nov 18 '24 04:11 korniltsev

> Or is it coroot profiling?

Yes, it's coroot. We deploy coroot using the official Helm chart with the default configuration. Helm chart version: 0.15.16

> Will you be able to share your kernel+modules so we could try reproducing it?

I use Talos v1.8.2 (Linux 6.6.58) built on factory.talos.dev with the following configuration:

customization:
    extraKernelArgs:
        - console=ttyS0,115200n8r
        - -lockdown
        - lockdown=integrity
        - cpufreq.default_governor=performance
        - amd_pstate=active
        - mitigations=off
        - iommu=off
    systemExtensions:
        officialExtensions:
            - siderolabs/amd-ucode
            - siderolabs/amdgpu-firmware
            - siderolabs/drbd

https://factory.talos.dev/image/c4402c8cf9c87bcdc3947f2cc6e9486f413ca69716fa3b0a4c0c9863aafe963f/v1.8.2/metal-amd64-secureboot.iso

Nov 19 '24 08:11 maxpain

@maxpain would you be able to test a kernel patch?

Jan 09 '25 10:01 borkmann

> @maxpain would you be able to test a kernel patch?

It would not be easy, since I build Talos Linux images using factory.talos.dev and Secure Boot.

I think the panic could be caused by Puppeteer (Chrome for developers) pods in my cluster. I can reproduce it on other hardware.

Jan 09 '25 10:01 maxpain

Would you be able to test this one?

From e73a85a3fc1753656aba6d365640b16dca432ae1 Mon Sep 17 00:00:00 2001
From: Daniel Borkmann <[email protected]>
Date: Thu, 9 Jan 2025 09:01:59 +0000
Subject: [PATCH] events: Fix GPF due to corrupted utask->auprobe pointer

Fixes: cfa7f3d2c526 ("perf,x86: avoid missing caller address in stack traces captured in uprobe")
Signed-off-by: Daniel Borkmann <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: https://github.com/grafana/pyroscope/issues/3673
---
 arch/x86/events/core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index c75c482d4c52..05f9cedf2691 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2835,6 +2835,8 @@ static bool is_uprobe_at_func_entry(struct pt_regs *regs)

        if (!current->utask)
                return false;
+       if (!current->utask->active_uprobe)
+               return false;

        auprobe = current->utask->auprobe;
        if (!auprobe)
-- 
2.43.0

Jan 09 '25 10:01 borkmann

I can reproduce it with the following running in separate terminals:

# while :; do bpftrace -e 'uprobe:/bin/ls:_start  { printf("hit\n"); }' -c ls; done
# bpftrace -e 'profile:hz:100000 { @[ustack()] = count(); }'

Looks like we have a fix already; I will send it out shortly.

Jan 09 '25 13:01 olsajiri

@olsajiri found a better variant that does not incur an additional runtime check; it was submitted here: https://lore.kernel.org/bpf/[email protected]/

Jan 09 '25 14:01 borkmann

We are aware that LTS kernel versions 6.1.y (starting with 6.1.113) and 6.6.y (starting with 6.6.55) have this uprobe regression.

I proposed adding the fix to the stable kernels: https://lore.kernel.org/stable/[email protected]/T/#u

Those versions are also running on some EKS and GKE nodes. I am working to let the respective teams know so they can fix this.
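
To see quickly whether a node falls into the ranges named above, something like the following sketch could be used. It encodes only the two lower bounds given in this comment. The thread does not say which stable releases carry the fix, and it does not list other affected series, so a True result means "inside the reported range", not "confirmed vulnerable". The names `AFFECTED_SINCE` and `may_have_uprobe_regression` are hypothetical.

```python
import re

# Lower bounds as reported in this thread: (major, minor) -> first affected patch level.
# Assumption: no upper bound is encoded, because the fixed releases are not stated here.
AFFECTED_SINCE = {(6, 1): 113, (6, 6): 55}


def parse_release(release: str) -> tuple:
    """Extract (major, minor, patch) from a string such as '6.6.58-talos'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


def may_have_uprobe_regression(release: str) -> bool:
    """True if the release is in a series and at a patch level named above."""
    major, minor, patch = parse_release(release)
    first_bad = AFFECTED_SINCE.get((major, minor))
    return first_bad is not None and patch >= first_bad


# The kernel from the original report.
print(may_have_uprobe_regression("6.6.58-talos"))
```

On a live node the argument would typically be the output of `uname -r`.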

Mar 05 '25 10:03 simonswine
