What happened?

In 0.7.10 and latest, the eBPF module fails to load with the following verifier log. It looks like the root cause is unknown func bpf_perf_event_read_value#55.

This doesn't happen in 0.7.8 and older.

libbpf: prog 'kepler_sched_switch_trace': -- BEGIN PROG LOAD LOG --
; if (SAMPLE_RATE > 0) {
0: (18) r2 = 0xffff9fa324c082d0
2: (61) r2 = *(u32 *)(r2 +0)
 R1=ctx(id=0,off=0,imm=0) R2_w=map_value(id=0,off=0,ks=4,vs=12,imm=0) R10=fp0
3: (67) r2 <<= 32
4: (c7) r2 s>>= 32
5: (b7) r4 = 1
; if (SAMPLE_RATE > 0) {
6: (6d) if r4 s> r2 goto pc+13
last_idx 6 first_idx 0
regs=10 stack=0 before 5: (b7) r4 = 1
last_idx 6 first_idx 0
regs=4 stack=0 before 5: (b7) r4 = 1
regs=4 stack=0 before 4: (c7) r2 s>>= 32
regs=4 stack=0 before 3: (67) r2 <<= 32
regs=4 stack=0 before 2: (61) r2 = *(u32 *)(r2 +0)
; prev_pid = ctx->prev_pid;
20: (61) r1 = *(u32 *)(r1 +24)
21: (7b) *(u64 *)(r10 -160) = r1
; prev_pid = ctx->prev_pid;
22: (63) *(u32 *)(r10 -20) = r1
; pid_tgid = bpf_get_current_pid_tgid();
23: (85) call bpf_get_current_pid_tgid#14
24: (bf) r6 = r0
; cur_pid = pid_tgid & 0xffffffff;
25: (63) *(u32 *)(r10 -28) = r6
; cgroup_id = bpf_get_current_cgroup_id();
26: (85) call bpf_get_current_cgroup_id#80
27: (7b) *(u64 *)(r10 -184) = r0
; cpu_id = bpf_get_smp_processor_id();
28: (85) call bpf_get_smp_processor_id#8
29: (bf) r9 = r0
; cpu_id = bpf_get_smp_processor_id();
30: (63) *(u32 *)(r10 -24) = r9
; cur_ts = bpf_ktime_get_ns();
31: (85) call bpf_ktime_get_ns#5
32: (bf) r8 = r0
33: (b7) r7 = 0
; struct bpf_perf_event_value c = {};
34: (7b) *(u64 *)(r10 -128) = r7
last_idx 34 first_idx 32
regs=80 stack=0 before 33: (b7) r7 = 0
35: (7b) *(u64 *)(r10 -136) = r7
36: (7b) *(u64 *)(r10 -144) = r7
; &cpu_cycles_event_reader, *cpu_id, &c, sizeof(c));
37: (67) r9 <<= 32
38: (77) r9 >>= 32
39: (bf) r3 = r10
; prev_pid = ctx->prev_pid;
40: (07) r3 += -144
; error = bpf_perf_event_read_value(
41: (18) r1 = 0xffff9f8bacc80000
43: (bf) r2 = r9
44: (b7) r4 = 24
45: (85) call bpf_perf_event_read_value#55
unknown func bpf_perf_event_read_value#55
processed 31 insns (limit 1000000) max_states_per_insn 0 total_states 2 peak_states 2 mark_read 1

What did you expect to happen?

The eBPF module should load, as it does in 0.7.8 and older. Instead, loading fails in 0.7.10 and latest.

How can we reproduce it (as minimally and precisely as possible)?

It happens on Ubuntu hosts running 5.4 kernels.

Anything else we need to know?

No response

Kepler image tag

0.7.10

Kubernetes version

v1.27.3

Cloud provider or bare metal

kind

OS version

5.4.0-164-generic #181-Ubuntu SMP

Install tools

Kepler deployment config

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

May 31 '24 14:05 rootfs

here is the explanation

Jun 01 '24 23:06 rootfs

I will take this up.

Jun 02 '24 23:06 sthaha

See: https://github.com/sustainable-computing-io/kepler/pull/1398

With these changes applied, the minimum supported kernel version for Kepler is 5.12, due to:

- bpf_perf_event_read_value, which is available in tracepoint contexts from 5.12
- BPF fentry/fexit programs, which were added in 5.11
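
Given that minimum, a loader could check the running kernel before loading the probes and fail with a clear message instead of the verifier log above. A minimal C sketch, assuming the 5.12 threshold stated here; kernel_at_least, running_kernel_supported and the MIN_KERNEL_* macros are hypothetical names, not part of Kepler:

```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/utsname.h>

/* Minimum kernel from the discussion (assumption: 5.12). */
#define MIN_KERNEL_MAJOR 5
#define MIN_KERNEL_MINOR 12

/* Returns true if a release string such as "5.4.0-164-generic" is at least
 * min_major.min_minor. Unparseable strings are treated as unsupported. */
bool kernel_at_least(const char *release, int min_major, int min_minor)
{
    int major, minor;

    /* Only the leading "major.minor" matters; the rest is the distro suffix. */
    if (sscanf(release, "%d.%d", &major, &minor) != 2)
        return false;
    if (major != min_major)
        return major > min_major;
    return minor >= min_minor;
}

/* Hypothetical pre-load check: compare the running kernel (as reported by
 * uname -r) against the minimum before attempting to load the eBPF programs. */
bool running_kernel_supported(void)
{
    struct utsname u;

    if (uname(&u) != 0)
        return false;
    return kernel_at_least(u.release, MIN_KERNEL_MAJOR, MIN_KERNEL_MINOR);
}
```

For the reporter's 5.4.0-164-generic, kernel_at_least returns false. Distribution kernels sometimes backport BPF features, so a version check can be conservative; probing the helper directly, for example with libbpf's libbpf_probe_bpf_helper(BPF_PROG_TYPE_TRACEPOINT, BPF_FUNC_perf_event_read_value, NULL), is an alternative worth considering.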

I think this is a pretty reasonable trade-off if you read the man page of bpf_perf_event_read_value.
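
The bpf-helpers man page recommends bpf_perf_event_read_value over bpf_perf_event_read because it also returns the enabled and running times, which let the reader normalize a counter that the kernel multiplexed on the PMU (normalized = counter * enabled / running). A minimal user-space sketch of that normalization in C; perf_event_value_sketch is a hypothetical mirror of struct bpf_perf_event_value from include/uapi/linux/bpf.h (three u64 fields, 24 bytes, matching the sizeof(c) of 24 passed in the log), and scaled_counter is an illustrative name:

```c
#include <stdint.h>

/* Hypothetical mirror of struct bpf_perf_event_value: counter, enabled, running. */
struct perf_event_value_sketch {
    uint64_t counter; /* raw counter value */
    uint64_t enabled; /* time the event was enabled */
    uint64_t running; /* time the event was actually counting on the PMU */
};

/* Estimate the count over the whole enabled period when the event was
 * multiplexed. Returns 0 if the event never ran. Uses double arithmetic,
 * so results are approximate for counters above 2^53. */
uint64_t scaled_counter(const struct perf_event_value_sketch *v)
{
    if (v->running == 0)
        return 0;
    if (v->running >= v->enabled) /* not multiplexed: raw value is complete */
        return v->counter;
    return (uint64_t)((double)v->counter * (double)v->enabled / (double)v->running);
}
```

An event that counted 1000 cycles while on the PMU for half of its enabled time scales to 2000.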

If you really need 5.4 support, we can discuss that, but it's not trivial.

Jun 03 '24 10:06 dave-tucker

The only reference to kernel requirements I could find in the documentation appears to be out of date: https://sustainable-computing.io/installation/strategy/. I raised an issue (#1866) that occurs when running kernel v5.4 on hosts.

Please clearly document these requirements on a release-by-release basis.

May 20 '25 08:05 Robbie558

unknown func bpf_perf_event_read_value#55 in eBPF module since 0.7.10