
load ebpf program failed: BPF program is too large. Processed 1000001 insn

Open clxsh opened this issue 5 months ago • 3 comments

Hi team, I'm encountering a program loading error with the latest ebpf-profiler on the 5.4.143 kernel; the issue does not occur on the 5.4.250 kernel. I've traced the problem to the pull request, after which each DEBUG_PRINT call seems to expand into more than 20 instructions.

ERRO[0045] BPF program is too large. Processed 1000001 insn
ERRO[0045] processed 1000001 insns (limit 1000000) max_states_per_insn 99 total_states 59623 peak_states 3386 mark_read 201
ERRO[0045] Failed to start agent controller: failed to load eBPF tracer: failed to load eBPF code: failed to load perf eBPF programs: failed to load perf_unwind_native

The kernel I use is an internal version, but I can reproduce the error on Ubuntu 20.04 with the linux-image-5.4.0-26-generic kernel. Thanks!
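
For anyone comparing logs across kernels, the verifier summary line above can be split into its counters. This is a minimal sketch, assuming the line keeps the shape shown in this report; the function name `parse_verifier_stats` is made up for illustration and is not part of the profiler.

```python
import re

# Verifier summary line copied from the log in this report.
LOG_LINE = (
    "processed 1000001 insns (limit 1000000) max_states_per_insn 99 "
    "total_states 59623 peak_states 3386 mark_read 201"
)


def parse_verifier_stats(line):
    """Extract the numeric counters from a verifier summary line (hypothetical helper)."""
    m = re.search(r"processed (\d+) insns \(limit (\d+)\)", line)
    if m is None:
        raise ValueError("not a verifier summary line")
    stats = {"processed": int(m.group(1)), "limit": int(m.group(2))}
    # The remaining counters are "name value" pairs after the closing parenthesis.
    for name, value in re.findall(r"([a-z_]+) (\d+)", line[m.end():]):
        stats[name] = int(value)
    return stats


stats = parse_verifier_stats(LOG_LINE)
# True when the verifier stopped because it ran past its processing budget.
print(stats["processed"] > stats["limit"])
```

Collecting these counters from a kernel that loads the program and from one that does not may make it easier to see how far apart the two verification runs are.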

clxsh • Aug 14 '25 12:08

Thanks for the report. CI runs per-kernel tests on every change, so there is also a test run on a 5.4 kernel: Integration tests (v5.4.276 amd64).

These are the number of BPF instructions per program:

kprobe/unwind_dotnet has 7584 instructions
perf_event/unwind_dotnet has 7584 instructions
kprobe/go_labels has 679 instructions
perf_event/go_labels has 679 instructions
kprobe/unwind_hotspot has 7615 instructions
perf_event/unwind_hotspot has 7615 instructions
tracepoint/integration/sched_switch has 1247 instructions
kprobe/unwind_stop has 1662 instructions
perf_event/unwind_stop has 1662 instructions
kprobe/unwind_native has 7014 instructions
perf_event/native_tracer_entry has 1222 instructions
perf_event/unwind_native has 7014 instructions
kprobe/dummy has 6 instructions
kprobe/finish_task_switch has 1275 instructions
tracepoint/sched/sched_switch has 65 instructions
kprobe/unwind_perl has 7404 instructions
perf_event/unwind_perl has 7404 instructions
kprobe/unwind_php has 6934 instructions
perf_event/unwind_php has 6934 instructions
kprobe/unwind_python has 5949 instructions
perf_event/unwind_python has 5949 instructions
kprobe/unwind_ruby has 5104 instructions
perf_event/unwind_ruby has 5104 instructions
tracepoint/sched/sched_process_free has 283 instructions
tracepoint/syscalls/sys_enter_bpf has 45 instructions
raw_tracepoint/sys_enter has 56 instructions
kprobe/unwind_v8 has 7860 instructions
perf_event/unwind_v8 has 7860 instructions

None of these programs exceeds even 10k instructions, so a kernel configuration option might be triggering this issue. Can you compare your kernel config with the ones (here and here) we use for our CI tests?
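
The "under 10k" observation can be checked mechanically against the listing. The sketch below is illustrative only: it embeds a handful of lines copied from the listing above (paste the full listing to cover every program), and `parse_counts` is a hypothetical helper name.

```python
import re

# A few lines copied from the listing above; paste the full listing to check all programs.
LISTING = """\
kprobe/unwind_native has 7014 instructions
perf_event/unwind_native has 7014 instructions
kprobe/unwind_v8 has 7860 instructions
perf_event/unwind_v8 has 7860 instructions
kprobe/dummy has 6 instructions
"""


def parse_counts(text):
    """Map each program name to its static instruction count (hypothetical helper)."""
    counts = {}
    for line in text.splitlines():
        m = re.fullmatch(r"(\S+) has (\d+) instructions", line.strip())
        if m:
            counts[m.group(1)] = int(m.group(2))
    return counts


counts = parse_counts(LISTING)
largest = max(counts, key=counts.get)
print(largest, counts[largest])
# True when every listed program stays below 10k static instructions.
print(all(n < 10_000 for n in counts.values()))
```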

florianl • Aug 14 '25 12:08

This is a verifier failure; we should add one of the reported failing kernels to our CI.
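
A rough way to see why the static counts above do not rule this out: the "processed" figure in the log counts instructions as the verifier walks them along every path it explores, so one instruction can be counted many times. The arithmetic below is only a sketch, and it assumes the reporter's build of perf_event/unwind_native is about the same size as the CI build listed earlier, which may not hold.

```python
STATIC_INSNS = 7014          # perf_event/unwind_native, from the CI listing in this thread
PROCESSED_LIMIT = 1_000_000  # "limit" value reported in the verifier log

# Average number of times each instruction would have to be visited
# for a program of this size to exhaust the verifier's budget.
ratio = PROCESSED_LIMIT / STATIC_INSNS
print(f"budget exhausted after roughly {ratio:.1f} visits per instruction on average")
```

The average is not how the verifier actually distributes its work (some instructions sit inside heavily branching code and are revisited far more often), but it shows that a program well under 10k instructions can still hit the limit if state pruning is less effective on a given kernel.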

christos68k • Aug 14 '25 12:08

@florianl I concatenated the CI configs into ci.config and copied /boot/boot/config-5.4.0-26-generic from Ubuntu to config-5.4.0-26-generic. I then used the diffconfig script to compare them. The output is as follows.

$ python2 diffconfig ci.config config-5.4.0-26-generic > diff.out
$ grep BPF diff.out
-BPF_LIRC_MODE2 y
-BPF_LSM y
-BPF_PRELOAD y
-NET_SCH_BPF y
+BPFILTER_UMH m
+BPF_EVENTS y
+BPF_KPROBE_OVERRIDE y
+HAVE_EBPF_JIT y
+IPV6_SEG6_BPF y
+LWTUNNEL_BPF y
+NETFILTER_XT_MATCH_BPF m
+NET_ACT_BPF m
+NET_CLS_BPF m
+TEST_BPF m
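
If python2 is not available, the same comparison can be approximated in Python 3. The sketch below imitates the diffconfig output style seen above (`-` only in the first config, `+` only in the second, `old -> new` for changed values) and folds in the name filter that `grep BPF` provided. The helper names and the two tiny config fragments are hypothetical and are not the real CI or Ubuntu configs.

```python
def read_config(text):
    """Parse kernel .config text into {NAME: value}, dropping the CONFIG_ prefix."""
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line[len("CONFIG_"):].split("=", 1)
            opts[name] = value
        elif line.startswith("# CONFIG_") and line.endswith(" is not set"):
            # Treat explicitly unset options as "n".
            opts[line[len("# CONFIG_"):-len(" is not set")]] = "n"
    return opts


def diff_configs(a, b, needle=""):
    """Return diffconfig-style lines for options whose name contains needle."""
    out = []
    for name in sorted(set(a) | set(b)):
        if needle not in name:
            continue
        if name not in b:
            out.append(f"-{name} {a[name]}")
        elif name not in a:
            out.append(f"+{name} {b[name]}")
        elif a[name] != b[name]:
            out.append(f" {name} {a[name]} -> {b[name]}")
    return out


# Hypothetical fragments for illustration only.
FIRST = "CONFIG_BPF=y\nCONFIG_BPF_LSM=y\nCONFIG_NET_CLS_BPF=y\nCONFIG_SMP=y\n"
SECOND = "CONFIG_BPF=y\nCONFIG_BPF_EVENTS=y\nCONFIG_NET_CLS_BPF=m\n"

diff_lines = diff_configs(read_config(FIRST), read_config(SECOND), needle="BPF")
print("\n".join(diff_lines))
```

Passing a different `needle`, or an empty one, widens the comparison beyond options that have BPF in their name.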

config-5.4.0-26-generic.txt

clxsh • Aug 15 '25 02:08