Different ways to reduce CPU usage of the profiler
Description
When running the eBPF profiler in OpenTelemetry, I've observed significant spikes in both CPU usage and memory consumption. This increased resource usage is impacting the performance of the system, especially under heavy load. Even with the default settings, the profiler continues to cause these spikes. I'm looking for potential solutions or adjustments that could help reduce the CPU and memory usage while still providing the necessary profiling functionality. Any advice or suggestions on how to optimize the profiler's resource consumption would be greatly appreciated.
My Environment:
opentelemetry-ebpf-profiler → Otel-Collector → Backend
I currently run the profiler with this command:
sudo ./profiler -collection-agent otel-collector.profiling.svc.cluster.local:4317 -no-kernel-version-check -disable-tls -v
It is likely that the CPU spikes are related to memory usage via GC, so any reduction in memory usage will likely also benefit CPU. We currently have work in progress in this area, mainly by interning more data. Several recent commits have already helped, and the current major work-in-progress PR is #749.
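If you want to experiment in the meantime: the profiler is a regular Go binary, so the standard Go runtime knobs apply. As a minimal sketch, assuming you can set environment variables on the profiler process or container (GOGC and GOMEMLIMIT are standard Go runtime variables; the values below are purely illustrative, not recommendations):

env:
  - name: GOGC
    value: "200"       # let the heap grow further between GC cycles than the default (100): less GC CPU, more RSS
  - name: GOMEMLIMIT
    value: "400MiB"    # soft cap on the Go heap so the larger GOGC target stays bounded

Whether this actually helps depends on how much of the CPU time is spent in the garbage collector, which is exactly what the self-profiling data would tell us.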
Please also specify which programming language runtimes you have. Some of the memory usage patterns are specific to the language VM support.
Thanks for the reply. For our use case we have multiple pods with different languages like Golang, PHP, Node, etc. Almost all languages!
[..] I've observed significant spikes in both CPU usage
As the eBPF profiler also profiles itself, can you share the flamegraph or stack traces from the moments when you encounter these CPU spikes?
Hello @florianl, sorry for the late reply. I was actually trying to deploy Pyroscope and collect the flamegraph for this, but I am facing various issues with it. I tried their Docker setup since it was pretty easy and good for local development. But I got something for you.
So this is what my deployment looks like; kubectl top says:
hanshal101@lol:~$ kubectl top pod -n profiling
NAME                         CPU(cores)   MEMORY(bytes)
my-backend-cc8c4bcc7-h98kv   9m           27Mi
otel-collector-sd4jk         1m           34Mi
otel-ebpf-profiler-l2kxm     1m           42Mi
Meaning:
- 9m = 0.009 of a CPU core (very low usage).
- 1m = 0.001 of a CPU core.
And for memory:
- 27Mi = about 27 MB RAM in use.
- 34Mi = ~34 MB.
- 42Mi = ~42 MB.
Is this what we should expect? Can we improve this? I use the configuration below for my DaemonSet.
Is there any scope for improvement here? Maybe we could tweak the configuration a bit to gain some performance (I sketch one idea right after the config below). Also, I am open to contributing somewhere here, which could be really helpful for the community. I ask this because, with the same config, I have also seen noticeably higher CPU and memory usage (in the 200-230 MB range); the agent here is the profiler.
Config:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-ebpf-profiler
  namespace: profiling
spec:
  selector:
    matchLabels:
      app: otel-ebpf-profiler-agent
  template:
    metadata:
      labels:
        app: otel-ebpf-profiler-agent
    spec:
      hostPID: true
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      containers:
        - name: otel-ebpf-profiler
          image: otel-ebpf-profiler:latest
          imagePullPolicy: IfNotPresent
          securityContext:
            privileged: true
            capabilities:
              add:
                - SYS_ADMIN
                - SYS_RESOURCE
                - SYS_PTRACE
          args:
            - "-collection-agent"
            - "otel-collector.profiling.svc.cluster.local:4317"
            - "-no-kernel-version-check"
            - "-disable-tls"
            - "-v"
          volumeMounts:
            - name: debugfs
              mountPath: /sys/kernel/debug
              readOnly: true
            - name: cgroupfs
              mountPath: /sys/fs/cgroup
              mountPropagation: HostToContainer
              readOnly: true
            - name: procfs
              mountPath: /proc
              mountPropagation: HostToContainer
              readOnly: true
            - name: sys
              mountPath: /sys
              readOnly: true
            - name: modules
              mountPath: /lib/modules
              readOnly: true
          resources:
            requests:
              cpu: "200m"
              memory: "256Mi"
            limits:
              cpu: "1"
              memory: "512Mi"
      volumes:
        - name: sys
          hostPath:
            path: /sys
        - name: modules
          hostPath:
            path: /lib/modules
        - name: debugfs
          hostPath:
            path: /sys/kernel/debug
        - name: cgroupfs
          hostPath:
            path: /sys/fs/cgroup
        - name: procfs
          hostPath:
            path: /proc
      terminationGracePeriodSeconds: 30
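One concrete thing I was wondering about, as a sketch of the kind of tuning I mean (I still have to confirm the exact flag name against ./profiler -help, so treat -samples-per-second below as an assumption on my side): would lowering the sampling frequency and dropping -v outside of debugging already reduce the overhead?

args:
  - "-collection-agent"
  - "otel-collector.profiling.svc.cluster.local:4317"
  - "-no-kernel-version-check"
  - "-disable-tls"
  # assumption: a lower sampling rate trades profile detail for CPU; flag name still to be verified
  - "-samples-per-second"
  - "10"
  # "-v" removed outside of debugging sessions to avoid the extra logging overhead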
Also, I am not sure: can we get call-stack trees of the functions? I did see this previously when I was trying out the profiler with the Pyroscope flamegraph. I wonder how we can do this via a custom backend accepting OTLP-based profiling events.
The error I found inside Pyroscope was about the image; it was dropping the profiles received from the collector, I guess.
@hanshal101 You can use devfiler to visualize flamegraphs that you can then post here. 200-230MB of memory use is not uncommon, especially on loaded K8s Nodes. According to the data you posted, that's still less than 50% of the Pod limit.
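Regarding your question about a custom backend: the profiler already sends its data as OTLP profiles, so the Collector only needs a profiles pipeline that forwards them. A rough sketch (the backend endpoint is a placeholder, and profiles support in the Collector is still experimental, so it may have to be enabled explicitly, e.g. via the service.profilesSupport feature gate):

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
exporters:
  otlp:
    endpoint: my-profiling-backend:4317   # placeholder for your custom OTLP backend
    tls:
      insecure: true
service:
  pipelines:
    profiles:
      receivers: [otlp]
      exporters: [otlp]

Your backend then receives the full stack traces as OTLP profile events and can build the call trees / flamegraphs itself; devfiler is simply one consumer of that same data.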