opentelemetry-go-instrumentation

Reduce pointers usage in TargetDetails

Open • RonFed opened this issue 1 year ago • 8 comments

TargetDetails seems to use pointers without a good reason, and this might be related to the out-of-memory error seen in #619. When analyzing large binaries, the slices and maps it holds can grow large, so removing the pointers might be an improvement.
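To illustrate the idea, here is a minimal sketch comparing a pointer-based layout with a value-based one. The `Func` type and its fields are hypothetical stand-ins, not the actual TargetDetails definition. The point is only that a slice of pointers needs one heap allocation per element, while a slice of values can be filled with a single allocation for the backing array.

```go
package main

import (
	"fmt"
	"testing"
)

// Func is a hypothetical stand-in for a per-function record; it is NOT
// the real TargetDetails type.
type Func struct {
	Name   string
	Offset uint64
}

// Package-level sinks keep results reachable so the work is not optimized away.
var (
	sinkPtrs []*Func
	sinkVals []Func
)

// buildPointers stores each record behind its own pointer:
// one allocation for the slice plus one per element.
func buildPointers(n int) []*Func {
	out := make([]*Func, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, &Func{Name: "f", Offset: uint64(i)})
	}
	return out
}

// buildValues stores records inline in the slice's backing array:
// a single allocation regardless of n.
func buildValues(n int) []Func {
	out := make([]Func, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, Func{Name: "f", Offset: uint64(i)})
	}
	return out
}

// allocsPerBuild reports the average number of heap allocations per call
// for each layout, using testing.AllocsPerRun.
func allocsPerBuild(n int) (ptrs, vals float64) {
	ptrs = testing.AllocsPerRun(10, func() { sinkPtrs = buildPointers(n) })
	vals = testing.AllocsPerRun(10, func() { sinkVals = buildValues(n) })
	return ptrs, vals
}

func main() {
	ptrs, vals := allocsPerBuild(1000)
	fmt.Printf("pointer slice: %.0f allocs, value slice: %.0f allocs\n", ptrs, vals)
}
```

Fewer, larger allocations also mean fewer objects and pointers for the garbage collector to track, although the total byte count may not shrink much. The trade-off is that values are copied when passed around, and a pointer taken with `&s[i]` can end up referring to a stale backing array after a later `append` reallocates.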

RonFed • Feb 01 '24 20:02

Hi @RonFed. I'm trying to instrument a large executable running in Kubernetes, and the opentelemetry-go-instrumentation container keeps getting OOMKilled. I increased the resource limits a lot, but it still isn't enough. Based on some debugging I've done, I think I'm hitting this issue.

I would like to contribute, but I need some hints. Can you shed some light on this? Thanks!

iblancasa • Jun 24 '24 14:06

Hey @iblancasa, thank you for your interest. What is approximately the memory limit you saw exceeded? This is an interesting topic, and I'd start by profiling memory usage (pprof) in a local setup to find the root cause. The TargetDetails struct looked like a good candidate for the problem, but I didn't get the chance to confirm that. Another place that might be relevant is the structfield package which stores an offset mapping of relevant structs for instrumentation.

RonFed • Jun 24 '24 15:06

> What is approximately the memory limit you saw exceeded?

I was doing some experiments with the OpenTelemetry Operator to auto-instrument an OpenTelemetry Collector. I set a 2 GB memory limit on the pod, and it still gets OOMKilled. After shrinking my collector by removing components, the instrumentation got a few statements further, but it still could not load the probes: https://github.com/open-telemetry/opentelemetry-go-instrumentation/blob/9882b86f52d8daf168efee68ddc4442d2acd821f/internal/pkg/instrumentation/manager.go#L207-L214

After reading the comments, I think the issue you described here could be related.

> Another place that might be relevant is the structfield package which stores an offset mapping of relevant structs for instrumentation.

I agree. However, I have been printing the memory usage right up to these lines, and it is around 25 MB at that point: https://github.com/open-telemetry/opentelemetry-go-instrumentation/blob/9882b86f52d8daf168efee68ddc4442d2acd821f/internal/pkg/instrumentation/manager.go#L207-L214. Once the loading is done, Kubernetes kills the pod.
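For reference, one way to print Go memory usage at a given point, such as right before the probe-loading lines linked above, is `runtime.ReadMemStats`. This is a sketch of that approach, not the code used in this comment; the `mib` and `memSummary` helpers are made-up names. One caveat: these counters cover only memory managed by the Go runtime, so memory the kernel allocates while loading eBPF programs and maps does not show up here, even though it may still be charged against the container's limit.

```go
package main

import (
	"fmt"
	"runtime"
)

// mib converts bytes to mebibytes for readability.
func mib(b uint64) float64 { return float64(b) / (1 << 20) }

// memSummary reads the Go runtime's memory counters and formats the ones
// most useful for spotting growth: live heap objects (HeapAlloc), heap
// memory obtained from the OS (HeapSys), and total runtime memory (Sys).
func memSummary() (runtime.MemStats, string) {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	s := fmt.Sprintf("heap_alloc=%.1fMiB heap_sys=%.1fMiB sys=%.1fMiB num_gc=%d",
		mib(m.HeapAlloc), mib(m.HeapSys), mib(m.Sys), m.NumGC)
	return m, s
}

func main() {
	_, s := memSummary()
	fmt.Println("before loading probes:", s)
}
```

If `sys` stays small while the container's usage climbs, that points away from the Go heap and toward memory allocated outside the runtime.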

iblancasa • Jun 25 '24 10:06

@iblancasa Are you setting the OTEL_GO_AUTO_SHOW_VERIFIER_LOG env var? I think it can cause large memory allocations as well.

RonFed • Jun 25 '24 10:06

I'm not setting that environment variable.

iblancasa • Jun 25 '24 11:06

I tried to reproduce this. When instrumenting the collector, the peak memory allocated by the instrumentation is ~120 MB in my setup.

RonFed • Jun 27 '24 06:06

Oh. Maybe I'm doing something wrong. I'll try again. Thanks!

iblancasa • Jul 01 '24 09:07

I just tried again, and I can reproduce the problem 100% of the time. I'm using a container image based on Fedora. The last log messages I see are these:

{"level":"info","ts":1719839515.3163679,"logger":"go.opentelemetry.io/auto","caller":"cli/main.go:117","msg":"starting instrumentation..."}
{"level":"info","ts":1719839515.3164241,"logger":"Instrumentation.Manager","caller":"instrumentation/manager.go:222","msg":"Mounting bpffs","allocations_details":{"StartAddr":140352138248192,"EndAddr":140352138772480,"NumCPU":16}}
{"level":"info","ts":1719839515.3165295,"logger":"Instrumentation.Manager","caller":"instrumentation/manager.go:208","msg":"loading probe","name":"google.golang.org/grpc/client"}

After that, it gets OOMKilled. I'll create a separate issue.
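The `allocations_details` in the "Mounting bpffs" line can be decoded to see how large that region is. A minimal sketch, assuming `StartAddr` and `EndAddr` bound a single contiguous region (the log does not say what it is used for); the `logLine` struct is hypothetical and models only the fields shown above. The difference comes to 524288 bytes (512 KiB), which by itself is far below the 2 GB limit.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// logLine is a hypothetical model of the "Mounting bpffs" entry; only the
// fields needed here are declared, and JSON keys mirror the log above.
type logLine struct {
	Msg                string `json:"msg"`
	AllocationsDetails struct {
		StartAddr uint64 `json:"StartAddr"`
		EndAddr   uint64 `json:"EndAddr"`
		NumCPU    int    `json:"NumCPU"`
	} `json:"allocations_details"`
}

// mountLog is the second log line from the comment above, verbatim.
const mountLog = `{"level":"info","ts":1719839515.3164241,"logger":"Instrumentation.Manager","caller":"instrumentation/manager.go:222","msg":"Mounting bpffs","allocations_details":{"StartAddr":140352138248192,"EndAddr":140352138772480,"NumCPU":16}}`

// regionSize decodes a log line and returns EndAddr-StartAddr and NumCPU.
func regionSize(raw string) (uint64, int, error) {
	var l logLine
	if err := json.Unmarshal([]byte(raw), &l); err != nil {
		return 0, 0, err
	}
	d := l.AllocationsDetails
	if d.EndAddr < d.StartAddr {
		return 0, 0, fmt.Errorf("EndAddr %d is below StartAddr %d", d.EndAddr, d.StartAddr)
	}
	return d.EndAddr - d.StartAddr, d.NumCPU, nil
}

func main() {
	size, cpus, err := regionSize(mountLog)
	if err != nil {
		panic(err)
	}
	// Prints: allocation region: 524288 bytes (512 KiB), NumCPU=16
	fmt.Printf("allocation region: %d bytes (%d KiB), NumCPU=%d\n", size, size/1024, cpus)
}
```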

iblancasa • Jul 01 '24 13:07