opentelemetry-go-instrumentation
CreateHeaderFields probe fails with permission denied in v0.2.2-alpha
Describe the bug
Running registry.gitlab.com/gitlab-org/gitlab-runner:alpine-v16.2.0 with Go auto-instrumentation causes the instrumentation container to crashloop. This issue doesn't occur with v0.2.1-alpha. Another workload on the same node (ghcr.io/toboshii/hajimari:v0.3.1) does not have this issue.
Environment
- OS: Oracle Linux Server 8.7 arm64
- Version: autoinstrumentation-go:v0.2.2-alpha
To Reproduce
Steps to reproduce the behavior:
- Deploy OTel Operator v0.81.0 with autoinstrumentation-go:v0.2.2-alpha
- Add autoinstrumentation annotations to target deployment
- Observe crashloop of opentelemetry-auto-instrumentation container with the following error
- Revert to autoinstrumentation-go:v0.2.1-alpha and observe working instrumentation
{
"level":"error",
"ts":1690190562.3462915,
"caller":"instrumentors/runner.go:88",
"msg":"error while loading instrumentors, cleaning up",
"name":"google.golang.org/grpc",
"error":"field UprobeHttp2ClientCreateHeaderFields: program uprobe_Http2Client_CreateHeaderFields: load program: permission denied: ; u32 random = bpf_get_prandom_u32();: 892: ( (truncated, 1151 line(s) omitted)",
"stacktrace":"go.opentelemetry.io/auto/pkg/instrumentors.(*Manager).load\n\t/app/pkg/instrumentors/runner.go:88\ngo.opentelemetry.io/auto/pkg/instrumentors.(*Manager).Run\n\t/app/pkg/instrumentors/runner.go:36\nmain.main\n\t/app/cli/main.go:86\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"
}
Expected behavior
The app runs and is instrumented, as with v0.2.1-alpha
Additional context
Similar to #78
I've replicated with gitlab-runner:ubuntu-v16.2.0, but can't test on my amd64 nodes (they've crashlooped with unknown func bpf_probe_write_user for a while)
Hello @pl4nty, I've also encountered the issue you have on your gitlab-runner:ubuntu-v16.2.0 with the bpf_probe_write_user BPF helper. It seems to be caused by a Linux kernel commit, present since 5.14-rc6, that locks down bpf_probe_write_user for the sake of security and in favor of better solutions. As the commit description puts it: "These days we have better mechanisms in BPF for achieving the same (e.g. for load-balancers), but without having to write to userspace memory."
I think this should be a separate issue, since the root cause seems to be different. This particular BPF helper should be retired, because the current behavior forces users to disable the lockdown and integrity modes in the lsm kernel parameter, which is both hard to do on cloud provider VMs and, as far as I know, insecure. I will create another issue and link yours.
We added graceful degradation for the HTTP instrumentation: it logs an error if the kernel is locked down. We can do the same for gRPC as well.
Beyond that, there is not much more we can do other than the work being looked at here, which addresses this more comprehensively: https://github.com/open-telemetry/opentelemetry-go-instrumentation/issues/290
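For anyone who wants to check their own nodes, here is a minimal sketch of the kind of lockdown check that graceful degradation implies. It is not the project's actual implementation. The function names parseLockdownMode and writeUserAllowed are hypothetical. It assumes the kernel exposes its lockdown state at /sys/kernel/security/lockdown with the active mode in square brackets (for example "none [integrity] confidentiality"), and, per the discussion above, that any active mode other than "none" blocks bpf_probe_write_user.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// lockdownPath is the securityfs file where the kernel reports its lockdown
// state, e.g. "none [integrity] confidentiality" (active mode in brackets).
const lockdownPath = "/sys/kernel/security/lockdown"

// parseLockdownMode (hypothetical helper) returns the bracketed, active mode
// from the file content, or "" if no mode is marked as active.
func parseLockdownMode(content string) string {
	for _, field := range strings.Fields(content) {
		if strings.HasPrefix(field, "[") && strings.HasSuffix(field, "]") {
			return strings.Trim(field, "[]")
		}
	}
	return ""
}

// writeUserAllowed (hypothetical helper) reports whether probes relying on
// bpf_probe_write_user can be expected to load.
// Assumption: any active mode other than "none" blocks the helper; an
// unparseable state ("") is treated as not locked down.
func writeUserAllowed(mode string) bool {
	return mode == "" || mode == "none"
}

func main() {
	data, err := os.ReadFile(lockdownPath)
	if err != nil {
		// The file may be absent if the lockdown LSM is not enabled or
		// securityfs is not mounted, so this alone is not conclusive.
		fmt.Fprintf(os.Stderr, "could not read %s: %v\n", lockdownPath, err)
		return
	}

	mode := parseLockdownMode(string(data))
	if !writeUserAllowed(mode) {
		// Degrade gracefully: log and skip instead of crashing.
		fmt.Fprintf(os.Stderr, "kernel lockdown mode %q is active; skipping probes that need bpf_probe_write_user\n", mode)
		return
	}
	fmt.Printf("kernel lockdown mode %q; probes can be loaded\n", mode)
}
```

Running this on the affected node (or inside the instrumentation container, if securityfs is visible there) shows whether lockdown is a plausible cause before you dig into the verifier output.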