opentelemetry-go-instrumentation bpf_probe_write_user helper function is locked down since linux kernel `5.14-rc6`

Describe the bug

Since this commit in the Linux kernel repository, bpf_probe_write_user is locked down. This results in an `unknown func bpf_probe_write_user` error during opentelemetry-go-instrumentation startup on systems with lockdown integrity mode enabled.

Environment

OS: Ubuntu 22.04 LTS, Linux 5.15.0.1045.52-azure (reproducible on all Linux systems with a kernel above 5.14 and integrity mode enabled)
Go Version: 1.20
Version: v0.2.1-alpha , v0.2.2-alpha

To Reproduce

Steps to reproduce the behavior:

Build docker image from v0.2.1-alpha/v0.2.2-alpha
Run the image on a host with a kernel version higher than 5.14 and lockdown integrity mode enabled

Expected behavior

opentelemetry-go-instrumentation starts up without any issues.

Additional context

bpf_probe_write_user should probably be retired from opentelemetry-go-instrumentation, because for many cloud providers and on-demand use cases editing the kernel startup parameters is not possible.

Sep 10 '23 11:09 SzymonSt

Problem also encountered and reported here: #237

Sep 10 '23 11:09 SzymonSt

Yes, this is definitely an issue and we should add code to make sure context propagation is disabled when the Linux security lockdown is set to anything other than [none]. It's really only a problem with context propagation, because it's the only time the bpf_probe_write_user helper is used.
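A minimal sketch of the lockdown check described above, not the project's actual code. It assumes the kernel exposes the lockdown state at /sys/kernel/security/lockdown (securityfs mounted) in the form `none [integrity] confidentiality`, with the active mode in brackets. The function names parseLockdownMode and readLockdownMode are hypothetical, and treating a missing file as `none` is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"strings"
)

// lockdownPath is where the kernel exposes the lockdown state when
// securityfs is mounted (assumed location).
const lockdownPath = "/sys/kernel/security/lockdown"

// parseLockdownMode returns the bracketed (active) mode from content such as
// "none [integrity] confidentiality". It returns "" if nothing is bracketed.
func parseLockdownMode(content string) string {
	for _, field := range strings.Fields(content) {
		if strings.HasPrefix(field, "[") && strings.HasSuffix(field, "]") {
			return strings.Trim(field, "[]")
		}
	}
	return ""
}

// readLockdownMode reads and parses the lockdown file at path.
// Assumption: a missing file means the lockdown LSM is not active.
func readLockdownMode(path string) (string, error) {
	data, err := os.ReadFile(path)
	if errors.Is(err, fs.ErrNotExist) {
		return "none", nil
	}
	if err != nil {
		return "", err
	}
	mode := parseLockdownMode(string(data))
	if mode == "" {
		return "", fmt.Errorf("no active lockdown mode found in %q", string(data))
	}
	return mode, nil
}

func main() {
	mode, err := readLockdownMode(lockdownPath)
	if err != nil {
		fmt.Println("could not determine lockdown mode:", err)
		return
	}
	// Context propagation needs bpf_probe_write_user, so it is only
	// enabled when lockdown is [none].
	fmt.Printf("lockdown mode: %s, context propagation allowed: %t\n", mode, mode == "none")
}
```

Reading the file can fail for other reasons (for example, permissions), so the caller still has to decide whether to disable context propagation when the mode cannot be determined.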

This is typically an issue when SecureBoot is enabled, which is why most users don't see it in regular VM environments. The Linux kernel automatically enters integrity mode when SecureBoot is enabled.

I haven't thought deeply about how we can fix this, but one way would be to add an #ifdef around the calls to bpf_probe_write_user and build different versions with bpf2go at compile time. Then all we need to do is add logic in userspace to detect kernel >= 5.14 plus integrity mode and load the safe version of the BPF probes.
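The userspace detection could look roughly like the sketch below. It is only an illustration under stated assumptions: the kernel release string (as reported by `uname -r`) and the active lockdown mode are passed in as plain strings, and parseKernelVersion and useSafeProbes are hypothetical names rather than existing functions in the project.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseKernelVersion extracts the major and minor numbers from a release
// string such as "5.15.0-1045-azure" or "5.14-rc6".
func parseKernelVersion(release string) (major, minor int, err error) {
	parts := strings.SplitN(release, ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("unexpected kernel release %q", release)
	}
	major, err = strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, fmt.Errorf("bad major version in %q: %w", release, err)
	}
	// The minor part may carry a suffix, e.g. "14-rc6"; keep leading digits only.
	digits := parts[1]
	if i := strings.IndexFunc(digits, func(r rune) bool { return r < '0' || r > '9' }); i >= 0 {
		digits = digits[:i]
	}
	minor, err = strconv.Atoi(digits)
	if err != nil {
		return 0, 0, fmt.Errorf("bad minor version in %q: %w", release, err)
	}
	return major, minor, nil
}

// useSafeProbes reports whether the probe variant built without
// bpf_probe_write_user should be loaded: kernel >= 5.14 and a lockdown
// mode other than "none".
func useSafeProbes(release, lockdownMode string) (bool, error) {
	major, minor, err := parseKernelVersion(release)
	if err != nil {
		return false, err
	}
	lockedKernel := major > 5 || (major == 5 && minor >= 14)
	return lockedKernel && lockdownMode != "none", nil
}

func main() {
	// Sample inputs; a real loader would read these from the running system.
	safe, err := useSafeProbes("5.15.0-1045-azure", "integrity")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("load safe probes:", safe)
}
```

On a parse error the sketch returns the error and leaves the decision to the caller; falling back to the safe variant in that case would be the conservative choice.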

Nov 01 '23 15:11 grcevski

From the 3-Mar-24 SIG meeting: @grcevski started a thread on the Linux kernel mailing list to see about unlocking this: https://www.uwsg.indiana.edu/hypermail/linux/kernel/2403.0/03026.html

It doesn't look like that thread has any replies yet. Is there any way to follow up on that, @grcevski?

Apr 01 '24 13:04 damemi

From SIG call today:

We originally tried asking about this in the Cilium Slack, but didn't hear back
Could possibly be fixed by LSM policies (are those usable in cloud providers though?)
We should ask around eBPF/CNCF for advice or help with working on this in the kernel

Apr 02 '24 17:04 damemi

Based on this article, it seems that the lockdown LSM policies are static and cannot be modified or configured:

https://lwn.net/Articles/791863/

Proposals to make them more configurable appear to have been rejected.

Apr 09 '24 19:04 grcevski

Relevant very new patch proposal https://lore.kernel.org/bpf/[email protected]/

Apr 09 '24 20:04 grcevski

Based on the latest comments on the thread I previously posted, it appears that the proposal for the new helper will not get accepted.

However, as of kernel 6.9 (not yet released) there's a new feature called bpf arena (https://lwn.net/Articles/961594/). It will allow us to declare an arbitrary memory segment that can be shared between userspace and the BPF program, allowing reads and writes. We'd still have to confirm how this works, but if we were to make a launcher for a Go program, we could tap into the Go runtime and have any new GC-allocated arena segment declared as a bpf arena. If this works, we'd have full write access to the Go heap and eliminate the need for the old helper. This would only work for newer kernels, but it would be a way forward.

Apr 17 '24 13:04 grcevski

Thanks @grcevski for the updates! I mentioned this on the call, but I think it's clear that we need to find an alternative approach. At this point I would consider the old helper DOA

Apr 17 '24 13:04 damemi

I wonder if it would be possible instead to use a uprobe in some place like go_crypto_tls_write and go_crypto_tls_read to sniff the headers as they go along and use a shared bpf map to copy the headers around and create relationships to traces. We may also be able to use these probes to at least monitor the related user headers (like any OpenTelemetry key). That's just on the uprobe side, because we want to trap data beyond the crypto boundary. I'm not 100% sure of the exact probe points; that's just from memory.

If we need to write into data structures, it may be possible to use a ptrace attached routine to push additional data in. That will require permissions (as the current case) but I don't think it should be prevented by secureboot / lsm (but it is more restrictive because it is properly a debugging interface). That's a bit more work, but I think it is doable.

Aug 20 '24 15:08 apconole

@grcevski @RonFed ^

Aug 20 '24 16:08 MrAlias

> I wonder if it would be possible instead to use a uprobe in some place like go_crypto_tls_write and go_crypto_tls_read to sniff the headers as they go along and use a bpf map that is shared to copy the headers around and create relationships to traces.

I apologize if I misunderstood the context, but this already happens, except that the current code reads the header information after TLS decryption, in a TLS-agnostic way. Essentially, the headers are read after they are parsed from the incoming request, regardless of TLS.

> If we need to write into data structures, it may be possible to use a ptrace attached routine to push additional data in. That will require permissions (as the current case) but I don't think it should be prevented by secureboot / lsm (but it is more restrictive because it is properly a debugging interface). That's a bit more work, but I think it is doable.

This is a very interesting idea. I don't think it will be limited by SecureBoot, since we already use ptrace to attach a shared memory segment to the instrumented process. I'd like to find out more about how you think this might work. If I understand correctly, it would mean not using eBPF at all to inject the header values, but using an injected function hook to do the work?

Aug 20 '24 19:08 grcevski

> This is a very interesting idea, I don't think it will be limited by secureboot, since we already use ptrace to attach a shared memory segment to the instrumented process. I'd like to find out more about how you think this might work. If I understand correctly what you are saying, it will mean not using eBPF at all to inject the header values, but using an injected function hook to do the work?

Yes, exactly.

Aug 21 '24 12:08 apconole
