oci-seccomp-bpf-hook
Consider using libbpf
If we utilize libbpf, then we can produce a smaller binary which also runs faster and minimizes the runtime dependencies. The overall architecture of the hook could be simplified as well. I created a syscall recorder project for demonstration purposes: https://github.com/saschagrunert/syscall-recorder
Building the main application (the syscall-recorder) requires bpftool, clang, llvm, libbpf, libelf, libz and libseccomp (for converting the syscall IDs to names). Static linking is now also possible.
For my demo and to keep things simple, I decided not to fork within the recorder and instead built a small wrapper around systemd-run: https://github.com/saschagrunert/syscall-recorder/blob/main/hack/oci-hook/hook.go. Right now the recorder is not able to produce a full seccomp profile; it only writes a list of syscalls to the target location.
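To illustrate the gap between a syscall list and a full profile, here is a minimal sketch of how recorded names could be turned into an allow-list profile. It is not taken from the syscall-recorder or the hook; the type and function names are hypothetical, the input is assumed to be a plain slice of syscall names, and the JSON covers only a small subset of the OCI runtime-spec seccomp fields.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// seccompProfile mirrors a small subset of the OCI runtime-spec seccomp JSON.
// Fields such as "architectures" are intentionally left out of this sketch.
type seccompProfile struct {
	DefaultAction string        `json:"defaultAction"`
	Syscalls      []syscallRule `json:"syscalls"`
}

type syscallRule struct {
	Names  []string `json:"names"`
	Action string   `json:"action"`
}

// profileFromSyscalls (hypothetical helper) deduplicates and sorts the
// recorded names and allows exactly those; everything else gets an error.
func profileFromSyscalls(recorded []string) seccompProfile {
	seen := make(map[string]struct{}, len(recorded))
	names := make([]string, 0, len(recorded))
	for _, n := range recorded {
		if n == "" {
			continue // skip empty entries
		}
		if _, ok := seen[n]; ok {
			continue // skip duplicates
		}
		seen[n] = struct{}{}
		names = append(names, n)
	}
	sort.Strings(names)
	return seccompProfile{
		DefaultAction: "SCMP_ACT_ERRNO",
		Syscalls:      []syscallRule{{Names: names, Action: "SCMP_ACT_ALLOW"}},
	}
}

func main() {
	p := profileFromSyscalls([]string{"read", "write", "read", "openat"})
	out, err := json.MarshalIndent(p, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

A real profile would likely also need the target architectures and some baseline syscalls required by the container runtime itself.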

What are your thoughts on that?
I don't understand the proposal and problem statement. Can you elaborate? Is it to rewrite oci-seccomp-bpf-hook in C with libbpf?
The main purpose of the project was to find the dependencies required to build and ship ebpf applications for debugging Kubernetes clusters. The hook was more a side-experiment, because it has a real use case and would also work using libbpf.
My proposal is not to rewrite it completely in C; we could probably split up the binaries or still use cgo. Using libbpf has benefits from my point of view, for example not relying on the kernel headers.
+1 porting the code to libbpf will make it easier to ship, with a smaller memory footprint, and possibly allow us to "Compile Once - Run Everywhere".
Can you outline the exact benefits of using libbpf?
Porting/rewriting is costly and I want to make sure there are sufficient technical benefits.
Heads up: I am generally opposed to creating new projects in C. Build dependencies IMHO are not worth a rewrite of a project. But runtime dependencies may be worth it.
In my opinion, the main benefit is at runtime. We wouldn't have to compile every time we run the tool, which reduces the startup time. We also wouldn't have to rely on kernel headers being present on the target system, which is sometimes a pain to deal with.
There are go libraries that bind to libbpf that can help make the job easier [1].
One con is that the target system needs to support BTF to help remove the kernel headers dependency.
[1] : https://github.com/aquasecurity/libbpfgo
Thanks! Using Go bindings sounds compelling. Avoiding recompilation as well.
I'm on board :+1: Thanks, @saschagrunert & @weirdwiz
Any volunteers?
I'd love to work on it
@saschagrunert are you cool with @weirdwiz taking a shot at it?
One con is that the target system needs to support BTF to help remove the kernel headers dependency. What versions of RHEL support this?
RHEL 8.2+
@saschagrunert are you cool with @weirdwiz taking a shot at it?
Sure, I'm happy to review and support if requested. :+1:
Thanks, @saschagrunert! Happy hacking, @weirdwiz !
Take a look at https://github.com/aquasecurity/btfhub/ to make this work with CO-RE and old kernels (specifically at the recent work being done at https://github.com/aquasecurity/btfhub/tree/main/tools, which will be upstreamed shortly). By doing something like that you're able to generate a binary that will run on any kernel (including old ones) without depending on LLVM or runtime compilation. We're pursuing that as well. Hope it helps.
TL;DR: supporting 550 kernels that don't provide BTF files costs about 1.5MB added to an eBPF-based application. Recent kernels already provide BTF, so there you don't have to do anything extra to have a CO-RE capable eBPF app.
@rafaeldtinoco thank you for the input. I had a look at the btfgen tool and think it looks promising. I'm now wondering, what would a build pipeline look like?
For example:
- We build the bpf object locally, using the vmlinux.h from bpftool btf dump file /sys/kernel/btf/vmlinux format c
- We generate the smaller btf files once via btfgen using the btfhub and put them into our repository
- Use the btf_custom_path option from libbpf for the relocation (see NewModuleFromBufferArgs in libbpfgo)
This would mean that the custom btf files need to be part of the local file system during execution.
Our current thoughts are:
- Create a small REST API that uses BTFHUB. You would provide the kernel version and distro and it would return the entire BTF file for that kernel (or multiple kernels, depending on the range you provide). You could also provide your BPF object to the API and it would generate smaller BTF files for one or more kernel versions you specify.
For this case, caching the downloaded files locally would be smart (tracee does that, for example).
- Use btfgen.sh like it is now and include all generated BTF files in your project. This requires that you rebuild all smaller BTFs every time your .bpf.c source files change (adding/removing kernel types). Something like a GitHub action during release could take care of this (downloading BTFHUB, generating specific BTFs using your compiled object, including the BTF files in the filesystem).
WDYT ?
@rafaeldtinoco thank you for the input, I'm working on a syscall recorder PoC in https://github.com/kubernetes-sigs/security-profiles-operator/pull/618.
I decided to go through the following build steps:
1. Build the bpf.o once on the local build system
2. Use the object as input to build the smaller btfs from btfhub once and commit them into the repo
3. Use go generate to move the btfs into the binary
4. Depending on the system where the operator runs, find the right incremental btf during runtime, write it to disk and load it via btf_custom_path (there seems to be no way to load the btf from memory, aka []byte)
Steps 2 and 3 are only necessary when the bpf code changes; we will verify that later on by running them in CI and comparing against the committed code.
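The runtime part of these steps (look up the matching BTF, write it to disk, hand the path to btf_custom_path) could be sketched roughly as follows. This is not the code from the PoC: the key layout, the helper names and the placeholder bytes are assumptions, and a plain map stands in for whatever go generate would produce.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// btfKey builds a hypothetical lookup key; the real layout of the generated
// data is not specified in this thread.
func btfKey(osID, osVersion, arch, kernelRelease string) string {
	return filepath.Join(osID, osVersion, arch, kernelRelease+".btf")
}

// writeBTF looks up key in the in-memory BTF set, writes the bytes into dir
// and returns the resulting path, which could then be used as btf_custom_path.
func writeBTF(btfs map[string][]byte, key, dir string) (string, error) {
	data, ok := btfs[key]
	if !ok {
		return "", fmt.Errorf("no BTF available for %q", key)
	}
	path := filepath.Join(dir, filepath.Base(key))
	if err := os.WriteFile(path, data, 0o600); err != nil {
		return "", fmt.Errorf("write BTF: %w", err)
	}
	return path, nil
}

func main() {
	key := btfKey("ubuntu", "20.04", "x86_64", "5.4.0-87-generic")
	// Placeholder bytes only; a generated file would hold the real BTF data.
	btfs := map[string][]byte{key: []byte("placeholder")}

	dir, err := os.MkdirTemp("", "btf-")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	path, err := writeBTF(btfs, key, dir)
	if err != nil {
		panic(err)
	}
	fmt.Println("candidate for btf_custom_path:", path)
}
```

The temporary file is needed only because the path-based option is the one mentioned here; the map could equally be populated through Go's embed package instead of generated source.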
The question now is: what if a kernel and architecture are not supported by the recorder? Should we leave btf_custom_path empty, which would mean we fall back to /sys/kernel/btf/vmlinux? Is this safe?
@rafaeldtinoco thank you for the input, I'm working on a syscall recorder PoC in kubernetes-sigs/security-profiles-operator#618.
Yep, that work brought me here IIRC.
This all makes sense to me, and goes in the same direction our tracee project is heading (together with libbpfgo's intent). If you ever think that libbpfgo can help in any way (for example by gluing BTFs and adding them automatically through 'go generate', or something similar), we can discuss this on that project's discussions page.
The question is now: What if a kernel and architecture is not supported by the recorder? Should we leave btf_custom_path empty, which would mean we fallback to the /sys/kernel/btf/vmlinux. Is this safe?
The default should be to always use "/sys/kernel/btf/vmlinux". If it does not exist, then your code should identify the OS and kernel and pick the right BTF file for it.
Check how we do this in libbpfgo, here and here.
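Independent of the libbpfgo code referenced above, the default-first rule can be sketched in a few lines. The function name and the fallback callback are hypothetical; the only thing taken from this thread is the order of preference.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

const defaultBTFPath = "/sys/kernel/btf/vmlinux"

// customBTFPath returns an empty string when the kernel exposes its own BTF
// at sysPath, meaning no custom path is needed. Only when that file does not
// exist is the fallback asked for a custom BTF file.
func customBTFPath(sysPath string, fallback func() (string, error)) (string, error) {
	_, err := os.Stat(sysPath)
	if err == nil {
		return "", nil // kernel BTF is present, use the default
	}
	if !errors.Is(err, os.ErrNotExist) {
		return "", fmt.Errorf("stat %s: %w", sysPath, err)
	}
	return fallback()
}

func main() {
	path, err := customBTFPath(defaultBTFPath, func() (string, error) {
		// Hypothetical: identify OS and kernel here and pick a shipped BTF.
		return "", errors.New("no custom BTF available in this sketch")
	})
	if err != nil {
		fmt.Println("cannot determine BTF:", err)
		return
	}
	if path == "" {
		fmt.Println("using", defaultBTFPath)
		return
	}
	fmt.Println("using custom BTF at", path)
}
```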
We then started with this approach by making tracee download the BTF file for the environment it is running in. This will become the API I mentioned to you. HTTP REST > GIMME OS X KERNEL Y BTF FILE.
Now, let's suppose you're offline... you should try to use the BTF for the kernel version closest to the one you're running. Let's say you have a BTF file for 5.4.0-87 and you're running a 5.4.0-89 kernel that does not have a prepared BTF (embedded into your Go binary). Then you can try to load using 5.4.0-87, for example, and see if it works. Of course, the best option would be to download the missing BTF from the API, but that won't always be possible (thus the idea of trying to use the latest one you have, which will very likely fit).
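A rough sketch of the "closest kernel" idea, assuming release strings of the form major.minor.patch-abi as in the example above. The helper names are hypothetical, and treating the fourth numeric field as the distance measure is a simplification, not something tracee or libbpfgo is known to do.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseRelease extracts up to four leading numeric fields from a release
// such as "5.4.0-89-generic". At least major.minor.patch must be numeric.
func parseRelease(release string) ([4]int, error) {
	var v [4]int
	fields := strings.FieldsFunc(release, func(r rune) bool { return r == '.' || r == '-' })
	n := 0
	for _, f := range fields {
		if n == len(v) {
			break
		}
		x, err := strconv.Atoi(f)
		if err != nil {
			break // stop at the first non-numeric field, e.g. "generic"
		}
		v[n] = x
		n++
	}
	if n < 3 {
		return v, fmt.Errorf("cannot parse kernel release %q", release)
	}
	return v, nil
}

// closestRelease returns the available release that shares major.minor.patch
// with the running kernel and whose fourth field is numerically nearest.
// On a tie, the candidate listed first wins.
func closestRelease(running string, available []string) (string, bool) {
	want, err := parseRelease(running)
	if err != nil {
		return "", false
	}
	best, bestDist, found := "", 0, false
	for _, cand := range available {
		have, err := parseRelease(cand)
		if err != nil || have[0] != want[0] || have[1] != want[1] || have[2] != want[2] {
			continue
		}
		dist := have[3] - want[3]
		if dist < 0 {
			dist = -dist
		}
		if !found || dist < bestDist {
			best, bestDist, found = cand, dist, true
		}
	}
	return best, found
}

func main() {
	got, ok := closestRelease("5.4.0-89-generic", []string{"5.4.0-42", "5.4.0-87", "5.8.0-50"})
	fmt.Println(got, ok)
}
```

Whether a neighbouring BTF actually works still has to be verified by loading the program, as described above.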
I'm having a little trouble figuring out how we would get the $PARENT_PID from userspace. If we were using C, we could change the value in the bss section of the structure generated in the skeleton. But for libbpfgo, I couldn't find a way to do that.
One way could be adding a uprobe to get the PID from the userspace, but it would be hard to figure out which pid belongs to which container, if a lot of binaries are run at the same time.
We could use a map for now, but I think setting the rodata should be a feature of libbpfgo
Ref https://github.com/aquasecurity/libbpfgo/issues/2, https://github.com/aquasecurity/libbpfgo/issues/27
Any updates, @weirdwiz? I want to make sure that @saschagrunert's request is not falling off our radar.
OK, let's unblock the issue. @weirdwiz is busy with his internship at Red Hat's storage team. If others want to give it a shot, feel free to self-assign or drop a comment.
In the meantime we released a first integration within the security-profiles-operator: https://github.com/kubernetes-sigs/security-profiles-operator/tree/main/internal/pkg/daemon/bpfrecorder
Packaging a generic libbpf-based application seems to be the trickiest part here. On the other hand, we probably do not have to support a custom BTF if we focus on Fedora/RHEL packaging in the first place.
@vrothberg I can put it in our Node Observability backlog if you don't mind, because we plan to work on ebpf applications in any case in mid-term.
SGTM, thanks!
@saschagrunert did you find time to look into it?
@vrothberg unfortunately not directly, because I think we should clarify how to package the application before moving forward. I'm working with other teams on solving that issue right now, but I think it will take some time (months).
By the way, not all libbpf features seem to be supported on all kernels. For example, ring buffer maps are not supported by Linux < 5.8. Not sure if that is a problem we would run into.
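One way such a limitation could be handled is a version gate at startup, sketched below. The helper names are hypothetical, falling back to a perf event buffer is an assumption rather than something decided in this thread, and a pure version comparison ignores distro kernels that backport features.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// kernelAtLeast reports whether a release such as "5.4.0-89-generic" is at
// least major.minor. Only the first two version fields are inspected.
func kernelAtLeast(release string, major, minor int) (bool, error) {
	parts := strings.SplitN(release, ".", 3)
	if len(parts) < 2 {
		return false, fmt.Errorf("cannot parse kernel release %q", release)
	}
	maj, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, fmt.Errorf("parse major of %q: %w", release, err)
	}
	// The minor field may carry a suffix such as "10-rc1"; keep leading digits.
	end := 0
	for end < len(parts[1]) && parts[1][end] >= '0' && parts[1][end] <= '9' {
		end++
	}
	mnr, err := strconv.Atoi(parts[1][:end])
	if err != nil {
		return false, fmt.Errorf("parse minor of %q: %w", release, err)
	}
	return maj > major || (maj == major && mnr >= minor), nil
}

// eventBufferKind picks "ringbuf" on kernels >= 5.8 and "perf" otherwise,
// including when the release string cannot be parsed.
func eventBufferKind(release string) string {
	ok, err := kernelAtLeast(release, 5, 8)
	if err != nil || !ok {
		return "perf"
	}
	return "ringbuf"
}

func main() {
	for _, r := range []string{"5.4.0-89-generic", "5.8.0", "6.1.12"} {
		fmt.Println(r, "->", eventBufferKind(r))
	}
}
```

Probing for the feature directly would be more reliable than comparing version numbers, but is outside the scope of this sketch.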
@saschagrunert what's your current take on the issue? Shall we leave it open or close it?
Let's close it for now; it does not have enough priority for me to work on it in the near future.