Does eBPF work inside container-mode?

Open charmoniumQ opened this issue 1 year ago • 2 comments

Use case: I wanted to benchmark an application on a normal system and on one with an eBPF filter attached to kernel tracepoints. Is this possible in container-mode?

I wrote an eBPF/bpftrace program which works as a normal user through setuid magic outside the container, but it gives the following error if I run it with containerexec:

ERROR: tracepoint not found: syscalls:sys_enter_fork

I think that is actually a permission error. If bpftrace doesn't have the root ruid and euid, /sys/kernel/tracing will not show any tracepoints. Fakeroot doesn't cut it.
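One way to tell a permission problem from a genuinely missing tracepoint is to check whether the tracepoint's directory is visible under tracefs, both outside the container and inside it. The following is only a minimal sketch: the helper names are made up for illustration, and it assumes tracefs is mounted at /sys/kernel/tracing with the usual events/<category>/<name> layout.

```python
import os


def tracepoint_path(tracepoint, tracefs_root="/sys/kernel/tracing"):
    """Map a 'category:name' tracepoint to its expected tracefs directory."""
    category, name = tracepoint.split(":", 1)
    return os.path.join(tracefs_root, "events", category, name)


def tracepoint_visible(tracepoint, tracefs_root="/sys/kernel/tracing"):
    """Return True if the current process can see the tracepoint directory."""
    # A False result can mean "no such tracepoint" or "not permitted to look";
    # comparing the result as root and as the benchmark user separates the two.
    return os.path.isdir(tracepoint_path(tracepoint, tracefs_root))


if __name__ == "__main__":
    print(tracepoint_visible("syscalls:sys_enter_fork"))
```

If this reports the tracepoint as visible for root outside the container but not for the user inside it, that would support the permission theory rather than a missing tracepoint.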

I'm by no means an expert in Linux namespaces, but I think we would want an opt-in flag for benchexec that writes a mapping from root (uid=0) outside the container to root (uid=0) inside the container into /proc/$benchexec/uid_map. I can implement it on my own, but I wanted to hear from someone who understands namespaces better whether I am on the right path.
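For reference, each line of a uid_map file has three fields: the first ID inside the namespace, the first ID outside it, and the length of the range. The sketch below parses such content and translates an inside ID to the outside one, which shows what "root inside maps to root outside" would mean in practice. The function names are illustrative only and are not part of BenchExec.

```python
def parse_id_map(text):
    """Parse uid_map/gid_map content into (inside_start, outside_start, length) tuples."""
    entries = []
    for line in text.splitlines():
        if line.strip():
            inside, outside, length = (int(field) for field in line.split())
            entries.append((inside, outside, length))
    return entries


def outside_id(entries, inside_id):
    """Translate an ID inside the namespace to the outside ID, or None if unmapped."""
    for inside, outside, length in entries:
        if inside <= inside_id < inside + length:
            return outside + (inside_id - inside)
    return None


if __name__ == "__main__":
    # Mapping of the current process; in the initial namespace this is "0 0 4294967295".
    with open("/proc/self/uid_map") as f:
        print(outside_id(parse_id_map(f.read()), 0))
```

A mapping such as `0 1000 1` means that uid 0 inside the container is the unprivileged uid 1000 outside, whereas the proposed flag would need an entry whose outside ID is 0.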

charmoniumQ, Feb 12 '24 08:02

I don't know about eBPF. But if it requires full root, i.e., the same as being uid 0 outside the container, then it will not work.

If it requires only root inside the container (or some capability like CAP_SYS_ADMIN), then it may work with containerexec --root. If it is supposed to work inside containers but does not work even with containerexec --root, then we could investigate what it actually needs and what is preventing it from working.
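When investigating what the tool actually needs, it can help to print which capabilities the process holds inside the container. The sketch below reads the CapEff bitmask from /proc/self/status and tests the CAP_SYS_ADMIN bit (number 21 in linux/capability.h). The helper names are made up for illustration. Note that capabilities held inside a user namespace only apply to resources owned by that namespace, so a set bit here does not imply the same privilege outside the container.

```python
CAP_SYS_ADMIN = 21  # bit number from linux/capability.h


def effective_capabilities(status_text):
    """Extract the effective capability bitmask from /proc/<pid>/status content."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def has_capability(cap_mask, cap_bit):
    """Return True if the given capability bit is set in the mask."""
    return bool((cap_mask >> cap_bit) & 1)


if __name__ == "__main__":
    with open("/proc/self/status") as f:
        mask = effective_capabilities(f.read())
    print(has_capability(mask, CAP_SYS_ADMIN))
```

Running this once under plain containerexec and once under containerexec --root would show whether the flag changes the capability set that the tool sees.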

If you know that it requires full root, giving root inside the container access to the full root outside the container using uid_map would technically work, but it opens up problems.

Using uid_map this way would require executing BenchExec as root. But BenchExec was written with the intention of running as a regular user, and in particular the containerization used by BenchExec assumes that. I do not know whether running BenchExec as root would keep its isolation promises or whether it would open up security holes.

Giving full root access inside the container would of course completely eliminate any isolation promises.

So I am hesitant to consider this.

Are there no other solutions for you? For example, set up tracing outside the container and then run BenchExec?

PhilippWendler, Feb 12 '24 14:02

> So I am hesitant to consider this.

Understood.

> For example, setup tracing outside the container and then run BenchExec?

Yeah, I would just need to know the PID of the grandchild as seen from outside BenchExec's namespace (the PID inside BenchExec's namespace is always 2). I think I could change parent_setup_fn to take a kwarg specifying that PID. I would change ContainerExecutor and BaseExecutor to both pass the PID to parent_setup_fn, for consistency's sake. As in ContainerExecutor's case, BaseExecutor should wait for a byte signalling that parent_setup_fn is complete before launching the tool. Because the PID would be passed as a kwarg, existing parent_setup_fn implementations may have to change a little, but they would be more future-proof if they accept and ignore extra **kwargs.
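To make the idea concrete, here is a rough standalone sketch of the handshake, not BenchExec's actual code. The runner `run_after_parent_setup` is hypothetical, and the `pid` kwarg is the proposed addition rather than an existing parameter. The child blocks on a pipe until the parent has run the setup callback with the child's PID (as seen by the parent) and written one byte.

```python
import os
import sys


def parent_setup_fn(pid=None, **kwargs):
    """Hypothetical callback: uses the child's PID, ignores unknown future kwargs."""
    # This is where tracing could be attached to `pid` before the tool starts.
    return {"traced_pid": pid}


def run_after_parent_setup(argv, setup_fn):
    """Fork a child that waits for one byte on a pipe before exec'ing argv."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: block until the parent signals that setup is complete.
        try:
            os.close(write_fd)
            ready = os.read(read_fd, 1)
            os.close(read_fd)
            if ready != b"\0":
                os._exit(1)  # EOF without a byte: parent setup failed
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed
    # Parent: run setup with the child's PID, then release the child.
    os.close(read_fd)
    try:
        setup_result = setup_fn(pid=pid)
        os.write(write_fd, b"\0")
    finally:
        os.close(write_fd)
        _, status = os.waitpid(pid, 0)
    exit_code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else -os.WTERMSIG(status)
    return setup_result, exit_code


if __name__ == "__main__":
    print(run_after_parent_setup([sys.executable, "-c", "pass"], parent_setup_fn))
```

Since the callback accepts **kwargs, the runner could later pass further keyword arguments without breaking callbacks written against this signature.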

What do you think of that?

charmoniumQ, Feb 12 '24 17:02