SELinux blocks fanotify_mark from a Pod that has super_t type
Environment
I am trying to use the Kubescape node agent on a Bottlerocket node in an EKS cluster.
It failed with the following error messages:
{"level":"info","ts":"2023-12-17T08:19:09Z","msg":"credentials loaded","accountLength":36}
W1217 08:19:09.532776 611945 client_config.go:618] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. T
time="2023-12-17T08:19:10Z" level=warning msg="container-hook: failed to fanotify mark: fanotify: mark error, permission denied"
time="2023-12-17T08:19:10Z" level=warning msg="container-hook: failed to fanotify mark: fanotify: mark error, permission denied"
{"level":"fatal","ts":"2023-12-17T08:19:10Z","msg":"error starting the container watcher","error":"setting up container collection:
I have checked the audit logs on the node, and they clearly show that SELinux blocks fanotify_mark:
MESSAGE=audit: type=1400 audit(1702795199.265:1601): avc: denied { watch_with_perm } for pid=492743 comm="node-agent" path="/host/x86_64-bottlerocket-linux-gnu/sys-root/usr/bin/runc" dev="dm-0" ino=190 scontext=system_u:system_r:control_t:s0-s0:c0.c1023 tcontext=system_u:object_r:os_t:s0 tclass=file permissive=0
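A small parser can pull the relevant fields out of an AVC record like the one above. The sketch below assumes the `key=value` layout shown in this record, with values optionally double-quoted. It is not a complete audit-log parser, and the function names `parse_avc_denial` and `selinux_type` are hypothetical. On this record it reports the source type `control_t` rather than the `super_t` requested in `seLinuxOptions`. That gives a quick way to confirm which label the process actually ran under.

```python
import re

# Matches key=value pairs in an AVC record; values may be double-quoted.
_FIELD_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')
# Matches the denied permission set, e.g. "{ watch_with_perm }".
_PERMS_RE = re.compile(r"avc:\s+denied\s+\{([^}]*)\}")


def selinux_type(context: str) -> str:
    """Return the type field of a context such as
    "system_u:system_r:control_t:s0-s0:c0.c1023".

    The level part may itself contain colons, so split at most three times.
    """
    return context.split(":", 3)[2]


def parse_avc_denial(line: str) -> dict:
    """Extract the fields useful for triage from one AVC denial line (hypothetical helper)."""
    perms = _PERMS_RE.search(line)
    if perms is None:
        raise ValueError("not an AVC denial record")
    fields = {key: value.strip('"') for key, value in _FIELD_RE.findall(line)}
    return {
        "permissions": perms.group(1).split(),
        "command": fields.get("comm"),
        "path": fields.get("path"),
        "source_type": selinux_type(fields["scontext"]),
        "target_type": selinux_type(fields["tcontext"]),
        "class": fields.get("tclass"),
        # permissive=0 means the denial was enforced, not just logged.
        "enforcing": fields.get("permissive") == "0",
    }


record = (
    'MESSAGE=audit: type=1400 audit(1702795199.265:1601): avc: denied '
    '{ watch_with_perm } for pid=492743 comm="node-agent" '
    'path="/host/x86_64-bottlerocket-linux-gnu/sys-root/usr/bin/runc" '
    'dev="dm-0" ino=190 scontext=system_u:system_r:control_t:s0-s0:c0.c1023 '
    'tcontext=system_u:object_r:os_t:s0 tclass=file permissive=0'
)
denial = parse_avc_denial(record)
print(denial["source_type"], denial["permissions"])  # control_t ['watch_with_perm']
```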
I have changed the securityContext of the DaemonSet to include:
securityContext:
  capabilities:
    add:
      - SYS_ADMIN
      - SYS_PTRACE
      - NET_ADMIN
      - SYSLOG
      - SYS_RESOURCE
      - IPC_LOCK
      - NET_RAW
  privileged: true
  runAsUser: 0
  seLinuxOptions:
    type: super_t
to make sure this system call is not denied by SELinux.
What I expected to happen:
I expected this to work: according to the documentation, type: super_t should bypass all restrictions.
What actually happened:
It is still blocked with the same audit message as above.
How to reproduce the problem:
Install the Kubescape operator, as explained here, on a Bottlerocket-backed Kubernetes cluster.
privileged: true currently has the effect of silently overriding any custom SELinux label. The workaround is to not set it, and instead to enable all capabilities, disable seccomp, etc., to gain most of the effects of privileged: true without actually setting it.
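One way to apply that workaround is a strategic-merge patch generated in Python. The sketch below is minimal and assumes the Kubernetes securityContext field names `capabilities.add`, `seccompProfile.type` and `seLinuxOptions.type`, plus a container runtime that accepts `ALL` in `capabilities.add`. The container name `node-agent` is a hypothetical placeholder. Only the settings named in the comment are covered, so the "etc." is not expanded.

```python
import json

# Hypothetical placeholder: use the container name from your DaemonSet.
CONTAINER_NAME = "node-agent"


def workaround_security_context() -> dict:
    """Approximate privileged: true without setting it, so that the
    custom SELinux label is honoured (per the workaround described above)."""
    return {
        # Explicitly false: privileged: true would override seLinuxOptions.
        "privileged": False,
        "runAsUser": 0,
        # All capabilities instead of a hand-picked subset.
        "capabilities": {"add": ["ALL"]},
        # Disable seccomp filtering for this container.
        "seccompProfile": {"type": "Unconfined"},
        # The label that permits fanotify permission marks on host files.
        "seLinuxOptions": {"type": "super_t"},
    }


def daemonset_patch(container_name: str = CONTAINER_NAME) -> dict:
    """Build a strategic-merge patch that sets one container's securityContext."""
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            "name": container_name,
                            "securityContext": workaround_security_context(),
                        }
                    ]
                }
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(daemonset_patch(), indent=2))
```

The output could then be applied with something like `kubectl patch daemonset <daemonset-name> -n <namespace> --patch "$(python3 patch.py)"`, where the DaemonSet name, namespace and file name are placeholders. Setting `privileged: false` explicitly, instead of only omitting it, lets the patch override an existing `privileged: true`.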
Thanks @bcressey, it does work when privileged: true is removed.
We have tried both control_t and super_t. Does it make sense that only the latter works?
@slashben these actions are currently restricted to super_t, which is meant to be a deliberate opt-in to system calls that can break host functionality in surprising ways.
In this case, the risk is that a process applying restrictive fanotify access-permission logic to host mounts can stop the API from working, block the flow of container metrics, prevent updates from being applied, etc.
SELinux relabeling is also restricted to super_t for essentially the same reason.
"Can it prevent updates, brick the node on update, or cause data loss after an update?" is the key question. Of course most uses of fanotify (and most uses of SELinux relabeling) are not going to have this effect, but some can.
The opt-in is partly to acknowledge the risk, and partly so that when troubleshooting something we know the usual rules may not apply.