feat: mount /sys/kernel/security into kubelet
This allows the kubelet to enforce AppArmor.
Since AppArmor landed with the 1.7 kernel, it probably makes sense to backport this.
I think getting this into v1.7 fits nicely with the bump to Kubernetes 1.30, which includes the move of AppArmor config to Pod SecurityContext fields: https://github.com/kubernetes/kubernetes/pull/123435
I tried to validate whether this fixes the problem of a Pod with AppArmor configured getting stuck in `Pending` with the message `Cannot enforce AppArmor: AppArmor is not enabled on the host`.

I mounted the securityfs into the kubelet using this machine config:
```yaml
machine:
  kubelet:
    extraMounts:
      - destination: /sys/kernel/security
        type: bind
        source: /sys/kernel/security
        options:
          - bind
          - ro
```
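To sanity-check the mount (a sketch; `<node-ip>` is a placeholder), the securityfs contents and the kernel's AppArmor flag can be inspected directly on the node:

```sh
# Sketch: if securityfs is available, the apparmor directory is visible:
talosctl -n <node-ip> ls /sys/kernel/security/apparmor
# The kernel reports "Y" here when AppArmor is enabled:
talosctl -n <node-ip> read /sys/module/apparmor/parameters/enabled
```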
Then I started a Pod with an AppArmor profile specified (a profile that I had loaded into the node kernel beforehand):
```yaml
spec:
  containers:
    - name: test
      ...
      securityContext:
        appArmorProfile:
          type: Localhost
          localhostProfile: myprofile
```
I no longer get the `...AppArmor is not enabled on the host` error, but a new type of error: `Error: failed to get container spec opts: failed to generate apparmor spec opts: apparmor is not supported`.

Feels like there is something else that also needs to be fixed around the kubelet.
@konrader it might be easier if you could provide us with a full set of steps to launch a pod with an AppArmor profile, which we could include in Talos as an integration test. This way we can develop a fix and a test.
I will write down the steps needed. It does include loading an AppArmor profile (with the `apparmor_parser` cmdline tool) via a (somewhat) privileged container.

Where are the integration tests located today?
> Where are the integration tests located today?

They are in `internal/integration`, but if the test only needs some Kubernetes interactions, that is perfectly fine as well.
Here is a complete script that tests both loading an AppArmor profile into the kernel and running a Pod with the profile enabled.
```sh
#!/bin/sh
NS=apparmor

kubectl create namespace $NS
kubectl label ns $NS pod-security.kubernetes.io/enforce=privileged

kubectl apply -n $NS -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: apparmor
spec:
  terminationGracePeriodSeconds: 3
  containers:
    - name: apparmor
      image: ghcr.io/bifrostsec/apparmor-image/apparmor:22.04
      # Just spin & wait forever
      command: [ "/bin/sh", "-c", "--" ]
      args: [ "while true; do sleep 30; done;" ]
      securityContext:
        privileged: true
        capabilities:
          add:
            - MAC_ADMIN
      volumeMounts:
        - name: sysfs
          mountPath: /host/sys
          readOnly: false
        - name: securityfs
          mountPath: /sys/kernel/security
          readOnly: false
  volumes:
    - name: sysfs
      hostPath:
        path: /sys
    - name: securityfs
      hostPath:
        path: /sys/kernel/security
EOF

kubectl -n $NS wait --for=condition=Ready pod/apparmor

# Copy over simple AppArmor audit profile
kubectl -n $NS cp audit.profile apparmor:tmp/audit.profile
# Load it into the host kernel
kubectl -n $NS exec apparmor -it -- apparmor_parser /tmp/audit.profile

kubectl apply -n $NS -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: test
spec:
  terminationGracePeriodSeconds: 3
  containers:
    - name: test
      image: alpine:3.19.1
      # Just spin & wait forever
      command: [ "/bin/sh", "-c", "--" ]
      args: [ "while true; do sleep 30; done;" ]
      securityContext:
        appArmorProfile:
          type: Localhost
          localhostProfile: audit-all
EOF
```
This depends on the AppArmor profile being in a separate file named `audit.profile` with this content:

```
abi <abi/3.0>,

profile audit-all flags=(attach_disconnected,mediate_deleted,complain,audit) {
}
```
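As an optional check (a sketch, not part of the steps above): once the `test` Pod is running, the profile actually applied to it can be read from procfs inside the container:

```sh
# The kernel exposes the current AppArmor confinement via procfs; with the
# profile above applied, this should print something like "audit-all (complain)".
kubectl -n apparmor exec test -- cat /proc/1/attr/current
```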
I've located the error `failed to generate apparmor spec opts: apparmor is not supported` as coming from containerd: https://github.com/containerd/containerd/blob/release/1.7/pkg/cri/server/container_create_linux.go#L505

At its core it relies on containerd's own check, `apparmor.HostSupports()`, to see if the host supports AppArmor: https://github.com/containerd/containerd/blob/release/1.7/pkg/apparmor/apparmor_linux.go#L34

This check does more than the kubelet already does; notably, it also checks that `/sbin/apparmor_parser` is present, which it is not on a Talos node.
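Roughly, that check amounts to something like this (a shell sketch of the behaviour described above, not containerd's actual code; the kernel-flag path is an assumption based on the usual `/sys/module/apparmor/parameters/enabled` check):

```sh
# Both conditions must hold for the host to be considered AppArmor-capable:
# the userland parser must exist (the part that fails on Talos), and the
# kernel must report AppArmor as enabled.
if [ -e /sbin/apparmor_parser ] && \
   grep -q Y /sys/module/apparmor/parameters/enabled 2>/dev/null; then
  echo "host supports AppArmor"
else
  echo "apparmor is not supported"
fi
```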
On Talos (or other Linux platforms) where AppArmor profile loading is done either by another user-land tool or from inside a container, this check is not necessary, since containerd requires the profile to already be loaded into the kernel before you reference it for a container anyway. There seems to be a bit of controversy around this, with the extra check being removed and then reverted: https://github.com/containerd/containerd/pull/8087
This PR is still a step in the right direction, since it gets rid of the problem of the kubelet not detecting that AppArmor is enabled on the node.
@konrader we can put an empty `/sbin/apparmor_parser` into the rootfs if that helps, but I don't fully understand how it broke Moby? Would it break some workloads on Talos (not using AppArmor previously)?
I was actually thinking about adding an empty `/sbin/apparmor_parser` to the Talos rootfs to try, but didn't have my build instance up. I'm not sure either what the connection is between containerd and Moby in regards to AppArmor. Moby says on their website that they use containerd as the default container runtime, but it's still weird that they don't accept referencing AppArmor profiles (that are already loaded into the kernel) for containers unless `apparmor_parser` is present (indicating that the userland AppArmor tools are installed). It's a bit like checking that your car repair shop exists before you can start your car.
I tried to add an empty `/sbin/apparmor_parser` via machine config, but that wasn't allowed:

```yaml
machine:
  files:
    - content: ''
      permissions: 0o555
      path: /sbin/apparmor_parser
      op: create
```

I got this message during boot:

```
create operation not allowed outside of /var: "/sbin/apparmor_parser"
```

What is the easiest way to add the file in the rootfs to try out whether this would work?
> What is the easiest way to add the file in the rootfs to try out whether this would work?

The only way is to rebuild Talos with changes to the `Dockerfile` to touch that file in the rootfs.
I can get to it with the steps you provided to build a test for AppArmor.
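Something along these lines should work for a quick experiment (a sketch only; the exact stage and paths in the Talos `Dockerfile` are assumptions, not the actual layout):

```sh
# Sketch: add a step like this to the rootfs stage of the Talos Dockerfile
# (stage name and path are assumptions):
#
#   RUN touch /rootfs/sbin/apparmor_parser && chmod 0555 /rootfs/sbin/apparmor_parser
#
# then rebuild the images from the Talos repo, e.g.:
make installer
```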
> @konrader we can put an empty `/sbin/apparmor_parser` into the rootfs if that helps, but I don't fully understand how it broke Moby? Would it break some workloads on Talos (not using AppArmor previously)?
I'm looking into why Moby/Docker wants containerd to have the extra check for `/sbin/apparmor_parser`.

It seems to be related to Moby applying a default AppArmor profile, `docker-default`, to started containers (unless overridden by the `security-opt` option): https://docs.docker.com/engine/security/apparmor/#understand-the-policies

Moby removed their own AppArmor-enabled check and started relying on containerd's check: https://github.com/moby/moby/pull/42276

Moby uses the result of this check to decide whether they need to parse and load the `docker-default` profile, which they need `apparmor_parser` for: https://github.com/moby/moby/blob/master/profiles/apparmor/apparmor.go#L59

So some of their users had problems starting containers when AppArmor was enabled in the kernel but the AppArmor userland tools weren't installed.
I do see that the default-profile loading code is put under `/contrib` in containerd, and the missing `apparmor_parser` could possibly be a problem if someone deploys a workload on Talos specifying that the runtime's default profile should be applied, like this:

```yaml
securityContext:
  appArmorProfile:
    type: RuntimeDefault
```

Docs: https://kubernetes.io/docs/tutorials/security/apparmor/#securing-a-pod

I will try to verify this.
I could reproduce the scenario with Kubernetes 1.30 running on Ubuntu 22.04 after uninstalling the AppArmor userland tools (`apt remove apparmor` plus a reboot, since containerd caches the result of its AppArmor-enabled check).

With no `/sbin/apparmor_parser` I could not start a Pod with an AppArmor securityContext; I got the error message `failed to generate apparmor spec opts: apparmor is not supported` (same as on Talos after mounting `/sys/kernel/security` into the kubelet).

I then created an empty `/sbin/apparmor_parser`, and after rebooting various Kubernetes pods failed to start with this error:

```
kubelet[1706]: E0430 16:35:26.865258 1706 pod_workers.go:1298] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-apiserver\" with CreateContainerError: \"failed to create containerd container: load apparmor profile /tmp/cri-containerd.apparmor.d3636700873: parser error(\\\"\\\"): exec: \\\"apparmor_parser\\\": executable file not found in $PATH\""
```

So it seems like the CRI implementation in containerd does try to apply a default AppArmor profile to all containers (those that do not have a specific AppArmor profile set in their security options).
If we do want AppArmor support in Talos, then I see the following options:

- Include the real `apparmor_parser` in the rootfs
- Disable applying the default profile to containers in the containerd CRI
It does not seem possible to disable the automatic application of a default AppArmor profile (called `cri-containerd.apparmor.d`) in containerd for containers started via CRI (the interface Kubernetes uses), once containerd has deemed AppArmor enabled on the host. I assume this is to make all container runtimes have the same default behaviour: https://github.com/containerd/containerd/blob/main/internal/cri/server/container_create_linux.go#L202
If we were to include `apparmor_parser` (a 1.5M binary) in the rootfs for everyone, the impact would be that people suddenly have their workloads running under a default profile without knowing it. The default profile is quite open, but it does deny access to certain things that people might not be prepared for; see the profile template here: https://github.com/containerd/containerd/blob/main/contrib/apparmor/template.go#L42
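For reference, one quick way to see whether workloads have silently picked up the default profile (a sketch using the securityfs interface this PR mounts):

```sh
# The kernel lists all loaded AppArmor profiles and their modes here;
# containerd's default profile shows up once it has been applied,
# e.g. "cri-containerd.apparmor.d (enforce)":
grep cri-containerd /sys/kernel/security/apparmor/profiles
```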
@konrader thanks for your research here. I think having AppArmor (better security) is in the best spirit of Talos. We can think about whether we want apparmor-parser as an extension, for example, but I guess this becomes a bigger change, so it would likely be in the timeframe of 1.8.
I created issue #8715 to track the actual AppArmor work; this PR makes sense on its own, so it should be good to merge.
/m