
Agent K8S workload attestor support for cgroups v2

Open tihhoni opened this issue 2 years ago • 12 comments

  • Version: 1.5.5+
  • Platform: K8S with amd64 linux kernel using cgroups v2
  • Subsystem: Agent k8s workload attestor

Currently the k8s workload attestor extracts the podUID and containerUID from /proc/<...pid...>/cgroup: https://github.com/spiffe/spire/blob/main/pkg/agent/plugin/workloadattestor/k8s/k8s.go#L206 https://github.com/spiffe/spire/blob/main/pkg/agent/common/cgroups/cgroups.go#L31

This works for cgroups v1, but it does not work with cgroups v2. For v2, /proc/<...pid...>/cgroup will always be 0::/, which means no containerUID is extracted and thus no k8s workload attestation is possible.

Are you planning to migrate it to support cgroups v2? Looking at some existing solutions, it seems it's possible to extract the containerUID from /proc/self/mountinfo, for example.
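
For illustration, here is a minimal Go sketch (not the SPIRE implementation) of the format difference: each line of /proc/<pid>/cgroup has the form hierarchy-ID:controller-list:cgroup-path, and the cgroups v2 unified entry is the one with hierarchy ID 0 and an empty controller list, which from inside the container is just 0::/.

```go
// Minimal sketch, not the SPIRE implementation: print each cgroup entry of the
// current process and flag whether it is the cgroups v2 unified entry.
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	f, err := os.Open("/proc/self/cgroup")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		// Each line is "hierarchy-ID:controller-list:cgroup-path".
		parts := strings.SplitN(scanner.Text(), ":", 3)
		if len(parts) != 3 {
			continue
		}
		if parts[0] == "0" && parts[1] == "" {
			// cgroups v2 unified entry; under a private cgroup namespace this
			// path is just "/" and carries no pod or container information.
			fmt.Println("v2 path:", parts[2])
		} else {
			fmt.Println("v1 path:", parts[2], "controllers:", parts[1])
		}
	}
}
```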

Background:

  • https://man7.org/linux/man-pages/man7/cgroups.7.html#CGROUPS_VERSION_1
  • https://man7.org/linux/man-pages/man7/cgroups.7.html#CGROUPS_VERSION_2
  • https://docs.kernel.org/admin-guide/cgroup-v2.html
  • https://stackoverflow.com/questions/68816329/how-to-get-docker-container-id-from-within-the-container-with-cgroup-v2

tihhoni avatar Mar 21 '23 15:03 tihhoni

Thank you for opening this @tihhoni, and for sharing the links - they're very helpful.

I see that kubelet will use cgroups v2 by default any time that it is enabled. Do you have a sense for which or how many distributions are shipping with cgroups v2 enabled currently? Did you bump into this issue directly?

evan2645 avatar Mar 21 '23 19:03 evan2645

Here's some info that I was able to find regarding which distros already ship with cgroups v2 enabled by default:

  • Container Optimized OS (since M97)
  • Ubuntu (since 21.10)
  • Debian GNU/Linux (since Debian 11 Bullseye)
  • Fedora (since 31)
  • Arch Linux (since April 2021)
  • RHEL and RHEL-like distributions (since 9)

Ref: https://kubernetes.io/blog/2022/08/31/cgroupv2-ga-1-25/#how-do-you-use-cgroup-v2

guilhermocc avatar Mar 22 '23 19:03 guilhermocc

Hi @evan2645

From various sources I've gotten the impression that cgroups v1 will be deprecated by the end of 2023, so presumably new kernel/OS updates will come exclusively with cgroups v2 support.

We've run into this issue in a staging cluster, where an OS upgrade automatically rolled out a new Linux distribution and tests started to fail.

I am still running tests to understand what impact that will cause. What I know so far:

  • the PID of the spire-agent run process does not have any container/pod details in /proc/<...pid...>/cgroup
  • this makes the spire agent unaware that it runs in k8s, so an attempt to use the CLI, e.g. ./bin/spire-agent api fetch, will always result in "No identity issued" with empty selectors
  • surprisingly, when I use a "client" from the tutorial that "watches" SVIDs, that one does have the container/pod UID in /proc/<...pid...>/cgroup https://github.com/spiffe/spire-tutorials/blob/main/k8s/quickstart/client-deployment.yaml

So it seems that a PID's cgroup might have container/pod UID info in some cases; I just have not figured out what that condition is.

If I get more results I will surely post them here.

-- Nikolai

tihhoni avatar Mar 22 '23 19:03 tihhoni

After some testing, it seems the issue is not that dire:

spire-agent run PID = 302967
busybox sleep PID = 329596

Both run in pods with:

hostPID: true
hostNetwork: true
dnsPolicy: ClusterFirstWithHostNet
k exec -it busybox -- /bin/sh
/ # cat /proc/302967/cgroup
0::/../../kubepods-besteffort-pod02b5aa9b_9013_49b5_9755_efeb06552752.slice/cri-containerd-222a5371beed8c0ad58e23d1cd39e13bc113915a385d3fea4510e1cd404868c2.scope
/ # cat /proc/329596/cgroup
0::/
k exec -it ds/spire-agent -c spire-agent -- /bin/sh
/opt/spire # cat /proc/302967/cgroup
0::/
/opt/spire # cat /proc/329596/cgroup
0::/../../kubepods-besteffort-podb2dce23b_8b3a_4356_b903_56152d488916.slice/cri-containerd-cf6d9de603ab1f665fc84f88c22f761678fcaae86b98a3936e0b9717740146aa.scope

So from within a container, if you try to read your own cgroup, it's the root with no container ID. But if you read another process's cgroup, it's fine.

As a result, the spire agent can attest other k8s PIDs but not itself.

This might be different depending on the cluster setup.
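
For what it's worth, here is a rough Go sketch (assuming a containerd runtime and the systemd-style .slice/.scope naming in the output above; this is not SPIRE's actual matcher) of how the pod UID and container ID could be pulled out of such a v2 path:

```go
// Rough sketch, not SPIRE's actual matcher: extract the pod UID and container
// ID from a cgroups v2 path like the cri-containerd example above.
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var podContainerRe = regexp.MustCompile(
	`kubepods-[a-z]+-pod([0-9a-f_]+)\.slice/cri-containerd-([0-9a-f]{64})\.scope`)

func extract(cgroupPath string) (podUID, containerID string, ok bool) {
	m := podContainerRe.FindStringSubmatch(cgroupPath)
	if m == nil {
		return "", "", false
	}
	// systemd escapes the dashes in the pod UID as underscores; undo that.
	return strings.ReplaceAll(m[1], "_", "-"), m[2], true
}

func main() {
	p := "/../../kubepods-besteffort-pod02b5aa9b_9013_49b5_9755_efeb06552752.slice/" +
		"cri-containerd-222a5371beed8c0ad58e23d1cd39e13bc113915a385d3fea4510e1cd404868c2.scope"
	fmt.Println(extract(p))
}
```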

tihhoni avatar Mar 23 '23 09:03 tihhoni

Thank you very much @tihhoni for looking into this further ❤️

Can you tell if the problem only affects "self", or if it affects e.g. all pods running as root? I'm marking this as priority/backlog for now, but if it affects all root pods, we may consider upgrading it to urgent. Thanks again!

evan2645 avatar Mar 28 '23 14:03 evan2645

We've noticed similar changes will be needed for the dockerd-workloadattestor to support cgroups v2. We have a project that will require cgroups v2 support in the dockerd-workloadattestor beginning in May 2023. After a brief discussion in the contributor sync today, we reached a consensus that it makes sense to introduce this support across several PRs (see the sketch after the list below):

  • introduce logic to support cgroups v2
  • migrate k8s workload attestor to use the cgroups v2 logic
  • migrate dockerd workload attestor to use the cgroups v2 logic
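
To make that split a bit more concrete, here is a hypothetical sketch of the shape the shared logic could take (the package and function names are made up for illustration and are not the actual SPIRE API): a small helper that returns every cgroup path listed for a PID, whether the host is on cgroups v1 (one line per hierarchy) or v2 (a single 0::<path> line), which each attestor could then feed into its own matchers.

```go
// Hypothetical sketch only; names are illustrative and not the SPIRE API.
package cgroupsketch

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// CgroupPaths returns every cgroup path listed for a PID, covering both
// cgroups v1 (one entry per hierarchy) and v2 (a single "0::<path>" entry).
func CgroupPaths(rootDir string, pid int) ([]string, error) {
	f, err := os.Open(fmt.Sprintf("%s/proc/%d/cgroup", rootDir, pid))
	if err != nil {
		return nil, err
	}
	defer f.Close()

	var paths []string
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		// Each line is "hierarchy-ID:controller-list:cgroup-path".
		parts := strings.SplitN(scanner.Text(), ":", 3)
		if len(parts) == 3 {
			paths = append(paths, parts[2])
		}
	}
	return paths, scanner.Err()
}
```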

zmt avatar Mar 28 '23 19:03 zmt

@evan2645 my impression was that only "self" is the problem. I tested a full ingress/egress envoy setup and attestation works fine (the spire server can successfully get the envoy PIDs' container IDs).

Forgot to mention: we run things on containerd 1.6.* and k8s 1.25.*. I did not check other runtimes.

tihhoni avatar Mar 31 '23 09:03 tihhoni

> migrate k8s workload attestor to use the cgroups v2 logic

v1 will also need to be supported for some time. This is probably what you meant, but I want to make sure.

kfox1111 avatar Apr 11 '23 16:04 kfox1111

I am running into this issue again, but now from a different perspective. Just wanted to share (this is more related to cgroupns, but in all cases running on cgroups v2).

It seems that tools are slowly moving away from cgroups v1 and are also using cgroupns=private (e.g. the latest docker daemon and kind releases): https://docs.docker.com/engine/reference/commandline/dockerd/#daemon https://github.com/kubernetes-sigs/kind/releases/tag/v0.20.0

This creates a problem for running some integration tests where cgroupns is hardcoded: https://github.com/spiffe/spire/blob/main/test/integration/suites/nested-rotation/00-setup#L32

If we run tests with dind in k8s (cgroupns=private), the docker cgroups that the agent workload attestor sees look like /../<id> (--cgroup-parent=/actions_job on the docker daemon does nothing here).

If we run tests with dind using cgroupns=host + --cgroup-parent=/actions_job, then they work fine, but then we face issues with kind cluster tests that mess up the main k8s kubelet cgroup.

We can of course patch CGROUP_MATCHERS for our own tests (maybe it's good to allow users to set it too; the default matcher /docker/<id> does not work in this case either).

Just a heads up for now, in case your GA VM switches the docker daemon to cgroupns=private on cgroups v2.

nikotih avatar Jan 18 '24 12:01 nikotih

@nikotih thank you very much for reporting this ❤️ would you mind filing it as a new issue specifically around cgroupns=private? I think that warrants separate attention and it will be good to triage it independently (but still in relation to this one)

evan2645 avatar Jan 18 '24 16:01 evan2645

Sure thing, I can. However, I think cgroups v2 and a private cgroup namespace go hand in hand.

A cgroups v2 enabled kubelet and docker daemon will automatically start with a private cgroup namespace (unless explicitly changed).

The actual root cause of the original issue I opened here is also the private namespace:

docker info
Client:
 Version:    24.0.7
 Context:    desktop-linux
...
Server:
 Cgroup Driver: cgroupfs
 Cgroup Version: 2
...

Disclaimer: non-systemd environment

docker run --rm alpine cat /proc/1/cgroup
0::/

docker run --rm --cgroupns=private alpine cat /proc/1/cgroup
0::/

docker run --rm --cgroupns=host alpine cat /proc/1/cgroup
0::/docker/46f093fa43676f4cfda0fe1fbf49a6c959f56e22808e2592e1d1e7bd0366ed0c

In other words, a spire agent on cgroups v2 with a private cgroup namespace cannot deduce its own container.

Starting 2 containers with:

docker run -it --rm --cgroupns=private --pid=host alpine /bin/sh

PIDS:

2809 root      0:00 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 3c217b7f3a40352850d757d3f5f0e263731425b0350
2835 root      0:00 /bin/sh

2977 root      0:00 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 67b44b40c304559915db0b688f60e83555c40f26e1d
3003 root      0:00 /bin/sh

First container:

/ # cat /proc/2835/cgroup
0::/
/ # cat /proc/3003/cgroup
0::/../67b44b40c304559915db0b688f60e83555c40f26e1d4712f0b7dc91dafe3ddd4

Second container:

/ # cat /proc/2835/cgroup
0::/../3c217b7f3a40352850d757d3f5f0e263731425b03505dd73420d9825ec76d9b2
/ # cat /proc/3003/cgroup
0::/

Same thing (a sketch of a more tolerant matcher follows below):

  • it sees its own container as the root
  • it sees other containers not as /docker/<id> but as /../<id>
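
If it helps with triage, here is a purely illustrative Go sketch (not the current SPIRE or docker attestor code) of what a more tolerant matcher could look like: instead of anchoring on a /docker/ prefix, it takes the last path segment and accepts it if it looks like a 64-character hex container ID, which covers both the /docker/<id> and /../<id> forms (it still cannot help with the container's own 0::/, of course).

```go
// Illustrative only: accept both "/docker/<id>" and the "/../<id>" form seen
// when the observer runs inside a private cgroup namespace.
package main

import (
	"fmt"
	"path"
	"regexp"
)

var containerIDRe = regexp.MustCompile(`^[0-9a-f]{64}$`)

func containerIDFromCgroupPath(cgroupPath string) (string, bool) {
	last := path.Base(cgroupPath) // final path segment
	if containerIDRe.MatchString(last) {
		return last, true
	}
	return "", false
}

func main() {
	fmt.Println(containerIDFromCgroupPath("/docker/46f093fa43676f4cfda0fe1fbf49a6c959f56e22808e2592e1d1e7bd0366ed0c"))
	fmt.Println(containerIDFromCgroupPath("/../3c217b7f3a40352850d757d3f5f0e263731425b03505dd73420d9825ec76d9b2"))
	fmt.Println(containerIDFromCgroupPath("/")) // own container under cgroupns=private: no ID
}
```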

nikotih avatar Jan 19 '24 06:01 nikotih

Ok, I appreciate this @nikotih. I don't have the time to dig in right now, but I'll move this issue back to triage so we can reassess the impact.

evan2645 avatar Jan 19 '24 06:01 evan2645