Can't run rootless kind (podman) without systemd?

Open mdkcore0 opened this issue 8 months ago • 11 comments

Hi, I've had kind running with docker on a systemd-less system for some time, working as expected. Now that I have a new laptop, I'd like to try running it with podman, but I'm getting the following error:

ERROR: failed to create cluster: running kind with rootless provider requires setting systemd property "Delegate=yes", see https://kind.sigs.k8s.io/docs/user/rootless/

I saw the Delegate=yes thing, but I run Void Linux on both machines. I'll install docker on the new one for now, but can I run kind with podman without systemd?

Thanks

mdkcore0 avatar Mar 27 '25 18:03 mdkcore0

That error message assumes systemd, but kind is not talking to systemd. This happens when it doesn't detect support for key cgroup controllers:

https://github.com/kubernetes-sigs/kind/blob/3f8d6dd0853625f38ea3391383833167e9bacccd/pkg/cluster/internal/create/create.go#L252-L253
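
For anyone who wants to check this by hand, here is a rough sketch (not kind's actual code) that compares a list of available controllers against a required set. The function name and the required set `cpu memory pids` are assumptions for illustration; the linked source is the authority on what kind really checks.

```shell
#!/bin/sh
# Print every controller named in $2 (required) that is absent from $1 (available).
# Both arguments are space-separated lists, like the contents of a cgroup.controllers file.
missing_controllers() {
    available=" $1 "
    missing=""
    for c in $2; do
        case "$available" in
            *" $c "*) ;;                                  # controller is present
            *) missing="$missing${missing:+ }$c" ;;       # remember the missing one
        esac
    done
    printf '%s\n' "$missing"
}
```

On a cgroup v2 host, something like `missing_controllers "$(cat /sys/fs/cgroup/cgroup.controllers)" "cpu memory pids"` prints nothing when all three are listed at the root. For rootless use, the controllers delegated to your own cgroup are probably the ones that matter, so treat the root file as a first check only.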

BenTheElder avatar Mar 27 '25 20:03 BenTheElder

Thanks! It no longer shows that error, but now I'm getting this one, even though I have cgroups v2 enabled here:

ERROR: failed to create cluster: could not find a log line that matches "Reached target .*Multi-User System.*|detected cgroup v1"

mdkcore0 avatar Mar 31 '25 21:03 mdkcore0

That error means kind timed out waiting for a log line from the node indicating that the unit ran successfully. There are two ways the node can signal that it is ready to start exec, hence the two alternatives in the pattern.

If you run with --retain, it won't clean up on failure and we can look at the node logs (kind export logs).
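
To see whether a retained node ever printed the expected line, a small sketch like the following can help. The pattern is copied from the error message above; the function name is made up for illustration, and this is not how kind itself does the wait.

```shell
#!/bin/sh
# Pattern copied verbatim from the "could not find a log line" error.
READY_PATTERN='Reached target .*Multi-User System.*|detected cgroup v1'

# Reads a node log on stdin; exits 0 if any line matches the pattern, 1 otherwise.
log_has_ready_line() {
    grep -Eq "$READY_PATTERN"
}
```

For example, after `kind create cluster --retain` fails, `podman logs kind-control-plane 2>&1 | log_has_ready_line && echo ready` reports whether the node got that far (assuming the default node name `kind-control-plane`).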

BenTheElder avatar Apr 01 '25 05:04 BenTheElder

I've just hit the "could not find a log line" error too; the podman logs look like this:

Welcome to Debian GNU/Linux 12 (bookworm)!

Failed to create /init.scope control group: Permission denied
Failed to allocate manager object: Permission denied
[!!!!!!] Failed to allocate manager object.
Exiting PID 1...

Which looks like https://github.com/moby/moby/issues/42275

Podman supports mounting cgroups read/write with https://github.com/containers/podman/pull/9536

So it looks like --security-opt unmask=/sys/fs/cgroup should be added to the arguments, at least when not using --privileged.

However on my system (running Chimera Linux) this is happening even with --privileged :/

valpackett avatar Apr 04 '25 06:04 valpackett

We are doing --privileged.

Kubernetes development (as in the core Kubernetes repo) requires docker with buildkit. So the maintainers of this project are not active podman users.

This also differs depending on cgroup v1/v2 and cgroupns.

The preferred scenario going forward is cgroup v2 + cgroupns=private. We should not be writing directly to the actual host root.

BenTheElder avatar Apr 04 '25 06:04 BenTheElder

Right. I guess cgroupns might require the host system to use cgroups (and in a particular way?) so it might require extra steps on non-systemd systems? 🤔

valpackett avatar Apr 04 '25 06:04 valpackett

Possibly. The combination of podman without systemd is rare IME.

In general, the ecosystem is largely tested with systemd on distros like Debian, Ubuntu, RHEL, ... kubernetes/kubernetes, containerd, and runc all test with systemd only, AFAIK. I'm not sure about podman and crun, but I'd be a little surprised if the Red Hat ecosystem CI wasn't all systemd.

Makes it a bit difficult to support on our own. We don't aim to require it but we also aren't heavily invested in non-systemd support.

I thought this sort of thing was fixed; it's been a long time since the last report. I added some workarounds for non-systemd hosts a few years ago ...

BenTheElder avatar Apr 04 '25 06:04 BenTheElder

Rootless is also still largely community supported; we still need rootful to fully test Kubernetes, last I checked.

BenTheElder avatar Apr 04 '25 06:04 BenTheElder

Ha! So to get write access to cgroups inside the namespace, it needs to run in a user-writable cgroup, which in the systemd world is managed by slices.

With cg* tools (libcgroup-progs here), I can create and enter a cgroup:

doas cgcreate -a $USER -t $USER -g memory,cpu,cpuset,pids,misc:owowo
cgexec -g memory,cpu,cpuset,misc:owowo kind create cluster

Exceeept… apparently processes need to be moved between groups (at least for cgexec), so we need to be able to write to cgroup.procs on the common ancestor: https://unix.stackexchange.com/questions/725112/using-cgroups-v2-without-root

so I just did the "YOLO" thing, naturally:

doas chown $USER:$USER /sys/fs/cgroup/cgroup.procs

which made cgexec work, and kind under that cgexec proceeded further…

But inside of the container, only containerd was running, not kubelet, because

root@kind-control-plane:/# systemctl status kubelet.service
● kubelet.service - kubelet: The Kubernetes Node Agent
…
    Process: 366 ExecStartPre=/bin/sh -euc if [ -f /sys/fs/cgroup/cgroup.controllers ]; then /kind/bin/create-kubelet-cgroup-v2.sh; fi (code=exited, status=1/FAILURE)

root@kind-control-plane:/# /kind/bin/create-kubelet-cgroup-v2.sh
ERROR: this script needs /sys/fs/cgroup/cgroup.procs to be empty (for writing the top-level cgroup.subtree_control)

root@kind-control-plane:/# cat /sys/fs/cgroup/cgroup.procs
0
0
0
0
0
0
196

and the boot log had this very early on:

Couldn't move remaining userspace processes, ignoring: Input/output error

so some other permission is missing…
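
The script's complaint is simply that the listing above is not empty. A tiny sketch for summarizing such a listing follows; the function name is made up. My understanding, which is an assumption and not something confirmed in this thread, is that a `0` entry stands for a process that is in the cgroup but not visible from the reader's PID namespace.

```shell
#!/bin/sh
# Summarize a cgroup.procs listing read from stdin.
# Prints "<total> <zeros>": the number of entries, and how many of them are "0".
count_procs() {
    awk 'NF { total++; if ($1 == "0") zeros++ } END { printf "%d %d\n", total, zeros }'
}
```

Inside the node, `count_procs < /sys/fs/cgroup/cgroup.procs` would give `7 6` for the listing above, meaning one visible process (PID 196) and six that are not.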

valpackett avatar Apr 04 '25 07:04 valpackett

IT WORKS \o/

Basically,

doas cgcreate -a $USER -t $USER -g cpuset,cpu,io,memory,pids:/kube
echo '+cpuset +cpu +io +memory +pids' > /sys/fs/cgroup/kube/cgroup.subtree_control
doas chown $USER:$USER /sys/fs/cgroup/cgroup.procs # BAD INSECURE LOL

+

--- i/pkg/cluster/internal/providers/podman/provision.go
+++ w/pkg/cluster/internal/providers/podman/provision.go
@@ -136,6 +136,7 @@ func commonArgs(cfg *config.Cluster, networkName string, nodeNames []string) ([]
                "--label", fmt.Sprintf("%s=%s", clusterLabelKey, cfg.Name),
                // specify container implementation to systemd
                "-e", "container=podman",
+               "--cgroup-parent", "/kube",
                // this is the default in cgroupsv2 but not in v1
                "--cgroupns=private",
        }

Would be nice to add a PODMAN_CGROUP_PARENT env var for this and/or a generic PODMAN_EXTRA_ARGS (?)
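
On the shell half of that recipe: the controller list given to cgcreate and the string written to cgroup.subtree_control have to agree, so a small helper can derive one from the other. This is only a convenience sketch with a made-up function name, not part of kind or libcgroup.

```shell
#!/bin/sh
# Turn a comma-separated controller list (the form passed to cgcreate -g)
# into the "+a +b ..." form written to cgroup.subtree_control.
subtree_control_line() {
    printf '%s\n' "$1" | sed -e 's/,/ +/g' -e 's/^/+/'
}
```

With the list from above, `subtree_control_line cpuset,cpu,io,memory,pids > /sys/fs/cgroup/kube/cgroup.subtree_control` writes the same line as the echo command in the recipe.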

valpackett avatar Apr 04 '25 08:04 valpackett

We're definitely not adding PODMAN_EXTRA_ARGS, that's a support nightmare.

I'm not sure we want to add an override for this either. It's running, but I don't know that this is how it should be running, and that looks like a footgun.

We generally want to just do the right thing and not expose more user-overrides for the node creation that constrain us from doing the right thing in the future (because users now depend on the specific values they're overriding instead of treating node bringup as an implementation detail, which may or may not be implemented via exec).

BenTheElder avatar Apr 04 '25 08:04 BenTheElder