kind
Can't run rootless kind (podman) without systemd?
Hi, I have had kind running with docker on a systemd-less system for some time, working as expected. I got a new laptop and would like to try running it with podman, but I'm getting the following error:
ERROR: failed to create cluster: running kind with rootless provider requires setting systemd property "Delegate=yes", see https://kind.sigs.k8s.io/docs/user/rootless/
I saw the note about Delegate=yes, but I run Void Linux on both machines. I'll install docker on the new one for now, but can I run kind with podman without systemd?
Thanks
That error message assumes systemd, but kind is not talking to systemd. The error is raised when kind doesn't detect support for key cgroup controllers:
https://github.com/kubernetes-sigs/kind/blob/3f8d6dd0853625f38ea3391383833167e9bacccd/pkg/cluster/internal/create/create.go#L252-L253
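To see which controllers are actually available to a given cgroup, one can read its cgroup.controllers file. The following is a minimal sketch, not what kind itself runs: the function name missing_controllers is hypothetical, and the choice of cpu, memory and pids as the controllers to look for is an assumption rather than something stated in this thread.

```sh
#!/bin/sh
# Hypothetical helper: print every wanted controller that is absent
# from a cgroup.controllers file.
# Usage: missing_controllers FILE CONTROLLER...
missing_controllers() {
    file=$1
    shift
    # cgroup.controllers holds one space-separated line, e.g. "cpu io memory pids"
    available=$(cat "$file" 2>/dev/null)
    for want in "$@"; do
        # Pad with spaces so that "cpu" does not match inside "cpuset"
        case " $available " in
            *" $want "*) ;;
            *) printf '%s\n' "$want" ;;
        esac
    done
}
```

For example, `missing_controllers /sys/fs/cgroup/cgroup.controllers cpu memory pids` prints nothing when all three are listed in that file. For a rootless setup the more relevant file is probably the one belonging to the cgroup your shell runs in, not the root one.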
Thanks! It does not show this error anymore, but now I'm getting this one, even though I have cgroup v2 enabled here:
ERROR: failed to create cluster: could not find a log line that matches "Reached target .*Multi-User System.*|detected cgroup v1"
That error is saying it timed out waiting for a log line from the node indicating that the unit ran successfully. There are two ways it can be ready to start exec.
If you run with --retain, kind won't clean up on failure and we can look at the node logs (kind export logs).
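Once the node is retained, its log can be checked by hand against the same pattern that appears in the error message. This is a small sketch; the function name node_log_ready is hypothetical and the log file path is whatever file you saved the node's output to.

```sh
#!/bin/sh
# Hypothetical helper: succeed if a saved node log contains a line matching
# the pattern quoted in the "could not find a log line" error.
# Usage: node_log_ready LOGFILE
node_log_ready() {
    grep -Eq 'Reached target .*Multi-User System.*|detected cgroup v1' "$1"
}
```

A possible workflow, assuming the default node name kind-control-plane: run `kind create cluster --retain`, save the container output with `podman logs kind-control-plane > node.log 2>&1`, then run `node_log_ready node.log || tail node.log` to see where boot stopped.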
I've just encountered the "could not find a log line" error. The podman logs look like:
Welcome to Debian GNU/Linux 12 (bookworm)!
Failed to create /init.scope control group: Permission denied
Failed to allocate manager object: Permission denied
[!!!!!!] Failed to allocate manager object.
Exiting PID 1...
Which looks like https://github.com/moby/moby/issues/42275
Podman supports mounting cgroups read/write with https://github.com/containers/podman/pull/9536
So it looks like --security-opt unmask=/sys/fs/cgroup should be added to the arguments, at least when not doing --privileged.
However on my system (running Chimera Linux) this is happening even with --privileged :/
We are doing --privileged.
Kubernetes development (as in the core Kubernetes repo) requires docker with buildkit. So the maintainers of this project are not active podman users.
This also differs depending on cgroup v1/v2 and cgroupns.
The preferred scenario going forward is cgroup v2 + cgroupns=private. We should not be writing directly to the actual host root.
Right. I guess cgroupns might require the host system to use cgroups (and in a particular way?) so it might require extra steps on non-systemd systems? 🤔
Possibly, the intersection of podman but not systemd is rare IME.
In general the ecosystem is largely tested with systemd on distros like Debian, Ubuntu, RHEL, ... AFAIK kubernetes/kubernetes, containerd, and runc are all testing with systemd only. I'm not sure about podman and crun, but I'd be a little surprised if the Red Hat ecosystem CI wasn't all systemd.
Makes it a bit difficult to support on our own. We don't aim to require it but we also aren't heavily invested in non-systemd support.
I thought this sort of thing was fixed; it's been a long time since the last report. I added some workarounds for non-systemd hosts a few years ago ...
Rootless is also still largely community supported. We still need rootful to fully test Kubernetes, last I checked.
Ha! So to get write access to cgroups inside of the namespace, it needs to be running in a user-writable cgroup, which in the systemd world is managed by slices.
With cg* tools (libcgroup-progs here), I can create and enter a cgroup:
doas cgcreate -a $USER -t $USER -g memory,cpu,cpuset,pids,misc:owowo
cgexec -g memory,cpu,cpuset,misc:owowo kind create cluster
Exceeept… apparently processes need to be moved between groups (at least for cgexec), so we need to be able to write to cgroup.procs on the common ancestor: https://unix.stackexchange.com/questions/725112/using-cgroups-v2-without-root
so I just did the "YOLO" thing, naturally:
doas chown $USER:$USER /sys/fs/cgroup/cgroup.procs
which made cgexec work, and kind under that cgexec proceeded further…
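The cgroup v2 rule behind this is that an unprivileged process needs write access to cgroup.procs in the destination cgroup and also in the common ancestor of the source and destination cgroups. The sketch below only computes that common ancestor from two cgroup paths; the function name common_ancestor is hypothetical and nothing here touches /sys/fs/cgroup.

```sh
#!/bin/sh
# Hypothetical helper: print the deepest cgroup path that contains both
# arguments. Paths are relative to the cgroup root, e.g. "/" or "/owowo".
# Usage: common_ancestor SOURCE DEST
common_ancestor() {
    a=${1%/}   # drop a trailing slash, so "/" becomes ""
    b=${2%/}
    # Shorten $a one component at a time until it equals $b or is a parent of it
    while [ -n "$a" ]; do
        case "$b/" in
            "$a"/*) break ;;
        esac
        a=${a%/*}
    done
    printf '%s\n' "${a:-/}"
}
```

With a shell sitting in the root cgroup, `common_ancestor / /owowo` prints `/`, which matches why the root cgroup.procs had to be made writable here. On a pure cgroup v2 host, the source path can probably be taken from the third field of /proc/self/cgroup.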
But inside of the container, only containerd was running, not kubelet, because
root@kind-control-plane:/# systemctl status kubelet.service
● kubelet.service - kubelet: The Kubernetes Node Agent
…
Process: 366 ExecStartPre=/bin/sh -euc if [ -f /sys/fs/cgroup/cgroup.controllers ]; then /kind/bin/create-kubelet-cgroup-v2.sh; fi (code=exited, status=1/FAILURE)
root@kind-control-plane:/# /kind/bin/create-kubelet-cgroup-v2.sh
ERROR: this script needs /sys/fs/cgroup/cgroup.procs to be empty (for writing the top-level cgroup.subtree_control)
root@kind-control-plane:/# cat /sys/fs/cgroup/cgroup.procs
0
0
0
0
0
0
196
and the boot log had this very early on:
Couldn't move remaining userspace processes, ignoring: Input/output error
so some other permission is missing…
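The script's complaint is that the top-level cgroup.procs inside the node is not empty. A quick way to summarize such a listing is sketched below. The function name summarize_procs is hypothetical, and reading the 0 entries as processes that are not visible in the container's PID namespace is an assumption, not something confirmed in this thread.

```sh
#!/bin/sh
# Hypothetical helper: count entries in a cgroup.procs-style file.
# Prints "<visible> <hidden>", where hidden is the number of 0 entries.
# Usage: summarize_procs FILE
summarize_procs() {
    awk 'NF == 0 { next }                # skip blank lines
         $1 == 0 { hidden++; next }      # PID shown as 0
         { visible++ }                   # any other PID
         END { printf "%d %d\n", visible, hidden }' "$1"
}
```

For the listing above this prints `1 6`: one visible process (196) and six entries shown as 0. The script would only proceed once the file is empty.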
IT WORKS \o/
Basically,
doas cgcreate -a $USER -t $USER -g cpuset,cpu,io,memory,pids:/kube
echo '+cpuset +cpu +io +memory +pids' > /sys/fs/cgroup/kube/cgroup.subtree_control
doas chown $USER:$USER /sys/fs/cgroup/cgroup.procs # BAD INSECURE LOL
plus this patch:
--- i/pkg/cluster/internal/providers/podman/provision.go
+++ w/pkg/cluster/internal/providers/podman/provision.go
@@ -136,6 +136,7 @@ func commonArgs(cfg *config.Cluster, networkName string, nodeNames []string) ([]
 		"--label", fmt.Sprintf("%s=%s", clusterLabelKey, cfg.Name),
 		// specify container implementation to systemd
 		"-e", "container=podman",
+		"--cgroup-parent", "/kube",
 		// this is the default in cgroupsv2 but not in v1
 		"--cgroupns=private",
 	}
Would be nice to add a PODMAN_CGROUP_PARENT env var for this and/or a generic PODMAN_EXTRA_ARGS (?)
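Before starting kind under this setup, it may help to confirm that the files touched by the commands above are really writable by the current user. This is only a preflight sketch: the function name check_delegation is hypothetical, it changes nothing on the system, and it does not make the chown of the root cgroup.procs any less insecure.

```sh
#!/bin/sh
# Hypothetical helper: report which cgroup files the current user cannot write.
# Usage: check_delegation CGROUP_ROOT NAME   (e.g. /sys/fs/cgroup kube)
check_delegation() {
    root=$1
    name=$2
    status=0
    for f in "$root/cgroup.procs" \
             "$root/$name/cgroup.procs" \
             "$root/$name/cgroup.subtree_control"; do
        # -w is also false when the file does not exist
        if [ ! -w "$f" ]; then
            printf 'not writable: %s\n' "$f"
            status=1
        fi
    done
    return "$status"
}
```

Running `check_delegation /sys/fs/cgroup kube` prints nothing and returns 0 when all three files are writable.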
We're definitely not adding PODMAN_EXTRA_ARGS, that's a support nightmare.
I'm not sure we want to add an override for this either. It's running, but I don't know that this is how it should be running, and that looks like a footgun.
We generally want to just do the right thing and not expose more user-overrides for the node creation that constrain us from doing the right thing in the future (because users now depend on the specific values they're overriding instead of treating node bringup as an implementation detail, which may or may not be implemented via exec).