[BUG] rootless docker -> k3d blocks forever (k3s boot loops)

Open shoffmeister opened this issue 3 years ago • 6 comments

What did you do

Baseline:

  • Fedora 33
  • cgroups v2
  • provision Docker CE from the official Docker repo for rootless docker
  • install rootless docker following https://docs.docker.com/engine/security/rootless/ and make sure to convince Fedora to use fuse-overlayfs via echo '{"storage-driver": "fuse-overlayfs"}' > ~/.config/docker/daemon.json

k3d:

  • export USE_SUDO=false
  • export K3D_INSTALL_DIR=~/bin (~/bin exists and is on the PATH)
  • wget -q -O - https://raw.githubusercontent.com/rancher/k3d/main/install.sh | bash (that's copy&paste)
  • How was the cluster created?
    • k3d cluster create mycluster (that's copy&paste)

Problem: Command hangs after having emitted

INFO[0000] Prep: Network                                
INFO[0000] Created network 'k3d-mycluster' (4f944e1b21bff3718107f3843216e9a69288b3579dce77377732a1417e82370f) 
INFO[0000] Created volume 'k3d-mycluster-images'        
INFO[0001] Creating node 'k3d-mycluster-server-0'       
INFO[0001] Creating LoadBalancer 'k3d-mycluster-serverlb' 
INFO[0001] Starting cluster 'mycluster'                 
INFO[0001] Starting servers...                          
INFO[0001] Starting Node 'k3d-mycluster-server-0'   

After considerable time, it starts spewing

WARN[0204] Node 'k3d-mycluster-server-0' is restarting for more than a minute now. Possibly it will recover soon (e.g. when it's waiting to join). Consider using a creation timeout to avoid waiting forever in a Restart Loop. 

which is somewhat understandable given that docker logs k3d-mycluster-server-0 is unhappy with

I0501 17:24:44.193897       7 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.
I0501 17:24:44.193931       7 plugins.go:161] Loaded 10 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,ValidatingAdmissionWebhook,ResourceQuota.
time="2021-05-01T17:24:44.209114066Z" level=info msg="Running kube-scheduler --address=127.0.0.1 --bind-address=127.0.0.1 --kubeconfig=/var/lib/rancher/k3s/server/cred/scheduler.kubeconfig --leader-elect=false --port=10251 --profiling=false --secure-port=0"
time="2021-05-01T17:24:44.209273499Z" level=info msg="Waiting for API server to become available"
time="2021-05-01T17:24:44.209489318Z" level=info msg="Running kube-controller-manager --address=127.0.0.1 --allocate-node-cidrs=true --bind-address=127.0.0.1 --cluster-cidr=10.42.0.0/16 --cluster-signing-cert-file=/var/lib/rancher/k3s/server/tls/client-ca.crt --cluster-signing-key-file=/var/lib/rancher/k3s/server/tls/client-ca.key --configure-cloud-routes=false --controllers=*,-service,-route,-cloud-node-lifecycle --kubeconfig=/var/lib/rancher/k3s/server/cred/controller.kubeconfig --leader-elect=false --port=10252 --profiling=false --root-ca-file=/var/lib/rancher/k3s/server/tls/server-ca.crt --secure-port=0 --service-account-private-key-file=/var/lib/rancher/k3s/server/tls/service.key --use-service-account-credentials=true"
time="2021-05-01T17:24:44.211128448Z" level=info msg="Node token is available at /var/lib/rancher/k3s/server/token"
time="2021-05-01T17:24:44.211182001Z" level=info msg="To join node to cluster: k3s agent -s https://172.22.0.2:6443 -t ${NODE_TOKEN}"
time="2021-05-01T17:24:44.214298925Z" level=info msg="Wrote kubeconfig /output/kubeconfig.yaml"
time="2021-05-01T17:24:44.215290745Z" level=info msg="Run: k3s kubectl"
time="2021-05-01T17:24:44.215494947Z" level=fatal msg="failed to find cpu cgroup (v2)"

Note: I have not tried running k3s without the k3d wrapper (yet) - i.e. neither under root nor rootless.
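For anyone reproducing this: the fatal line suggests that k3s cannot find the cpu controller in the cgroup v2 hierarchy it runs under. Below is a minimal sketch for checking on the host whether the cpu controller is delegated to the user session. The path is an assumption (a systemd user session on cgroup v2, as described in the rootless containers documentation), and `has_controller` is a hypothetical helper name, not part of k3d or k3s.

```shell
#!/bin/sh
# has_controller (hypothetical helper): succeeds if controller $1 appears
# as a whole word in the space-separated controller list $2.
has_controller() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Assumption: systemd user session on cgroup v2; adjust the path otherwise.
controllers_file="/sys/fs/cgroup/user.slice/user-$(id -u).slice/user@$(id -u).service/cgroup.controllers"

if [ -r "$controllers_file" ]; then
    controllers="$(cat "$controllers_file")"
    if has_controller cpu "$controllers"; then
        echo "cpu controller is delegated: $controllers"
    else
        echo "cpu controller is NOT delegated: $controllers"
    fi
else
    echo "cannot read $controllers_file (no cgroup v2 or no systemd user session?)"
fi
```

If the cpu controller is missing from that list, that would at least be consistent with the "failed to find cpu cgroup (v2)" message, though it does not prove it is the only cause.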

shoffmeister avatar May 01 '21 17:05 shoffmeister

From https://github.com/k3s-io/k3s/issues?q=is%3Aissue+is%3Aopen++rootless I cannot tell whether this is a k3s problem or whether k3d, which drives k3s, needs to be adapted.

shoffmeister avatar May 01 '21 17:05 shoffmeister

Hi @shoffmeister , thanks for opening this issue! Interesting things you're doing here :wink: So there are several points to note here:

  • you're on cgroupv2, which currently only works with a k3d "hotfix" (see #579) and still needs to be fixed in upstream k3s (see https://github.com/k3s-io/k3s/pull/3242).
  • k3d always starts containers with --privileged
  • you have to tell k3s (inside k3d) to run rootless: `--k3s-server-arg "--rootless" --k3s-agent-arg "--rootless"`

iwilltry42 avatar May 05 '21 12:05 iwilltry42

I am rather innocently naïve (AKA ruthless) when it comes to doing interesting things 😛 It's software after all, and it's running inside a VM, to top that off even more ;)

Many thanks for the input! I will revisit this issue here once the stars have aligned on the next versions of k3s, k3d.

I have taken good note of passing --rootless explicitly to k3s.

shoffmeister avatar May 05 '21 19:05 shoffmeister

https://rancher.com/docs/k3s/latest/en/advanced/#running-k3s-with-rootless-mode-experimental now documents steps for running k3s rootless (possibly as the result of https://github.com/k3s-io/k3s/pull/4086)

Alas, I am unable to translate the stern note

Don’t try to run k3s server --rootless on a terminal, as it doesn’t enable cgroup v2 delegation. If you really need to try it on a terminal, prepend systemd-run --user -p Delegate=yes --tty to create a systemd scope.

i.e., systemd-run --user -p Delegate=yes --tty k3s server --rootless

into something that would fit into the execution environment constructed by k3d (there is no systemd inside docker)

So, in trying to make progress on this issue here, I wonder whether it is possible at all to run k3s --rootless "inside" k3d on a rootless docker?

FWIW, I have yet to look into running k3s rootless proper.

shoffmeister avatar Dec 11 '21 21:12 shoffmeister

  • you have to tell k3s (inside k3d) to run rootless: `--k3s-server-arg "--rootless" --k3s-agent-arg "--rootless"`

I don't see --k3s-server-arg and --k3s-agent-arg options for k3d cluster create. Is running in rootless Docker now supported some other way? Given that there are instructions for rootless Podman, I assumed rootless Docker would work similarly.

SanjayVas avatar Jan 10 '23 20:01 SanjayVas

I'm having problems with this too.

After enabling cpu / cpuset delegation (https://rootlesscontaine.rs/getting-started/common/cgroup2/#enabling-cpu-cpuset-and-io-delegation) I launched the cluster creation with: k3d cluster create --k3s-arg "--rootless@server:0"

I got the following message in the log: time="2023-03-21T08:43:13Z" level=fatal msg="expected sysctl value \"net.ipv4.ip_forward\" to be \"1\", got \"0\"; try adding \"net.ipv4.ip_forward=1\" to /etc/sysctl.conf and running sudo sysctl --system"
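A minimal sketch for checking the sysctl that the log message complains about. It reads the value from /proc on the machine where it runs; `ip_forward_enabled` is a hypothetical helper name. Whether the host value is the one k3s sees inside the rootless container is an assumption that this thread does not settle, since the container may have its own network namespace.

```shell
#!/bin/sh
# ip_forward_enabled (hypothetical helper): succeeds if the file at $1
# (default: the kernel's IPv4 forwarding switch) contains exactly "1".
ip_forward_enabled() {
    file="${1:-/proc/sys/net/ipv4/ip_forward}"
    [ "$(cat "$file" 2>/dev/null)" = "1" ]
}

if ip_forward_enabled; then
    echo "net.ipv4.ip_forward is 1"
else
    # The remedy below is the one suggested by the k3s log message itself.
    echo "net.ipv4.ip_forward is not 1; the log suggests adding"
    echo "net.ipv4.ip_forward=1 to /etc/sysctl.conf and running: sudo sysctl --system"
fi
```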

irizzant avatar Mar 21 '23 08:03 irizzant