Kind can't create clusters in F35 with Podman

hhemied opened this issue 2 years ago • 27 comments

What happened:

[root@fedora ~]# kind create cluster
enabling experimental podman provider
Creating cluster "kind" ...
 ✓ Ensuring node image (kindest/node:v1.23.4) 🖼 
 ✗ Preparing nodes 📦  
ERROR: failed to create cluster: could not find a log line that matches "Reached target .*Multi-User System.*|detected cgroup v1"

What you expected to happen: kind should be able to create the cluster.

How to reproduce it (as minimally and precisely as possible):

  • Install Fedora 35 Server edition
  • Install Podman 4.0.2
  • Run kind create cluster

Environment:

  • kind version: (use kind version):
kind v0.12.0 go1.17.8 linux/amd64
  • Kubernetes version: (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.5", GitCommit:"c285e781331a3785a7f436042c65c5641ce8a9e9", GitTreeState:"clean", BuildDate:"2022-03-16T15:58:47Z", GoVersion:"go1.17.8", Compiler:"gc", Platform:"linux/amd64"}
  • Docker version: (use docker info): I use podman
Client:       Podman Engine
Version:      4.0.2
API Version:  4.0.2
Go Version:   go1.16.14

Built:      Thu Mar 10 21:26:05 2022
OS/Arch:    linux/amd64
  • OS (e.g. from /etc/os-release):
Fedora release 35 (Thirty Five)

hhemied avatar Mar 23 '22 13:03 hhemied

Can you check that you are using the image from the release notes?

kindest/node:v1.23.4@sha256:0e34f0d0fd448aa2f2819cfd74e99fe5793a6e4938b328f657c8e3f81ee0dfb9

aojea avatar Mar 23 '22 14:03 aojea

I did, and here is the output

[root@fedora ~]# kind create cluster --image kindest/node:v1.23.4@sha256:0e34f0d0fd448aa2f2819cfd74e99fe5793a6e4938b328f657c8e3f81ee0dfb9
enabling experimental podman provider
Creating cluster "kind" ...
 ✓ Ensuring node image (kindest/node:v1.23.4) 🖼 
 ✗ Preparing nodes 📦  
ERROR: failed to create cluster: could not find a log line that matches "Reached target .*Multi-User System.*|detected cgroup v1"

hhemied avatar Mar 23 '22 14:03 hhemied

Can you run it again with -v 7 added and paste the output?

aojea avatar Mar 23 '22 15:03 aojea

[root@fedora ~]# kind create cluster --image kindest/node:v1.23.4@sha256:0e34f0d0fd448aa2f2819cfd74e99fe5793a6e4938b328f657c8e3f81ee0dfb9 -v 7
enabling experimental podman provider
Creating cluster "kind" ...
DEBUG: podman/images.go:58] Image: docker.io/kindest/node@sha256:0e34f0d0fd448aa2f2819cfd74e99fe5793a6e4938b328f657c8e3f81ee0dfb9 present locally
 ✓ Ensuring node image (kindest/node:v1.23.4) 🖼 
 ✗ Preparing nodes 📦  
ERROR: failed to create cluster: could not find a log line that matches "Reached target .*Multi-User System.*|detected cgroup v1"
Stack Trace: 
sigs.k8s.io/kind/pkg/errors.Errorf
	sigs.k8s.io/kind/pkg/errors/errors.go:41
sigs.k8s.io/kind/pkg/cluster/internal/providers/common.WaitUntilLogRegexpMatches
	sigs.k8s.io/kind/pkg/cluster/internal/providers/common/cgroups.go:84
sigs.k8s.io/kind/pkg/cluster/internal/providers/podman.createContainerWithWaitUntilSystemdReachesMultiUserSystem
	sigs.k8s.io/kind/pkg/cluster/internal/providers/podman/provision.go:378
sigs.k8s.io/kind/pkg/cluster/internal/providers/podman.planCreation.func2
	sigs.k8s.io/kind/pkg/cluster/internal/providers/podman/provision.go:101
sigs.k8s.io/kind/pkg/errors.UntilErrorConcurrent.func1
	sigs.k8s.io/kind/pkg/errors/concurrent.go:30
runtime.goexit
	runtime/asm_amd64.s:1581

hhemied avatar Mar 23 '22 15:03 hhemied

It looks like for the docker info output above you actually ran podman version. Can you run podman info and paste the output?

From the client version it looks like the binary was compiled for amd64, but I'm wondering if you are running on arm64. There was a similar error reported recently in Slack: https://kubernetes.slack.com/archives/CEKK1KTN2/p1646907106345039?thread_ts=1646907067.117919&cid=CEKK1KTN2
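
A quick way to rule the architecture question in or out (commands are illustrative; compare the host architecture with what podman reports):

# expect x86_64 for amd64 hosts, aarch64 for arm64 hosts
uname -m
# compare with the architecture podman itself reports
podman info | grep -i arch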

stmcginnis avatar Mar 23 '22 15:03 stmcginnis

You are right, my bad.

[root@fedora ~]# podman info
host:
  arch: amd64
  buildahVersion: 1.24.1
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - misc
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.0-2.fc35.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.0, commit: '
  cpus: 6
  distribution:
    distribution: fedora
    version: "35"
  eventLogger: journald
  hostname: fedora
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.16.16-200.fc35.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 7363661824
  memTotal: 9196961792
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun-1.4.3-1.fc35.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.4.3
      commit: 61c9600d1335127eba65632731e2d72bc3f0b9e8
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.1.12-2.fc35.x86_64
    version: |-
      slirp4netns version 1.1.12
      commit: 7a104a101aa3278a2152351a082a6df71f57c9a3
      libslirp: 4.6.1
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.3
  swapFree: 8589930496
  swapTotal: 8589930496
  uptime: 1m 52.2s
plugins:
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 6
    paused: 0
    running: 0
    stopped: 6
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 5
  runRoot: /run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 4.0.2
  Built: 1646943965
  BuiltTime: Thu Mar 10 21:26:05 2022
  GitCommit: ""
  GoVersion: go1.16.14
  OsArch: linux/amd64
  Version: 4.0.2

hhemied avatar Mar 23 '22 15:03 hhemied

OK, try this:

kind create cluster --retain --image kindest/node:v1.23.4@sha256:0e34f0d0fd448aa2f2819cfd74e99fe5793a6e4938b328f657c8e3f81ee0dfb9

and then run kind export logs so it exports the info to a folder; create a tarball and attach it here.
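
For reference, the full sequence might look something like this (the output directory name is just an example):

# keep the node container(s) around even when creation fails
kind create cluster --retain --image kindest/node:v1.23.4@sha256:0e34f0d0fd448aa2f2819cfd74e99fe5793a6e4938b328f657c8e3f81ee0dfb9
# dump node logs, journals, and inspect output into a folder
kind export logs ./kind-logs
# bundle the folder so it can be attached to the issue
tar czf kind-logs.tar.gz ./kind-logs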

aojea avatar Mar 23 '22 15:03 aojea

Here is the logs folder 3808958841.zip

hhemied avatar Mar 23 '22 16:03 hhemied

I can see the log line, and I can't see a 30-second delay ... why is it failing? Something related to stdin/stdout? @hhemied, is there anything "unusual" in your configuration or environment?

@AkihiroSuda , have you seen something similar?

aojea avatar Mar 23 '22 17:03 aojea

Nothing suspicious; it is actually a clean install to test kind with Podman 4. Additional info:

[root@fedora ~]# podman network ls
NETWORK ID    NAME        DRIVER
faed16303522  kind        bridge
2f259bab93aa  podman      bridge

hhemied avatar Mar 23 '22 18:03 hhemied

Does podman 3 work?

aojea avatar Mar 23 '22 19:03 aojea

I am also on Fedora 35, and am affected by the same issue. I was able to create a kind cluster yesterday, but this morning I updated my kernel to 5.16.16. This kernel version appears in the report above.

If I fall back to the 5.16.15 kernel, I no longer have this issue.

FWIW, I am using the docker provider, so I suspect this issue is related in some way to the kernel, not podman. I may be wrong here, because I see nothing related in either the Fedora 5.16.16 changelog, or the upstream 5.16.16 changelog.

Although I don't have time to investigate the root cause right now, I can open a docs PR.
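
In case it helps others hitting this, booting back into an older installed kernel on Fedora can be done with grubby; the kernel path below is illustrative, take the exact one from the list command:

# list the installed kernels
sudo grubby --info=ALL | grep ^kernel
# make an older kernel the default (substitute the exact path printed above), then reboot into it
sudo grubby --set-default /boot/vmlinuz-5.16.15-201.fc35.x86_64
sudo reboot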

dlipovetsky avatar Mar 23 '22 23:03 dlipovetsky

Nope, it seems it is not connected to the kernel. I have tested again with a fresh F35 install and skipped any updates.

  • Kernel
5.14.10-300.fc35.x86_64
  • Podman version
Version:      3.4.0
API Version:  3.4.0
Go Version:   go1.16.8
Built:        Thu Sep 30 21:32:16 2021
OS/Arch:      linux/amd64
  • kind version
kind v0.12.0 go1.17.8 linux/amd64

And I am still getting the same error:

[root@fedora ~]# kind create cluster
enabling experimental podman provider
Creating cluster "kind" ...
 ✓ Ensuring node image (kindest/node:v1.23.4) 🖼 
 ✗ Preparing nodes 📦  
ERROR: failed to create cluster: could not find a log line that matches "Reached target .*Multi-User System.*|detected cgroup v1"

hhemied avatar Mar 24 '22 12:03 hhemied

Here is the output if I remove Podman and install Docker

kind create cluster
Creating cluster "kind" ...
 ✓ Ensuring node image (kindest/node:v1.23.4) 🖼 
 ✓ Preparing nodes 📦  
 ✓ Writing configuration 📜 
 ✓ Starting control-plane 🕹️ 
 ✓ Installing CNI 🔌 
 ✓ Installing StorageClass 💾 
Set kubectl context to "kind-kind"
You can now use your cluster with:

kubectl cluster-info --context kind-kind

Not sure what to do next? 😅  Check out https://kind.sigs.k8s.io/docs/user/quick-start/

hhemied avatar Mar 24 '22 13:03 hhemied

Same problem happens on macOS Monterey (v12) with Apple M1:

$ kind create cluster
Creating cluster "kind" ...
 ✓ Ensuring node image (kindest/node:v1.23.4) 🖼
 ✗ Preparing nodes 📦
ERROR: failed to create cluster: could not find a log line that matches "Reached target .*Multi-User System.*|detected cgroup v1"

$ kind version
kind v0.12.0 go1.17.8 darwin/arm64
$ docker version
Client:
 Cloud integration: v1.0.22
 Version:           20.10.13
 API version:       1.41
 Go version:        go1.16.15
 Git commit:        a224086
 Built:             Thu Mar 10 14:08:43 2022
 OS/Arch:           darwin/arm64
 Context:           default
 Experimental:      true

Server: Docker Desktop 4.6.0 (75818)
 Engine:
  Version:          20.10.13
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.15
  Git commit:       906f57f
  Built:            Thu Mar 10 14:05:37 2022
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.5.10
  GitCommit:        2a1d4dbdb2a1030dc5b01e96fb110a9d9f150ecc
 runc:
  Version:          1.0.3
  GitCommit:        v1.0.3-0-gf46b6ba
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.2", GitCommit:"8b5a19147530eaac9476b0ab82980b4088bbc1b2", GitTreeState:"clean", BuildDate:"2021-09-15T21:31:32Z", GoVersion:"go1.16.9", Compiler:"gc", Platform:"darwin/arm64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.9-gke.1002", GitCommit:"f87f9d952767b966e72a4bd75afea25dea187bbf", GitTreeState:"clean", BuildDate:"2022-02-25T18:12:32Z", GoVersion:"go1.16.12b7", Compiler:"gc", Platform:"linux/amd64"}

subnetmarco avatar Mar 24 '22 23:03 subnetmarco

I am also on Fedora 35, and am affected by the same issue. I was able to create a kind cluster yesterday, but this morning I updated my kernel to 5.16.16. This kernel version appears in the report above.

If I fall back to the 5.16.15 kernel, I no longer have this issue.

FWIW, I am using the docker provider, so I suspect this issue is related in some way to the kernel, not podman. I may be wrong here, because I see nothing related in either the Fedora 5.16.16 changelog, or the upstream 5.16.16 changelog.

Although I don't have time to investigate the root cause right now, I can open a docs PR.

Could you try the latest kernel 5.16.17-200.fc35? I don't see any issue with this kernel.
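
On Fedora 35, pulling in that kernel would be roughly the following (verify the running version afterwards with uname -r):

sudo dnf upgrade --refresh kernel
sudo reboot
# after reboot, confirm the new kernel is active
uname -r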

my podman info

[root@fedora ~]# KIND_EXPERIMENTAL_PROVIDER=podman kind create cluster
using podman due to KIND_EXPERIMENTAL_PROVIDER
enabling experimental podman provider
Creating cluster "kind" ...
 ✓ Ensuring node image (kindest/node:v1.23.4) 🖼 
 ✓ Preparing nodes 📦  
 ✓ Writing configuration 📜 
 ✓ Starting control-plane 🕹️ 
 ✓ Installing CNI 🔌 
 ✓ Installing StorageClass 💾 
Set kubectl context to "kind-kind"
You can now use your cluster with:

kubectl cluster-info --context kind-kind

Not sure what to do next? 😅  Check out https://kind.sigs.k8s.io/docs/user/quick-start/

[root@fedora ~]# kind version
kind v0.12.0 go1.17.8 linux/amd64

[root@fedora ~]# podman info
host:
  arch: amd64
  buildahVersion: 1.23.1
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - misc
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.0-2.fc35.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.0, commit: '
  cpus: 2
  distribution:
    distribution: fedora
    variant: cloud
    version: "35"
  eventLogger: journald
  hostname: fedora
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.16.17-200.fc35.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 111788032
  memTotal: 4103704576
  ociRuntime:
    name: crun
    package: crun-1.4.3-1.fc35.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.4.3
      commit: 61c9600d1335127eba65632731e2d72bc3f0b9e8
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.1.12-2.fc35.x86_64
    version: |-
      slirp4netns version 1.1.12
      commit: 7a104a101aa3278a2152351a082a6df71f57c9a3
      libslirp: 4.6.1
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.3
  swapFree: 4102287360
  swapTotal: 4103073792
  uptime: 4m 7.84s
plugins:
  log:
  - k8s-file
  - none
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 1
    paused: 0
    running: 1
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  imageStore:
    number: 1
  runRoot: /run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 3.4.4
  Built: 1638999907
  BuiltTime: Wed Dec  8 21:45:07 2021
  GitCommit: ""
  GoVersion: go1.16.8
  OsArch: linux/amd64
  Version: 3.4.4


AkihiroSuda avatar Mar 30 '22 05:03 AkihiroSuda

Could you try the latest kernel 5.16.17-200.fc35? I don't see any issue with this kernel.

I tried 5.16.18-200.fc35, and I have no issues.

It seems it might still affect other systems. I'm curious what the cause is. (I suppose that systemd isn't reaching the Multi-User System target?)
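
One way to test that supposition, assuming the node container was kept with --retain and has the default name (substitute docker for podman when using the docker provider):

# did systemd inside the node ever reach the target kind waits for?
podman exec kind-control-plane systemctl is-active multi-user.target
# kind matches this regex against the container logs, so check them directly too
podman logs kind-control-plane 2>&1 | grep -E "Reached target .*Multi-User System.*|detected cgroup v1"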

dlipovetsky avatar Mar 31 '22 19:03 dlipovetsky

https://github.com/kubernetes-sigs/kind/issues/2718 tentatively seems unrelated given the odd workaround there.

Is this still an issue on current fedora kernels?

BenTheElder avatar Apr 21 '22 17:04 BenTheElder

Unfortunately, the issue still remains. I tested with the latest kernel and latest version.

hhemied avatar May 08 '22 18:05 hhemied

Just noticed this is with xfs; do we detect and mount devmapper correctly? https://github.com/kubernetes-sigs/kind/blob/dbcc39c6b8fb395863dc124fe74aca902ea7fef5/pkg/cluster/internal/providers/podman/util.go#L125

If you run kind create cluster with --retain, it won't delete the container(s) on failure, and we can inspect the node logs etc. (kind export logs).
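
A rough way to check that on a retained node (the Mounts field name is assumed from the docker-compatible inspect output; jq is only for readability):

# look for a /dev/mapper entry among the mounts kind set up on the node container
podman inspect kind-control-plane --format '{{ json .Mounts }}' | jq .
# or just grep the raw inspect output
podman inspect kind-control-plane | grep -A3 mapper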

BenTheElder avatar May 09 '22 01:05 BenTheElder

Just noticed this is with xfs; do we detect and mount devmapper correctly?

We appear to, based on the results from https://github.com/kubernetes-sigs/kind/issues/2689#issuecomment-1076545701: /dev/mapper shows up in the volumes in the container inspect.

[ OK ] Reached target Multi-User System. is in the node logs and should have matched the regex 😕

BenTheElder avatar May 27 '22 21:05 BenTheElder

same problem here...

ERROR: failed to create cluster: could not find a log line that matches "Reached target .*Multi-User System.*|detected cgroup v1"

kind version:

kind v0.14.0 go1.18.2 darwin/arm64

docker version:

Client:
 Cloud integration: v1.0.24
 Version:           20.10.14
 API version:       1.41
 Go version:        go1.16.15
 Git commit:        a224086
 Built:             Thu Mar 24 01:49:20 2022
 OS/Arch:           darwin/arm64
 Context:           default
 Experimental:      true

Server: Docker Desktop 4.8.2 (79419)
 Engine:
  Version:          20.10.14
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.15
  Git commit:       87a90dc
  Built:            Thu Mar 24 01:45:44 2022
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.5.11
  GitCommit:        3df54a852345ae127d1fa3092b95168e4a88e2f8
 runc:
  Version:          1.0.3
  GitCommit:        v1.0.3-0-gf46b6ba
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

tty47 avatar May 30 '22 06:05 tty47

@jrmanes this issue is about Podman and Fedora, which seems to be a problem in the kernel. The one you reported seems related to https://github.com/kubernetes-sigs/kind/issues/2718, because you are using Docker on Mac with an ARM architecture; please check whether the environment variable described in the linked issue is your problem.
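
If I recall correctly, the variable discussed in that issue is DOCKER_DEFAULT_PLATFORM (treat that as an assumption and confirm against #2718); a quick check on the Mac would be something like:

# see whether a platform override is forcing amd64 node images on the arm64 host
env | grep DOCKER_DEFAULT_PLATFORM
# if it is set, unset it for the current shell and retry
unset DOCKER_DEFAULT_PLATFORM
kind create cluster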

aojea avatar May 30 '22 07:05 aojea

Hello @aojea, thank you so much! Checking the other one ;)

tty47 avatar May 30 '22 07:05 tty47

The issue still exists. In my current setup:

 ➜ kind create cluster --name test --config cluster-ha-demo.yaml
using podman due to KIND_EXPERIMENTAL_PROVIDER
enabling experimental podman provider
Creating cluster "test" ...
 ✓ Ensuring node image (kindest/node:v1.24.0) 🖼
 ✓ Preparing nodes 📦 📦 📦 📦 📦 📦
 ✓ Configuring the external load balancer ⚖️
 ✓ Writing configuration 📜
 ✗ Starting control-plane 🕹️
ERROR: failed to create cluster: failed to init node with kubeadm: command "podman exec --privileged test-control-plane kubeadm init --skip-phases=preflight --config=/kind/kubeadm.conf --skip-token-print --v=6" failed with error: exit status 1
Command Output: I0619 17:44:28.743732     106 initconfiguration.go:255] loading configuration from "/kind/kubeadm.conf"
W0619 17:44:28.746570     106 initconfiguration.go:332] [config] WARNING: Ignored YAML document with GroupVersionKind kubeadm.k8s.io/v1beta3, Kind=JoinConfiguration
[init] Using Kubernetes version: v1.24.0
[certs] Using certificateDir folder "/etc/kubernetes/pki"
I0619 17:44:28.767327     106 certs.go:112] creating a new certificate authority for ca
[certs] Generating "ca" certificate and key
I0619 17:44:28.965046     106 certs.go:522] validating certificate period for ca certificate
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local localhost test-control-plane test-external-load-balancer] and IPs [10.96.0.1 10.89.0.6 0.0.0.0]
[certs] Generating "apiserver-kubelet-client" certificate and key
I0619 17:44:29.394052     106 certs.go:112] creating a new certificate authority for front-proxy-ca
[certs] Generating "front-proxy-ca" certificate and key
I0619 17:44:29.568655     106 certs.go:522] validating certificate period for front-proxy-ca certificate
[certs] Generating "front-proxy-client" certificate and key
I0619 17:44:29.684652     106 certs.go:112] creating a new certificate authority for etcd-ca
[certs] Generating "etcd/ca" certificate and key
I0619 17:44:29.807459     106 certs.go:522] validating certificate period for etcd/ca certificate
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost test-control-plane] and IPs [10.89.0.6 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost test-control-plane] and IPs [10.89.0.6 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
I0619 17:44:30.757394     106 certs.go:78] creating new public/private key files for signing service account users
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
I0619 17:44:30.839032     106 kubeconfig.go:103] creating kubeconfig file for admin.conf
[kubeconfig] Writing "admin.conf" kubeconfig file
I0619 17:44:30.973013     106 kubeconfig.go:103] creating kubeconfig file for kubelet.conf
[kubeconfig] Writing "kubelet.conf" kubeconfig file
I0619 17:44:31.214993     106 kubeconfig.go:103] creating kubeconfig file for controller-manager.conf
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
I0619 17:44:31.306636     106 kubeconfig.go:103] creating kubeconfig file for scheduler.conf
[kubeconfig] Writing "scheduler.conf" kubeconfig file
I0619 17:44:31.510689     106 kubelet.go:65] Stopping the kubelet
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
I0619 17:44:31.851460     106 manifests.go:99] [control-plane] getting StaticPodSpecs
I0619 17:44:31.852437     106 certs.go:522] validating certificate period for CA certificate
I0619 17:44:31.853146     106 manifests.go:125] [control-plane] adding volume "ca-certs" for component "kube-apiserver"
I0619 17:44:31.853961     106 manifests.go:125] [control-plane] adding volume "etc-ca-certificates" for component "kube-apiserver"
I0619 17:44:31.854111     106 manifests.go:125] [control-plane] adding volume "k8s-certs" for component "kube-apiserver"
I0619 17:44:31.854642     106 manifests.go:125] [control-plane] adding volume "usr-local-share-ca-certificates" for component "kube-apiserver"
I0619 17:44:31.855205     106 manifests.go:125] [control-plane] adding volume "usr-share-ca-certificates" for component "kube-apiserver"
I0619 17:44:31.863183     106 manifests.go:154] [control-plane] wrote static Pod manifest for component "kube-apiserver" to "/etc/kubernetes/manifests/kube-apiserver.yaml"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
I0619 17:44:31.868517     106 manifests.go:99] [control-plane] getting StaticPodSpecs
...
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
        - 'systemctl status kubelet'
        - 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all running Kubernetes containers by using crictl:
        - 'crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
        Once you have found the failing container, you can inspect its logs with:
        - 'crictl --runtime-endpoint unix:///run/containerd/containerd.sock logs CONTAINERID'
couldn't initialize a Kubernetes cluster
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/init.runWaitControlPlanePhase
        cmd/kubeadm/app/cmd/phases/init/waitcontrolplane.go:108
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
        cmd/kubeadm/app/cmd/phases/workflow/runner.go:234
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
        cmd/kubeadm/app/cmd/phases/workflow/runner.go:421
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
        cmd/kubeadm/app/cmd/phases/workflow/runner.go:207
k8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdInit.func1
        cmd/kubeadm/app/cmd/init.go:153
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute
        vendor/github.com/spf13/cobra/command.go:856
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC
        vendor/github.com/spf13/cobra/command.go:974
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute
        vendor/github.com/spf13/cobra/command.go:902
k8s.io/kubernetes/cmd/kubeadm/app.Run
        cmd/kubeadm/app/kubeadm.go:50
main.main
        cmd/kubeadm/kubeadm.go:25
runtime.main
        /usr/local/go/src/runtime/proc.go:250
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1571
error execution phase wait-control-plane
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
        cmd/kubeadm/app/cmd/phases/workflow/runner.go:235
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
        cmd/kubeadm/app/cmd/phases/workflow/runner.go:421
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
        cmd/kubeadm/app/cmd/phases/workflow/runner.go:207
k8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdInit.func1
        cmd/kubeadm/app/cmd/init.go:153
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute
        vendor/github.com/spf13/cobra/command.go:856
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC
        vendor/github.com/spf13/cobra/command.go:974
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute
        vendor/github.com/spf13/cobra/command.go:902
k8s.io/kubernetes/cmd/kubeadm/app.Run
        cmd/kubeadm/app/kubeadm.go:50
main.main
        cmd/kubeadm/kubeadm.go:25
runtime.main
        /usr/local/go/src/runtime/proc.go:250
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1571

OS:

[root@localhost ~]# cat /etc/redhat-release
Fedora release 36 (Thirty Six)

Here is also the filesystem layout:

[root@localhost ~]# lsblk -f
NAME FSTYPE FSVER LABEL UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
sr0
vda
├─vda1
│
├─vda2
│    vfat   FAT16 EFI-SYSTEM
│                       6CEF-1B1F
├─vda3
│    ext4   1.0   boot  a71809e0-8212-4321-9c28-bc736ac25184  226.1M    29% /boot
└─vda4
     xfs          root  786374f5-cd31-4aa2-b76f-b101250fd984   95.2G     4% /var/lib/containers/storage/overlay
                                                                            /var
                                                                            /sysroot/ostree/deploy/fedora-coreos/var
                                                                            /usr
                                                                            /etc
                                                                            /
                                                                            /sysroot

This is rootful Podman.
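
The cluster-ha-demo.yaml itself isn't attached; a config producing the six nodes plus the external load balancer shown above would look roughly like this (node roles inferred from the output, written here via a heredoc):

cat <<EOF > cluster-ha-demo.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: control-plane
- role: control-plane
- role: worker
- role: worker
- role: worker
EOF
KIND_EXPERIMENTAL_PROVIDER=podman kind create cluster --name test --config cluster-ha-demo.yaml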

hhemied avatar Jun 19 '22 17:06 hhemied

I've been out. @hhemied, can you share the kind export logs from a create with --retain? Also, can you please try a minimal test with just kind create cluster --retain; kind export logs; kind delete cluster (versus configuring many nodes), so we can get a minimal reproduction and not hit resource issues?

BenTheElder avatar Jul 26 '22 18:07 BenTheElder

I'm also seeing the same on Arch Linux with both kind 0.13 and 0.14, with kernel 5.19.2-arch1-2. My Docker dir is xfs-backed, if that matters.

Logs are here: kind-logs.tar.gz

glitchcrab avatar Aug 22 '22 21:08 glitchcrab

Seeing the same issue with v1.25.3, running on Ubuntu 22.04.

Creating cluster "test-cluster" ...
 ✓ Ensuring node image (kindest/node:v1.25.3) 🖼 
 ✗ Preparing nodes 📦  
ERROR: failed to create cluster: could not find a log line that matches "Reached target .*Multi-User System.*|detected cgroup v1"

BUT: when I removed Podman and Docker and reinstalled Docker, it worked like a charm.
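
When both runtimes end up installed side by side, the provider can also be pinned explicitly instead of relying on auto-detection (a sketch; kind logs which provider it picked either way):

# force the docker provider
KIND_EXPERIMENTAL_PROVIDER=docker kind create cluster --name test-cluster
# or force podman
KIND_EXPERIMENTAL_PROVIDER=podman kind create cluster --name test-cluster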

jimdevops19 avatar Oct 27 '22 11:10 jimdevops19

It's working for me now using another method: I use podman machine. I can create clusters with both single and multiple nodes.
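
For anyone wanting to reproduce that setup, the podman machine route looks roughly like this (CPU and memory values are just examples):

# create and start a podman-managed VM
podman machine init --cpus 4 --memory 8192
# depending on the podman version, kind may need the machine in rootful mode:
# podman machine set --rootful
podman machine start
# then create the cluster against the machine's podman
KIND_EXPERIMENTAL_PROVIDER=podman kind create cluster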

hhemied avatar Nov 01 '22 15:11 hhemied

For me it looks like a resource issue. I am going to close it, as with more resources I can achieve what I need.

hhemied avatar Jan 13 '23 11:01 hhemied