Join of worker node fails for specific k8s versions
Unsure if this is a kind problem at all, or rather a kubeadm problem, or a known problem with those specific k8s versions. But here goes.
What happened:
ERROR: failed to create cluster: failed to join node with kubeadm: command "docker exec --privileged kind-worker kubeadm join --config /kind/kubeadm.conf --skip-phases=preflight --v=6" failed with error: exit status 1
What you expected to happen: Cluster to be created successfully
How to reproduce it (as minimally and precisely as possible): Create a cluster with this config:
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  image: kindest/node:v1.24.0
- role: worker
  image: kindest/node:v1.23.4
- role: worker
  image: kindest/node:v1.24.0
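Assuming the config above is saved as kind-config.yaml (a filename chosen here purely for illustration), the failure can be reproduced and the node logs kept around with kind's standard flags:

# reproduce the failed worker join; --retain keeps the node containers
# around after the failure so their logs can still be collected
kind create cluster --retain --config kind-config.yaml

# dump kubelet/kubeadm/containerd logs from all nodes into ./kind-logs
kind export logs ./kind-logs

# clean up afterwards
kind delete cluster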
Anything else we need to know?: It works if I use a different worker image and let the other nodes fall back to the standard node image from kind v0.16.0:
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
  image: kindest/node:v1.24.6@sha256:97e8d00bc37a7598a0b32d1fabd155a96355c49fa0d4d4790aab0f161bf31be1
- role: worker
Environment: Fails as early as kind v0.15.0 from what I saw and is still present in v0.16.0.
- kind version (use kind version): kind v0.16.0
- Kubernetes version (use kubectl version): v1.25.0
- Docker version (use docker info): Docker Desktop v4.12.0
- OS (e.g. from /etc/os-release): Windows, as well as GitHub Actions runners
1.23.4 is a very old node image from the kind v0.12.0 release; please use an image from the current release (or build your own with the current release to get arbitrary k8s patch versions).
https://kind.sigs.k8s.io/docs/user/quick-start/#creating-a-cluster
Prebuilt images are hosted at kindest/node, but to find images suitable for a given release you should currently check the release notes for your kind version (check with kind version), where you'll find a complete listing of images created for that kind release.
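As a rough sketch of that workflow (the v1.25.2 tag below is only an example; check the release notes of your kind version for the exact tags and digests it ships with):

# confirm which kind release you are on; its release notes list the matching node images
kind version

# then pin one of the listed images, either per node in the config file
# or for the whole cluster via --image
kind create cluster --image kindest/node:v1.25.2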
Understood, I'm using newer images already; I just found it weird that one of my old tests was failing and thought it might be worth reporting.
ACK, thanks. Please let me know if you see this with current patch versions; it's definitely possible there's a k8s bug. We also changed some things around v0.13 related to cgroups management that might cause issues, and we'll probably have to stop supporting old images again at some point in the future 😅
I actually found it was still working in v0.14 as well; v0.15 must be the first version where it failed.
But does it fail with current node images? There have been a lot of PRs to Kubernetes 1.23 / 1.24 since 1.23.4 / 1.24.0.
Nope, like I said in the initial report, this config works, for example:
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
  image: kindest/node:v1.24.6@sha256:97e8d00bc37a7598a0b32d1fabd155a96355c49fa0d4d4790aab0f161bf31be1
- role: worker
which pulls in the v0.16.0 node images for the control plane and workers, except for the hardcoded one.
Unless you mean the current 1.23/1.24 images; I'll have to try that out right now.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  image: kindest/node:v1.24.6@sha256:97e8d00bc37a7598a0b32d1fabd155a96355c49fa0d4d4790aab0f161bf31be1
- role: worker
  image: kindest/node:v1.23.12@sha256:9402cf1330bbd3a0d097d2033fa489b2abe40d479cc5ef47d0b6a6960613148a
- role: worker
  image: kindest/node:v1.24.6@sha256:97e8d00bc37a7598a0b32d1fabd155a96355c49fa0d4d4790aab0f161bf31be1
This config works without problems.
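For completeness, a quick way to confirm which kubelet version each node actually ended up with after a mixed-version create like this (plain kubectl, nothing kind-specific):

# the VERSION column should show v1.24.6 for the control plane
# and v1.23.12 / v1.24.6 for the two pinned workers
kubectl get nodes -o wide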
Thanks -- this makes me think there's a Kubernetes bug that was fixed between 1.23.4 and 1.23.12, or between 1.24.0 and 1.24.6. It's possible instead that images from past kind releases have a bug related to this, but I can't think of a relevant change.
@tehcyx I faced a similar problem while spawning a cluster with more than three workers. Kind worked perfectly for fewer than three workers in my case. I tried with 1.23, 1.24, 1.25 and even tried building the node image from the latest k8s source, but the problem persisted.
After inspecting the kubelet logs, I found the issue to be "too many open files" errors raised by inotify.
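A quick way to spot this on a kind node (assuming the default node container name kind-worker) is to grep the kubelet journal inside the node container:

# kind nodes run systemd, so kubelet logs are available via journalctl
docker exec kind-worker journalctl -u kubelet --no-pager | grep -i "too many open files"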
I later found this in the known issues as well. Can you increase the limits as mentioned in the following link: https://kind.sigs.k8s.io/docs/user/known-issues/#pod-errors-due-to-too-many-open-files
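Roughly, the fix from that page amounts to raising the host's inotify limits, for example (values taken from the linked known-issues entry; adjust as needed):

# raise inotify limits on the host for the current boot
sudo sysctl fs.inotify.max_user_watches=524288
sudo sysctl fs.inotify.max_user_instances=512

# to persist across reboots, put the same keys into /etc/sysctl.conf
# (or a file under /etc/sysctl.d/) and reload with: sudo sysctl -p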
Yeah, I have reason to suspect recent Kubernetes releases increased the number of inotify watches used on a typical kind cluster, based on issue reports here, but I haven't had time to investigate this (it's also not clear that would be considered a bug in Kubernetes). Unfortunately, since that limit is not namespaced we do not touch it, but it's a common issue with multi-node KIND clusters in general.
I think this was some Kubernetes bug, and there's not much more for us to do here for now.