kubeone
Adding a new static worker node results in a preflight check failure on existing nodes
What happened?
Trying to add a new static worker node results in the following error:
```
+ sudo kubeadm init phase preflight --config=./kubeone/cfg/master_0.yaml
W0613 19:21:47.950292 27890 initconfiguration.go:331] [config] WARNING: Ignored YAML document with GroupVersionKind kubeadm.k8s.io/v1beta3, Kind=JoinConfiguration
W0613 19:21:47.958412 27890 initconfiguration.go:119] Usage of CRI endpoints without URL scheme is deprecated and can cause kubelet errors in the future. Automatically prepending scheme "unix" to the "criSocket" with value "/run/containerd/containerd.sock". Please update your configuration!
W0613 19:21:47.958515 27890 utils.go:69] The recommended value for "clusterDNS" in "KubeletConfiguration" is: [10.96.0.10]; the provided value is: [169.254.20.10]
[WARNING DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR Port-6443]: Port 6443 is in use
[ERROR Port-10259]: Port 10259 is in use
[ERROR Port-10257]: Port 10257 is in use
[ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists
[ERROR Port-10250]: Port 10250 is in use
[ERROR Port-2379]: Port 2379 is in use
[ERROR Port-2380]: Port 2380 is in use
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
```
What happens is that joining a new static worker node triggers the WithFullInstall workflow, which is also used to provision the cluster from scratch. There we run preflight checks with kubeadm on each node to verify that the VMs satisfy the requirements to be a Kubernetes node.
That works the first time we provision the cluster, but subsequent runs (e.g. when adding a new static worker node) fail on the existing nodes because the cluster is already provisioned, so the files already exist and the ports are taken by Kubernetes components.
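For illustration, here is a quick inspection sketch (assuming shell access to one of the existing control-plane nodes, not part of KubeOne itself) showing why each fatal check trips: everything kubeadm complains about is expected to be present on an already-provisioned node.

```sh
# On an already-provisioned control-plane node (inspection sketch only):
ss -ltn | grep -E ':(6443|10250|10257|10259|2379|2380)\b'   # ports held by running control-plane components
ls /etc/kubernetes/manifests/                               # static pod manifests already exist
ls -A /var/lib/etcd                                         # etcd data directory is not empty
```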
Expected behavior
- Adding a new static worker node works as expected
How to reproduce the issue?
- Provision the cluster
- Try to add a new static worker node after the cluster is provisioned
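Concretely, the reproduction boils down to something like the following (a sketch; kubeone.yaml is a placeholder for your actual manifest):

```sh
# Sketch of the reproduction; kubeone.yaml stands in for your actual manifest.
kubeone apply --manifest kubeone.yaml   # initial provisioning succeeds

# Add one more host under staticWorkers.hosts in kubeone.yaml, then re-run:
kubeone apply --manifest kubeone.yaml   # fails with kubeadm preflight errors on the existing nodes
```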
What KubeOne version are you using?
```json
{
  "kubeone": {
    "major": "1",
    "minor": "6",
    "gitVersion": "v1.6.0-rc.2-36-g0536063a",
    "gitCommit": "0536063ab064601ba217c2abd41abd4c80a02477",
    "gitTreeState": "",
    "buildDate": "2023-06-13T21:16:41+02:00",
    "goVersion": "go1.20.4",
    "compiler": "gc",
    "platform": "darwin/arm64"
  },
  "machine_controller": {
    "major": "",
    "minor": "",
    "gitVersion": "8e5884837711fb0fc6b568d734f09a7b809fc28e",
    "gitCommit": "",
    "gitTreeState": "",
    "buildDate": "",
    "goVersion": "",
    "compiler": "",
    "platform": "linux/amd64"
  }
}
```
Provide your KubeOneCluster manifest here (if applicable)
What cloud provider are you running on?
Baremetal
What operating system are you running in your cluster?
Ubuntu 20.04.6
Additional information
We can mitigate this issue by ignoring those failures, but in some cases those failures can be real issues that would prevent the cluster from being provisioned.
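For reference, the manual equivalent of that mitigation is to pass the failing check names (taken verbatim from the error output above) to kubeadm's --ignore-preflight-errors flag; a sketch:

```sh
# Sketch only: skip the checks that are expected to "fail" on an already-provisioned node.
# The check names come straight from the error output above.
sudo kubeadm init phase preflight --config=./kubeone/cfg/master_0.yaml \
  --ignore-preflight-errors=Port-6443,Port-10259,Port-10257,Port-10250,Port-2379,Port-2380,\
FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml,\
FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml,\
FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml,\
FileAvailable--etc-kubernetes-manifests-etcd.yaml
```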
Issues go stale after 90d of inactivity.
After a further 30 days, they will turn rotten.
Mark the issue as fresh with /remove-lifecycle stale.
If this issue is safe to close now please do so with /close.
/lifecycle stale
/remove-lifecycle stale
I'm not working on this at the moment. /unassign
I actually cannot reproduce this problem. I've tried to add a new worker, to replace the first of two workers, then the second of two, and then to add a third one.
/close
@kron4eg: Closing this issue.
In response to this:
I actually cannot reproduce this problem. I've tried to add a new worker, to replace the first of two workers, then the second of two, and then to add a third one.
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
@kron4eg The issue has been mitigated in a very hacky way by ignoring a lot of preflight checks: https://github.com/kubermatic/kubeone/pull/2803
We should revisit this and try to fix it in some not-so-hacky manner.
/reopen
@xmudrii: Reopened this issue.
In response to this:
@kron4eg The issue has been mitigated in a very hacky way by ignoring a lot of preflight checks: https://github.com/kubermatic/kubeone/pull/2803
We should revisit this and try to fix it in some not-so-hacky manner.
/reopen
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
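A less hacky direction (my assumption, not something the PR above implements) could be to skip the kubeadm preflight phase entirely on nodes that are already part of the cluster, e.g. by keying off /etc/kubernetes/kubelet.conf:

```sh
# Sketch of a possible gating approach (assumption: a node that already has
# /etc/kubernetes/kubelet.conf is treated as provisioned and skips preflight).
if [ ! -f /etc/kubernetes/kubelet.conf ]; then
  sudo kubeadm init phase preflight --config=./kubeone/cfg/master_0.yaml
else
  echo "node already provisioned, skipping kubeadm preflight"
fi
```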
Issues go stale after 90d of inactivity.
After a further 30 days, they will turn rotten.
Mark the issue as fresh with /remove-lifecycle stale.
If this issue is safe to close now please do so with /close.
/lifecycle stale
/lifecycle frozen