Adding a new static worker node results in a preflight check failure on existing nodes

xmudrii opened this issue 2 years ago • 7 comments

What happened?

Trying to add a new static worker node results in the following error:

+ sudo kubeadm init phase preflight --config=./kubeone/cfg/master_0.yaml
W0613 19:21:47.950292   27890 initconfiguration.go:331] [config] WARNING: Ignored YAML document with GroupVersionKind kubeadm.k8s.io/v1beta3, Kind=JoinConfiguration
W0613 19:21:47.958412   27890 initconfiguration.go:119] Usage of CRI endpoints without URL scheme is deprecated and can cause kubelet errors in the future. Automatically prepending scheme "unix" to the "criSocket" with value "/run/containerd/containerd.sock". Please update your configuration!
W0613 19:21:47.958515   27890 utils.go:69] The recommended value for "clusterDNS" in "KubeletConfiguration" is: [10.96.0.10]; the provided value is: [169.254.20.10]
	[WARNING DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
error execution phase preflight: [preflight] Some fatal errors occurred:
	[ERROR Port-6443]: Port 6443 is in use
	[ERROR Port-10259]: Port 10259 is in use
	[ERROR Port-10257]: Port 10257 is in use
	[ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
	[ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
	[ERROR FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
	[ERROR FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists
	[ERROR Port-10250]: Port 10250 is in use
	[ERROR Port-2379]: Port 2379 is in use
	[ERROR Port-2380]: Port 2380 is in use
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher

What happens is that joining a new static worker node triggers the WithFullInstall workflow, which is also used to provision the cluster from scratch. As part of it, we run kubeadm preflight checks on each node to verify that the VMs satisfy the requirements to be a Kubernetes node.

That works the first time we provision the cluster, but subsequent runs (e.g. when adding a new static worker node) fail on the existing nodes because the cluster is already provisioned, so the files already exist and the ports are taken by Kubernetes components.
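
For reference, kubeadm's own hint at the bottom of the log is the per-check escape hatch. A minimal sketch of it, reusing the config path from the log and listing exactly the checks that failed there (whether KubeOne should pass these automatically, and for which checks, is what this issue is about):

  sudo kubeadm init phase preflight --config=./kubeone/cfg/master_0.yaml \
    --ignore-preflight-errors=Port-6443,Port-10259,Port-10257,Port-10250,Port-2379,Port-2380,FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml,FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml,FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml,FileAvailable--etc-kubernetes-manifests-etcd.yaml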

Expected behavior

  • Adding a new static worker node works as expected

How to reproduce the issue?

  • Provision the cluster
  • Try to add a new static worker node after the cluster is provisioned (see the sketch below)
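
A minimal sketch of those two steps, assuming a manifest named kubeone.yaml (the file name and the edit are placeholders; static worker nodes are defined under staticWorkers.hosts in the KubeOneCluster manifest):

  kubeone apply --manifest kubeone.yaml   # 1) provision the cluster
  # edit kubeone.yaml: append the new node under staticWorkers.hosts
  kubeone apply --manifest kubeone.yaml   # 2) re-run apply; preflight now fails on the existing nodes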

What KubeOne version are you using?

{
  "kubeone": {
    "major": "1",
    "minor": "6",
    "gitVersion": "v1.6.0-rc.2-36-g0536063a",
    "gitCommit": "0536063ab064601ba217c2abd41abd4c80a02477",
    "gitTreeState": "",
    "buildDate": "2023-06-13T21:16:41+02:00",
    "goVersion": "go1.20.4",
    "compiler": "gc",
    "platform": "darwin/arm64"
  },
  "machine_controller": {
    "major": "",
    "minor": "",
    "gitVersion": "8e5884837711fb0fc6b568d734f09a7b809fc28e",
    "gitCommit": "",
    "gitTreeState": "",
    "buildDate": "",
    "goVersion": "",
    "compiler": "",
    "platform": "linux/amd64"
  }
}

Provide your KubeOneCluster manifest here (if applicable)

What cloud provider are you running on?

Baremetal

What operating system are you running in your cluster?

Ubuntu 20.04.6

Additional information

We can mitigate this issue by ignoring those failures, but in some cases those failures can be real issues that will prevent the cluster from being provisioned.

xmudrii avatar Jun 13 '23 19:06 xmudrii

Issues go stale after 90d of inactivity. After a further 30 days, they will turn rotten. Mark the issue as fresh with /remove-lifecycle stale.

If this issue is safe to close now please do so with /close.

/lifecycle stale

kubermatic-bot avatar Sep 11 '23 23:09 kubermatic-bot

/remove-lifecycle stale

xmudrii avatar Sep 12 '23 08:09 xmudrii

Issues go stale after 90d of inactivity. After a further 30 days, they will turn rotten. Mark the issue as fresh with /remove-lifecycle stale.

If this issue is safe to close now please do so with /close.

/lifecycle stale

kubermatic-bot avatar Dec 11 '23 11:12 kubermatic-bot

/remove-lifecycle stale

xmudrii avatar Dec 11 '23 11:12 xmudrii

Issues go stale after 90d of inactivity. After a further 30 days, they will turn rotten. Mark the issue as fresh with /remove-lifecycle stale.

If this issue is safe to close now please do so with /close.

/lifecycle stale

kubermatic-bot avatar Apr 08 '24 00:04 kubermatic-bot

/remove-lifecycle stale

xmudrii avatar Apr 08 '24 11:04 xmudrii

I'm not working on this at the moment. /unassign

xmudrii avatar Aug 27 '24 13:08 xmudrii

Issues go stale after 90d of inactivity. After a further 30 days, they will turn rotten. Mark the issue as fresh with /remove-lifecycle stale.

If this issue is safe to close now please do so with /close.

/lifecycle stale

kubermatic-bot avatar Nov 25 '24 14:11 kubermatic-bot

/remove-lifecycle stale

xmudrii avatar Nov 26 '24 18:11 xmudrii

I actually cannot reproduce this problem. I've tried to add a new worker. I've tried to replace the first out of 2, then the second out of 2, then added a third one.

/close

kron4eg avatar Jan 15 '25 16:01 kron4eg

@kron4eg: Closing this issue.

In response to this:

I actually cannot reproduce this problem. I've tried to add a new worker. I've tried to replace the first out of 2, then the second out of 2, then added a third one.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

kubermatic-bot avatar Jan 15 '25 16:01 kubermatic-bot

@kron4eg The issue has been mitigated in a very hacky way by ignoring a lot of preflight checks: https://github.com/kubermatic/kubeone/pull/2803

We should revisit this and try to fix it in some not-so-hacky manner.
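
One possible less hacky direction (just a sketch, not existing KubeOne behavior): decide per host whether the full preflight is needed at all, e.g. by detecting that the node has already joined the cluster, instead of ignoring individual errors everywhere. The kubelet.conf check below is a hypothetical heuristic:

  # Hypothetical heuristic, not existing KubeOne logic: run the full preflight
  # only on hosts that have not joined the cluster yet.
  if [ -f /etc/kubernetes/kubelet.conf ]; then
    echo "node already joined; skipping full kubeadm preflight"
  else
    sudo kubeadm init phase preflight --config=./kubeone/cfg/master_0.yaml
  fi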

/reopen

xmudrii avatar Jan 20 '25 17:01 xmudrii

@xmudrii: Reopened this issue.

In response to this:

@kron4eg The issue has been mitigated in a very hacky way by ignoring a lot of preflight checks: https://github.com/kubermatic/kubeone/pull/2803

We should revisit this and try to fix it in some not-so-hacky manner.

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

kubermatic-bot avatar Jan 20 '25 17:01 kubermatic-bot

Issues go stale after 90d of inactivity. After a further 30 days, they will turn rotten. Mark the issue as fresh with /remove-lifecycle stale.

If this issue is safe to close now please do so with /close.

/lifecycle stale

kubermatic-bot avatar Apr 21 '25 02:04 kubermatic-bot

/remove-lifecycle stale

xmudrii avatar Apr 22 '25 11:04 xmudrii

Issues go stale after 90d of inactivity. After a further 30 days, they will turn rotten. Mark the issue as fresh with /remove-lifecycle stale.

If this issue is safe to close now please do so with /close.

/lifecycle stale

kubermatic-bot avatar Jul 29 '25 02:07 kubermatic-bot

/remove-lifecycle stale

xmudrii avatar Jul 29 '25 10:07 xmudrii

Issues go stale after 90d of inactivity. After a further 30 days, they will turn rotten. Mark the issue as fresh with /remove-lifecycle stale.

If this issue is safe to close now please do so with /close.

/lifecycle stale

kubermatic-bot avatar Nov 18 '25 14:11 kubermatic-bot

/lifecycle frozen

xmudrii avatar Nov 19 '25 10:11 xmudrii