Adding a new static worker node results in a preflight check failure on existing nodes

xmudrii opened this issue 2 years ago • 7 comments

What happened?

Trying to add a new static worker node results in the following error:

+ sudo kubeadm init phase preflight --config=./kubeone/cfg/master_0.yaml
W0613 19:21:47.950292   27890 initconfiguration.go:331] [config] WARNING: Ignored YAML document with GroupVersionKind kubeadm.k8s.io/v1beta3, Kind=JoinConfiguration
W0613 19:21:47.958412   27890 initconfiguration.go:119] Usage of CRI endpoints without URL scheme is deprecated and can cause kubelet errors in the future. Automatically prepending scheme "unix" to the "criSocket" with value "/run/containerd/containerd.sock". Please update your configuration!
W0613 19:21:47.958515   27890 utils.go:69] The recommended value for "clusterDNS" in "KubeletConfiguration" is: [10.96.0.10]; the provided value is: [169.254.20.10]
	[WARNING DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
error execution phase preflight: [preflight] Some fatal errors occurred:
	[ERROR Port-6443]: Port 6443 is in use
	[ERROR Port-10259]: Port 10259 is in use
	[ERROR Port-10257]: Port 10257 is in use
	[ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
	[ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
	[ERROR FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
	[ERROR FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists
	[ERROR Port-10250]: Port 10250 is in use
	[ERROR Port-2379]: Port 2379 is in use
	[ERROR Port-2380]: Port 2380 is in use
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher

What happens is that joining a new static worker node triggers the WithFullInstall workflow, which is also used to provision the cluster from scratch. As part of it, we run kubeadm preflight checks on each node to verify that the VMs satisfy the requirements to be a Kubernetes node.

That works the first time we provision the cluster, but subsequent runs (e.g. when adding a new static worker node) fail on the existing nodes because the cluster is already provisioned, so the files already exist and the ports are taken by Kubernetes components.
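
For reference, kubeadm's own hint at the bottom of the log is the per-check escape hatch. A minimal sketch of it, reusing the config path from the log and listing exactly the checks that failed there (whether KubeOne should pass these automatically, and for which checks, is what this issue is about):

  sudo kubeadm init phase preflight --config=./kubeone/cfg/master_0.yaml \
    --ignore-preflight-errors=Port-6443,Port-10259,Port-10257,Port-10250,Port-2379,Port-2380,FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml,FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml,FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml,FileAvailable--etc-kubernetes-manifests-etcd.yaml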

Expected behavior

  • Adding a new static worker node works as expected

How to reproduce the issue?

  • Provision the cluster
  • Try to add a new static worker node after the cluster is provisioned (see the sketch below)
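
A minimal sketch of those two steps, assuming a manifest named kubeone.yaml (the file name and the edit are placeholders; static worker nodes are defined under staticWorkers.hosts in the KubeOneCluster manifest):

  kubeone apply --manifest kubeone.yaml   # 1) provision the cluster
  # edit kubeone.yaml: append the new node under staticWorkers.hosts
  kubeone apply --manifest kubeone.yaml   # 2) re-run apply; preflight now fails on the existing nodes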

What KubeOne version are you using?

{
  "kubeone": {
    "major": "1",
    "minor": "6",
    "gitVersion": "v1.6.0-rc.2-36-g0536063a",
    "gitCommit": "0536063ab064601ba217c2abd41abd4c80a02477",
    "gitTreeState": "",
    "buildDate": "2023-06-13T21:16:41+02:00",
    "goVersion": "go1.20.4",
    "compiler": "gc",
    "platform": "darwin/arm64"
  },
  "machine_controller": {
    "major": "",
    "minor": "",
    "gitVersion": "8e5884837711fb0fc6b568d734f09a7b809fc28e",
    "gitCommit": "",
    "gitTreeState": "",
    "buildDate": "",
    "goVersion": "",
    "compiler": "",
    "platform": "linux/amd64"
  }
}

Provide your KubeOneCluster manifest here (if applicable)

What cloud provider are you running on?

Baremetal

What operating system are you running in your cluster?

Ubuntu 20.04.6

Additional information

We can mitigate this issue by ignoring those failures, but in some cases those failures can be real issues that will prevent the cluster from being provisioned.

xmudrii avatar Jun 13 '23 19:06 xmudrii

Issues go stale after 90d of inactivity. After a further 30 days, they will turn rotten. Mark the issue as fresh with /remove-lifecycle stale.

If this issue is safe to close now please do so with /close.

/lifecycle stale

kubermatic-bot avatar Sep 11 '23 23:09 kubermatic-bot

/remove-lifecycle stale

xmudrii avatar Sep 12 '23 08:09 xmudrii

Issues go stale after 90d of inactivity. After a further 30 days, they will turn rotten. Mark the issue as fresh with /remove-lifecycle stale.

If this issue is safe to close now please do so with /close.

/lifecycle stale

kubermatic-bot avatar Dec 11 '23 11:12 kubermatic-bot

/remove-lifecycle stale

xmudrii avatar Dec 11 '23 11:12 xmudrii

Issues go stale after 90d of inactivity. After a further 30 days, they will turn rotten. Mark the issue as fresh with /remove-lifecycle stale.

If this issue is safe to close now please do so with /close.

/lifecycle stale

kubermatic-bot avatar Apr 08 '24 00:04 kubermatic-bot

/remove-lifecycle stale

xmudrii avatar Apr 08 '24 11:04 xmudrii

I'm not working on this at the moment. /unassign

xmudrii avatar Aug 27 '24 13:08 xmudrii

Issues go stale after 90d of inactivity. After a further 30 days, they will turn rotten. Mark the issue as fresh with /remove-lifecycle stale.

If this issue is safe to close now please do so with /close.

/lifecycle stale

kubermatic-bot avatar Nov 25 '24 14:11 kubermatic-bot

/remove-lifecycle stale

xmudrii avatar Nov 26 '24 18:11 xmudrii

I actually cannot reproduce this problem. I've tried to add a new worker. I've tried to replace the first out of 2, then the second out of 2, then added a third one.

/close

kron4eg avatar Jan 15 '25 16:01 kron4eg

@kron4eg: Closing this issue.

In response to this:

I actually cannot reproduce this problem. I've tried to add a new worker. I've tried to replace the first out of 2, then the second out of 2, then added a third one.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

kubermatic-bot avatar Jan 15 '25 16:01 kubermatic-bot

@kron4eg The issue has been mitigated in a very hacky way by ignoring a lot of preflight checks: https://github.com/kubermatic/kubeone/pull/2803

We should revisit this and try to fix it in some not-so-hacky manner.
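
One possible less hacky direction (just a sketch, not existing KubeOne behavior): decide per host whether the full preflight is needed at all, e.g. by detecting that the node has already joined the cluster, instead of ignoring individual errors everywhere. The kubelet.conf check below is a hypothetical heuristic:

  # Hypothetical heuristic, not existing KubeOne logic: run the full preflight
  # only on hosts that have not joined the cluster yet.
  if [ -f /etc/kubernetes/kubelet.conf ]; then
    echo "node already joined; skipping full kubeadm preflight"
  else
    sudo kubeadm init phase preflight --config=./kubeone/cfg/master_0.yaml
  fi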

/reopen

xmudrii avatar Jan 20 '25 17:01 xmudrii

@xmudrii: Reopened this issue.

In response to this:

@kron4eg The issue has been mitigated in a very hacky way by ignoring a lot of preflight checks: https://github.com/kubermatic/kubeone/pull/2803

We should revisit this and try to fix it in some not-so-hacky manner.

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

kubermatic-bot avatar Jan 20 '25 17:01 kubermatic-bot

Issues go stale after 90d of inactivity. After a further 30 days, they will turn rotten. Mark the issue as fresh with /remove-lifecycle stale.

If this issue is safe to close now please do so with /close.

/lifecycle stale

kubermatic-bot avatar Apr 21 '25 02:04 kubermatic-bot

/remove-lifecycle stale

xmudrii avatar Apr 22 '25 11:04 xmudrii

Issues go stale after 90d of inactivity. After a further 30 days, they will turn rotten. Mark the issue as fresh with /remove-lifecycle stale.

If this issue is safe to close now please do so with /close.

/lifecycle stale

kubermatic-bot avatar Jul 29 '25 02:07 kubermatic-bot

/remove-lifecycle stale

xmudrii avatar Jul 29 '25 10:07 xmudrii

Issues go stale after 90d of inactivity. After a further 30 days, they will turn rotten. Mark the issue as fresh with /remove-lifecycle stale.

If this issue is safe to close now please do so with /close.

/lifecycle stale

kubermatic-bot avatar Nov 18 '25 14:11 kubermatic-bot

/lifecycle frozen

xmudrii avatar Nov 19 '25 10:11 xmudrii