kubespray
Initial setup of a k8s cluster with kubespray breaks if kube-vip is enabled
What happened?
An initial cluster creation always breaks on registering the first control plane node if kube-vip is enabled.
What did you expect to happen?
Kube-vip should not block the registration of the first control plane node during the initial phase.
How can we reproduce it (as minimally and precisely as possible)?
Deploy a minimal cluster in a fresh environment and activate kube-vip beforehand via addons.yml.
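For reference, enabling it beforehand boils down to a handful of variables in the cluster's addons/group_vars file. A minimal sketch (the interface name and VIP below are placeholders, not values from this report):

```yaml
# Minimal kube-vip activation sketch; interface and VIP are placeholders.
kube_vip_enabled: true
kube_vip_controlplane_enabled: true
kube_vip_arp_enabled: true
kube_vip_interface: eth0            # NIC that should carry the VIP
kube_vip_address: 10.12.3.10        # virtual IP for the kube-apiserver
loadbalancer_apiserver:
  address: "{{ kube_vip_address }}"
  port: 6443
```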
OS
Linux 5.15.0-102-generic x86_64
PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
Version of Ansible
ansible [core 2.16.7]
  config file = ansible.cfg
  configured module search path = ['library']
  ansible python module location = venv/lib/python3.12/site-packages/ansible
  ansible collection location = /Users/****/.ansible/collections:/usr/share/ansible/collections:/etc/ansible/collections:collections
  executable location = venv/bin/ansible
  python version = 3.12.3 (main, Apr 9 2024, 08:09:14) [Clang 15.0.0 (clang-1500.3.9.4)] (venv/bin/python)
  jinja version = 3.1.4
  libyaml = True
Version of Python
Python 3.12.3
Version of Kubespray (commit)
Collection (2.25.0)
Network plugin used
calico
Full inventory with variables
all:
  children:
    bastion:
      hosts:
        bastion:
          ansible_host: 10.12.3.61
          ip: 10.12.3.61
    kube_control_plane:
      hosts:
        hk8scpfra1:
          ansible_host: 10.12.3.11
          ip: 10.12.3.11
        hk8scpfra2:
          ansible_host: 10.12.3.12
          ip: 10.12.3.12
        hk8scpfra3:
          ansible_host: 10.12.3.13
          ip: 10.12.3.13
    worker_node:
      hosts:
        hk8swfra1:
          ansible_host: 10.12.3.21
          ip: 10.12.3.21
        hk8swfra2:
          ansible_host: 10.12.3.22
          ip: 10.12.3.22
        hk8swfra3:
          ansible_host: 10.12.3.23
          ip: 10.12.3.23
      vars:
        node_labels:
          node-role.kubernetes.io/worker: ""
          node.cluster.x-k8s.io/nodegroup: worker
    database_node:
      hosts:
        hk8sdbfra1:
          ansible_host: 10.12.3.31
          ip: 10.12.3.31
        hk8sdbfra2:
          ansible_host: 10.12.3.32
          ip: 10.12.3.32
        hk8sdbfra3:
          ansible_host: 10.12.3.33
          ip: 10.12.3.33
      vars:
        node_taints:
          - 'dedicated=database:NoSchedule'
        node_labels:
          node-role.kubernetes.io/database: ""
          node.cluster.x-k8s.io/nodegroup: database
    monitor_node:
      hosts:
        hk8smfra1:
          ansible_host: 10.12.3.41
          ip: 10.12.3.41
        hk8smfra2:
          ansible_host: 10.12.3.42
          ip: 10.12.3.42
        hk8smfra3:
          ansible_host: 10.12.3.43
          ip: 10.12.3.43
      vars:
        node_taints:
          - 'dedicated=monitor:NoSchedule'
        node_labels:
          node-role.kubernetes.io/monitor: ""
          node.cluster.x-k8s.io/nodegroup: monitor
    teleport_node:
      hosts:
        hk8stfra1:
          ansible_host: 10.12.3.51
          ip: 10.12.3.51
        hk8stfra2:
          ansible_host: 10.12.3.52
          ip: 10.12.3.52
        hk8stfra3:
          ansible_host: 10.12.3.53
          ip: 10.12.3.53
      vars:
        node_taints:
          - 'dedicated=teleport:NoSchedule'
        node_labels:
          node-role.kubernetes.io/teleport: ""
          node.cluster.x-k8s.io/nodegroup: teleport
    k8s_cluster:
      children:
        kube_control_plane:
        worker_node:
        database_node:
        monitor_node:
        teleport_node:
    etcd:
      children:
        kube_control_plane:
    kube_node:
      children:
        worker_node:
        database_node:
        monitor_node:
        teleport_node:
    calico_rr:
      hosts: {}
Command used to invoke ansible
ansible-playbook --inventory inventory-local.yml --become --become-user=root --private-key=~/.ssh/key_2024-04-10 cluster.yml
Output of ansible run
The task `kubeadm | Initialize first master` fails.
Anything else we need to know
The kubelet log shows a connection timeout to the API server endpoint.
Same for me. On a fresh cluster deployment, if kube-vip is enabled, the deployment fails. Variables used to set up kube-vip:
# Kube VIP
kube_vip_enabled: true
kube_vip_arp_enabled: true
kube_vip_controlplane_enabled: true
kube_vip_address: "{{ hostvars[groups['kube_control_plane'][0]]['virtual_ip_addresses'][0] }}" # evaluates to an IP
loadbalancer_apiserver:
  address: "{{ kube_vip_address }}"
  port: 6443
kube_vip_interface: ens192
kube_vip_services_enabled: false
kube_vip_dns_mode: first
kube_vip_cp_detect: false
kube_vip_leasename: plndr-cp-lock
kube_vip_enable_node_labeling: true
kube_vip_lb_enable: true
These are the logs from the kube-vip container:
E0526 16:28:30.201192 1 leaderelection.go:332] error retrieving resource lock kube-system/plndr-cp-lock: leases.coordination.k8s.io "plndr-cp-lock" is forbidden: User "kubernetes-admin" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-system"
E0526 16:28:32.303578 1 leaderelection.go:332] error retrieving resource lock kube-system/plndr-cp-lock: leases.coordination.k8s.io "plndr-cp-lock" is forbidden: User "kubernetes-admin" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-system"
And this is from journalctl:
May 26 16:30:11 k8s-g1-cplane-1-56a631.example.com kubelet[14529]: I0526 16:30:11.764300 14529 csi_plugin.go:880] Failed to contact API server when waiting for CSINode publishing: Get "https://lb-apiserver.kubernetes.local:6443/apis/storage.k8s.io/v1/csinodes/k8s-g1-cplane-1-56a631.example.com": dial tcp 172.19.20.99:6443: connect: no route to host
May 26 16:30:11 k8s-g1-cplane-1-56a631.example.com kubelet[14529]: W0526 16:30:11.764339 14529 reflector.go:539] k8s.io/[email protected]/tools/cache/reflector.go:229: failed to list *v1.Node: Get "https://lb-apiserver.kubernetes.local:6443/api/v1/nodes?fieldSelector=metadata.name%3Dk8s-g1-cplane-1-56a631.example.com&limit=500&resourceVersion=0": dial tcp 172.19.20.99:6443: connect: no route to host
I redacted the domain with example.com
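For context, the `leases ... is forbidden` errors are kube-vip's leader election being rejected: permission-wise they map to lease access in kube-system along the lines of the illustrative Role below (not a proposed fix). As far as I understand, with kubeadm >= 1.29 the `kubernetes-admin` identity in `admin.conf` does not have these rights yet while the first control plane is still bootstrapping, hence the deadlock with the VIP never coming up.

```yaml
# Illustration only: the lease access kube-vip's leader election is asking for
# in the errors above. Not a fix; during kubeadm init on >= 1.29 the admin.conf
# identity cannot use it yet anyway.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kube-vip-leader-election
  namespace: kube-system
rules:
  - apiGroups: ["coordination.k8s.io"]
    resources: ["leases"]
    verbs: ["get", "list", "watch", "create", "update"]
```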
Workaround (sketch below):
- Deploy a fresh cluster without kube-vip; the deployment succeeds.
- Enable kube-vip and re-run cluster.yml; the deployment succeeds and kube-vip works as expected.
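In variable terms this is just a toggle between two runs of cluster.yml, roughly:

```yaml
# Run 1 of cluster.yml: deploy the cluster without kube-vip.
kube_vip_enabled: false

# Run 2: flip the flag (keeping the other kube_vip_* variables shown above)
# and re-run cluster.yml.
# kube_vip_enabled: true
```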
Same issue for me.
kube-vip requires workarounds to support k8s v1.29+
It would be great to add kube-vip to the test matrix as well...
Proposed PR: https://github.com/kubernetes-sigs/kubespray/pull/11242
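I have not checked the exact diff of that PR, but the workaround documented upstream by kube-vip for kubeadm >= 1.29 is to point the static pod on the first control plane at `super-admin.conf` (which keeps full privileges during init) instead of `admin.conf`. In the generated manifest that amounts to something like:

```yaml
# Sketch of the upstream kube-vip workaround for kubeadm >= 1.29; not
# necessarily the exact change in the linked PR.
volumes:
  - name: kubeconfig
    hostPath:
      path: /etc/kubernetes/super-admin.conf   # instead of /etc/kubernetes/admin.conf
```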
Workaround:
1. Deploy a fresh cluster without kube-vip; the deployment succeeds.
2. Enable kube-vip and re-run cluster.yml; the deployment succeeds and kube-vip works as expected.
Thank you for saving my sanity!
I edited roles/kubernetes/node/templates/manifests/kube-vip.manifest.j2 in the kubespray Docker image as in the PR, and it worked fine.
Thanks!
Quoting https://github.com/kube-vip/kube-vip/issues/684#issuecomment-2303984047:

Without `ControlPlaneKubeletLocalMode` and when referencing `admin.conf` for kube-vip:
- The kubelet got started
- The kubelet wanted to bootstrap itself by using the control-plane IP
- This failed until kube-vip comes up
- The kubelet can't start kube-vip because the `admin.conf` does not yet exist

With `ControlPlaneKubeletLocalMode` and when referencing `admin.conf` for kube-vip:
- The kubelet got started
- The kubelet bootstraps itself using the local control-plane IP (not depending on kube-vip being up)
- The `admin.conf` gets created
- The kubelet should be able to start kube-vip now

So, a better solution is available now.
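For anyone who wants to try that: `ControlPlaneKubeletLocalMode` is a kubeadm feature gate (alpha since kubeadm v1.31, if I remember correctly) and is switched on via the kubeadm configuration, roughly:

```yaml
# Sketch, assuming a kubeadm version that ships the ControlPlaneKubeletLocalMode
# feature gate (v1.31+): the control plane kubelet then bootstraps against its
# local API server instead of the load-balanced VIP.
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
featureGates:
  ControlPlaneKubeletLocalMode: true
```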
Nope, this isn't working. kube-vip/kube-vip#684 is still an issue:
https://github.com/kube-vip/kube-vip/issues/684#issuecomment-2309781000
https://github.com/kube-vip/kube-vip/issues/684#issuecomment-2310284394