Initial setup of a k8s cluster with kubespray breaks if kube-vip is enabled

Open · Mazorius opened this issue 1 year ago • 6 comments

What happened?

Initial cluster creation always breaks while registering the first control plane if kube-vip is enabled.

What did you expect to happen?

In the initial phase, kube-vip should not block registration of the first control plane.

How can we reproduce it (as minimally and precisely as possible)?

Deploy a minimal cluster in a fresh environment and activate kube-vip beforehand via addons.yml.
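
For example, a minimal kube-vip block in addons.yml along these lines (the interface name and VIP below are placeholders, not values from this environment):

kube_vip_enabled: true
kube_vip_controlplane_enabled: true
kube_vip_arp_enabled: true
kube_vip_interface: eth0              # placeholder NIC name
kube_vip_address: 192.0.2.10          # placeholder VIP
loadbalancer_apiserver:
  address: "{{ kube_vip_address }}"
  port: 6443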

OS

Linux 5.15.0-102-generic x86_64
PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

Version of Ansible

ansible [core 2.16.7]
  config file = ansible.cfg
  configured module search path = ['library']
  ansible python module location = venv/lib/python3.12/site-packages/ansible
  ansible collection location = /Users/****/.ansible/collections:/usr/share/ansible/collections:/etc/ansible/collections:collections
  executable location = venv/bin/ansible
  python version = 3.12.3 (main, Apr 9 2024, 08:09:14) [Clang 15.0.0 (clang-1500.3.9.4)] (venv/bin/python)
  jinja version = 3.1.4
  libyaml = True

Version of Python

Python 3.12.3

Version of Kubespray (commit)

Collection (2.25.0)

Network plugin used

calico

Full inventory with variables

all:
  children:
    bastion:
      hosts:
        bastion:
          ansible_host: 10.12.3.61
          ip: 10.12.3.61
    kube_control_plane:
      hosts:
        hk8scpfra1:
          ansible_host: 10.12.3.11
          ip: 10.12.3.11
        hk8scpfra2:
          ansible_host: 10.12.3.12
          ip: 10.12.3.12
        hk8scpfra3:
          ansible_host: 10.12.3.13
          ip: 10.12.3.13
    worker_node:
      hosts:
        hk8swfra1:
          ansible_host: 10.12.3.21
          ip: 10.12.3.21
        hk8swfra2:
          ansible_host: 10.12.3.22
          ip: 10.12.3.22
        hk8swfra3:
          ansible_host: 10.12.3.23
          ip: 10.12.3.23
      vars:
        node_labels:
          node-role.kubernetes.io/worker: ""
          node.cluster.x-k8s.io/nodegroup: worker
    database_node:
      hosts:
        hk8sdbfra1:
          ansible_host: 10.12.3.31
          ip: 10.12.3.31
        hk8sdbfra2:
          ansible_host: 10.12.3.32
          ip: 10.12.3.32
        hk8sdbfra3:
          ansible_host: 10.12.3.33
          ip: 10.12.3.33
      vars:
        node_taints:
          - 'dedicated=database:NoSchedule'
        node_labels:
          node-role.kubernetes.io/database: ""
          node.cluster.x-k8s.io/nodegroup: database
    monitor_node:
      hosts:
        hk8smfra1:
          ansible_host: 10.12.3.41
          ip: 10.12.3.41
        hk8smfra2:
          ansible_host: 10.12.3.42
          ip: 10.12.3.42
        hk8smfra3:
          ansible_host: 10.12.3.43
          ip: 10.12.3.43
      vars:
        node_taints:
          - 'dedicated=monitor:NoSchedule'
        node_labels:
          node-role.kubernetes.io/monitor: ""
          node.cluster.x-k8s.io/nodegroup: monitor
    teleport_node:
      hosts:
        hk8stfra1:
          ansible_host: 10.12.3.51
          ip: 10.12.3.51
        hk8stfra2:
          ansible_host: 10.12.3.52
          ip: 10.12.3.52
        hk8stfra3:
          ansible_host: 10.12.3.53
          ip: 10.12.3.53
      vars:
        node_taints:
          - 'dedicated=teleport:NoSchedule'
        node_labels:
          node-role.kubernetes.io/teleport: ""
          node.cluster.x-k8s.io/nodegroup: teleport
    k8s_cluster:
      children:
        kube_control_plane:
        worker_node:
        database_node:
        monitor_node:
        teleport_node:
    etcd:
      children:
        kube_control_plane:
    kube_node:
      children:
        worker_node:
        database_node:
        monitor_node:
        teleport_node:
    calico_rr:
      hosts: {}

Command used to invoke ansible

ansible-playbook --inventory inventory-local.yml --become --become-user=root --private-key=~/.ssh/key_2024-04-10 cluster.yml

Output of ansible run

The task "kubeadm | Initialize first master" failed.

Anything else we need to know

The kubelet log shows connection timeouts to the apiserver endpoint.

Mazorius · May 22 '24 20:05

Same for me. On a fresh cluster deployment, if kube-vip is enabled the deployment fails. Variables used to set up kube-vip:

# Kube VIP
kube_vip_enabled: true
kube_vip_arp_enabled: true
kube_vip_controlplane_enabled: true
kube_vip_address: "{{ hostvars[groups['kube_control_plane'][0]]['virtual_ip_addresses'][0] }}" # evaluates to an IP
loadbalancer_apiserver:
  address: "{{ kube_vip_address }}"
  port: 6443
kube_vip_interface: ens192
kube_vip_services_enabled: false
kube_vip_dns_mode: first
kube_vip_cp_detect: false
kube_vip_leasename: plndr-cp-lock
kube_vip_enable_node_labeling: true
kube_vip_lb_enable: true

These are the logs from the kube-vip container:

E0526 16:28:30.201192       1 leaderelection.go:332] error retrieving resource lock kube-system/plndr-cp-lock: leases.coordination.k8s.io "plndr-cp-lock" is forbidden: User "kubernetes-admin" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-system"

E0526 16:28:32.303578       1 leaderelection.go:332] error retrieving resource lock kube-system/plndr-cp-lock: leases.coordination.k8s.io "plndr-cp-lock" is forbidden: User "kubernetes-admin" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-system"

And these from journalctl:

May 26 16:30:11 k8s-g1-cplane-1-56a631.example.com kubelet[14529]: I0526 16:30:11.764300   14529 csi_plugin.go:880] Failed to contact API server when waiting for CSINode publishing: Get "https://lb-apiserver.kubernetes.local:6443/apis/storage.k8s.io/v1/csinodes/k8s-g1-cplane-1-56a631.example.com": dial tcp 172.19.20.99:6443: connect: no route to host

May 26 16:30:11 k8s-g1-cplane-1-56a631.example.com kubelet[14529]: W0526 16:30:11.764339   14529 reflector.go:539] k8s.io/client-go@<version>/tools/cache/reflector.go:229: failed to list *v1.Node: Get "https://lb-apiserver.kubernetes.local:6443/api/v1/nodes?fieldSelector=metadata.name%3Dk8s-g1-cplane-1-56a631.example.com&limit=500&resourceVersion=0": dial tcp 172.19.20.99:6443: connect: no route to host

I redacted the domain with example.com

Workaround:

  1. Deploy a fresh cluster without kube-vip; the deployment succeeds.
  2. Enable kube-vip and re-run cluster.yml; the deployment succeeds and kube-vip works as expected (see the sketch below).
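
In variable terms, the two-pass workaround just toggles one flag between runs of cluster.yml; a minimal sketch:

# Pass 1: bootstrap the cluster with kube-vip disabled
kube_vip_enabled: false

# Pass 2: once cluster.yml has succeeded, enable kube-vip and re-run cluster.yml
kube_vip_enabled: true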

vladvetu · May 26 '24 16:05

Same issue for me.

raider444 · May 28 '24 08:05

kube-vip requires workarounds to support k8s v1.29+

wandersonlima · May 28 '24 12:05

It would be great to add kube-vip to the test matrix as well ...

sathieu · May 28 '24 15:05

Proposed PR: https://github.com/kubernetes-sigs/kubespray/pull/11242

sathieu · May 28 '24 16:05

Workaround:

1. Deploy a fresh cluster without kube-vip; the deployment succeeds.

2. Enable kube-vip and re-run cluster.yml; the deployment succeeds and kube-vip works as expected.

Thank you for saving my sanity!

theLockesmith · Jun 26 '24 19:06

I edited roles/kubernetes/node/templates/manifests/kube-vip.manifest.j2 in the kubespray Docker image, as in the PR, and it worked fine. Thanks!
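
For context, the workaround discussed upstream in kube-vip/kube-vip#684 for kubeadm v1.29+ is to let the static pod read super-admin.conf during bootstrap instead of admin.conf; a rough sketch of that kind of volume change in a kube-vip static pod manifest (not necessarily identical to the PR):

spec:
  containers:
    - name: kube-vip
      # ... image, args, env as generated by the template ...
      volumeMounts:
        - mountPath: /etc/kubernetes/admin.conf
          name: kubeconfig
  volumes:
    - name: kubeconfig
      hostPath:
        # on kubeadm v1.29+, super-admin.conf exists during init,
        # before admin.conf gets the RBAC bindings it now needs
        path: /etc/kubernetes/super-admin.conf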

vtmocanu · Jul 08 '24 15:07

Quoting https://github.com/kube-vip/kube-vip/issues/684#issuecomment-2303984047:

Without ControlPlaneKubeletLocalMode and when referencing admin.conf for kube-vip:

* The kubelet got started

* The kubelet wanted to bootstrap itself by using the control-plane IP

* This failed until kube-vip comes up

* The kubelet can't start kube-vip because the `admin.conf` does not yet exist

With ControlPlaneKubeletLocalMode and when referencing admin.conf for kube-vip:

* The kubelet got started

* The kubelet bootstraps itself using the local control-plane IP (not depending on kube-vip being up)

* The admin.conf gets created

* The kubelet should be able to start kube-vip now

So, a better solution is available now.
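
For reference, ControlPlaneKubeletLocalMode is a kubeadm feature gate (alpha as of kubeadm v1.31, to the best of my knowledge); in raw kubeadm terms it would be switched on roughly like this:

# Sketch: enabling the kubeadm feature gate directly; how (or whether)
# kubespray exposes this through a variable is a separate question.
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
featureGates:
  ControlPlaneKubeletLocalMode: true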

sathieu · Aug 22 '24 07:08

Quoting kube-vip/kube-vip#684 (comment):

[…]

So, a better solution is available now.

Nope. This isn't working. kube-vip/kube-vip#684 is still an issue:

https://github.com/kube-vip/kube-vip/issues/684#issuecomment-2309781000 https://github.com/kube-vip/kube-vip/issues/684#issuecomment-2310284394

Cloud-Mak · Sep 03 '24 19:09