kubespray
Initial setup of a k8s cluster with kubespray breaks if kube-vip is enabled
What happened?
An initial cluster creation always breaks on registering the first control plane node if kube-vip is enabled.
What did you expect to happen?
Kube-vip should not block the registration of the first control plane node during the initial phase.
How can we reproduce it (as minimally and precisely as possible)?
Deploy a minimal cluster in a fresh environment and activate kube-vip beforehand via addons.yml.
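For reference, enabling it beforehand boils down to a handful of variables in the cluster's addons/group_vars file. A minimal sketch (the interface name and VIP below are placeholders, not values from this report):

```yaml
# Minimal kube-vip activation sketch; interface and VIP are placeholders.
kube_vip_enabled: true
kube_vip_controlplane_enabled: true
kube_vip_arp_enabled: true
kube_vip_interface: eth0            # NIC that should carry the VIP
kube_vip_address: 10.12.3.10        # virtual IP for the kube-apiserver
loadbalancer_apiserver:
  address: "{{ kube_vip_address }}"
  port: 6443
```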
OS
Linux 5.15.0-102-generic x86_64
PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
Version of Ansible
ansible [core 2.16.7]
  config file = ansible.cfg
  configured module search path = ['library']
  ansible python module location = venv/lib/python3.12/site-packages/ansible
  ansible collection location = /Users/****/.ansible/collections:/usr/share/ansible/collections:/etc/ansible/collections:collections
  executable location = venv/bin/ansible
  python version = 3.12.3 (main, Apr 9 2024, 08:09:14) [Clang 15.0.0 (clang-1500.3.9.4)] (venv/bin/python)
  jinja version = 3.1.4
  libyaml = True
Version of Python
Python 3.12.3
Version of Kubespray (commit)
Collection (2.25.0)
Network plugin used
calico
Full inventory with variables
all:
  children:
    bastion:
      hosts:
        bastion:
          ansible_host: 10.12.3.61
          ip: 10.12.3.61
    kube_control_plane:
      hosts:
        hk8scpfra1:
          ansible_host: 10.12.3.11
          ip: 10.12.3.11
        hk8scpfra2:
          ansible_host: 10.12.3.12
          ip: 10.12.3.12
        hk8scpfra3:
          ansible_host: 10.12.3.13
          ip: 10.12.3.13
    worker_node:
      hosts:
        hk8swfra1:
          ansible_host: 10.12.3.21
          ip: 10.12.3.21
        hk8swfra2:
          ansible_host: 10.12.3.22
          ip: 10.12.3.22
        hk8swfra3:
          ansible_host: 10.12.3.23
          ip: 10.12.3.23
      vars:
        node_labels:
          node-role.kubernetes.io/worker: ""
          node.cluster.x-k8s.io/nodegroup: worker
    database_node:
      hosts:
        hk8sdbfra1:
          ansible_host: 10.12.3.31
          ip: 10.12.3.31
        hk8sdbfra2:
          ansible_host: 10.12.3.32
          ip: 10.12.3.32
        hk8sdbfra3:
          ansible_host: 10.12.3.33
          ip: 10.12.3.33
      vars:
        node_taints:
          - 'dedicated=database:NoSchedule'
        node_labels:
          node-role.kubernetes.io/database: ""
          node.cluster.x-k8s.io/nodegroup: database
    monitor_node:
      hosts:
        hk8smfra1:
          ansible_host: 10.12.3.41
          ip: 10.12.3.41
        hk8smfra2:
          ansible_host: 10.12.3.42
          ip: 10.12.3.42
        hk8smfra3:
          ansible_host: 10.12.3.43
          ip: 10.12.3.43
      vars:
        node_taints:
          - 'dedicated=monitor:NoSchedule'
        node_labels:
          node-role.kubernetes.io/monitor: ""
          node.cluster.x-k8s.io/nodegroup: monitor
    teleport_node:
      hosts:
        hk8stfra1:
          ansible_host: 10.12.3.51
          ip: 10.12.3.51
        hk8stfra2:
          ansible_host: 10.12.3.52
          ip: 10.12.3.52
        hk8stfra3:
          ansible_host: 10.12.3.53
          ip: 10.12.3.53
      vars:
        node_taints:
          - 'dedicated=teleport:NoSchedule'
        node_labels:
          node-role.kubernetes.io/teleport: ""
          node.cluster.x-k8s.io/nodegroup: teleport
    k8s_cluster:
      children:
        kube_control_plane:
        worker_node:
        database_node:
        monitor_node:
        teleport_node:
    etcd:
      children:
        kube_control_plane:
    kube_node:
      children:
        worker_node:
        database_node:
        monitor_node:
        teleport_node:
    calico_rr:
      hosts: {}
Command used to invoke ansible
ansible-playbook --inventory inventory-local.yml --become --become-user=root --private-key=~/.ssh/key_2024-04-10 cluster.yml
Output of ansible run
The task `kubeadm | Initialize first master` fails.
Anything else we need to know
The kubelet log shows a connection timeout to the API server endpoint.
Same for me. On a fresh cluster deployment, if kube-vip is enabled, the deployment fails. Variables used to set up kube-vip:
# Kube VIP
kube_vip_enabled: true
kube_vip_arp_enabled: true
kube_vip_controlplane_enabled: true
kube_vip_address: "{{ hostvars[groups['kube_control_plane'][0]]['virtual_ip_addresses'][0] }}" # evaluates to an IP
loadbalancer_apiserver:
  address: "{{ kube_vip_address }}"
  port: 6443
kube_vip_interface: ens192
kube_vip_services_enabled: false
kube_vip_dns_mode: first
kube_vip_cp_detect: false
kube_vip_leasename: plndr-cp-lock
kube_vip_enable_node_labeling: true
kube_vip_lb_enable: true
These are the logs from the kube-vip container:
E0526 16:28:30.201192 1 leaderelection.go:332] error retrieving resource lock kube-system/plndr-cp-lock: leases.coordination.k8s.io "plndr-cp-lock" is forbidden: User "kubernetes-admin" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-system"
E0526 16:28:32.303578 1 leaderelection.go:332] error retrieving resource lock kube-system/plndr-cp-lock: leases.coordination.k8s.io "plndr-cp-lock" is forbidden: User "kubernetes-admin" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-system"
And this is from journalctl:
May 26 16:30:11 k8s-g1-cplane-1-56a631.example.com kubelet[14529]: I0526 16:30:11.764300 14529 csi_plugin.go:880] Failed to contact API server when waiting for CSINode publishing: Get "https://lb-apiserver.kubernetes.local:6443/apis/storage.k8s.io/v1/csinodes/k8s-g1-cplane-1-56a631.example.com": dial tcp 172.19.20.99:6443: connect: no route to host
May 26 16:30:11 k8s-g1-cplane-1-56a631.example.com kubelet[14529]: W0526 16:30:11.764339 14529 reflector.go:539] k8s.io/[email protected]/tools/cache/reflector.go:229: failed to list *v1.Node: Get "https://lb-apiserver.kubernetes.local:6443/api/v1/nodes?fieldSelector=metadata.name%3Dk8s-g1-cplane-1-56a631.example.com&limit=500&resourceVersion=0": dial tcp 172.19.20.99:6443: connect: no route to host
I redacted the domain with example.com
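For context, the `leases ... is forbidden` errors are kube-vip's leader election being rejected: permission-wise they map to lease access in kube-system along the lines of the illustrative Role below (not a proposed fix). As far as I understand, with kubeadm >= 1.29 the `kubernetes-admin` identity in `admin.conf` does not have these rights yet while the first control plane is still bootstrapping, hence the deadlock with the VIP never coming up.

```yaml
# Illustration only: the lease access kube-vip's leader election is asking for
# in the errors above. Not a fix; during kubeadm init on >= 1.29 the admin.conf
# identity cannot use it yet anyway.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kube-vip-leader-election
  namespace: kube-system
rules:
  - apiGroups: ["coordination.k8s.io"]
    resources: ["leases"]
    verbs: ["get", "list", "watch", "create", "update"]
```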
Workaround (sketch below):
- Deploy a fresh cluster without kube-vip; the deployment succeeds.
- Enable kube-vip and re-run cluster.yml; the deployment succeeds and kube-vip works as expected.
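In variable terms this is just a toggle between two runs of cluster.yml, roughly:

```yaml
# Run 1 of cluster.yml: deploy the cluster without kube-vip.
kube_vip_enabled: false

# Run 2: flip the flag (keeping the other kube_vip_* variables shown above)
# and re-run cluster.yml.
# kube_vip_enabled: true
```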
Same issue for me.
kube-vip requires workarounds to support k8s v1.29+
It would be great to add kube-vip to the test matrix as well...
Proposed PR: https://github.com/kubernetes-sigs/kubespray/pull/11242
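I have not checked the exact diff of that PR, but the workaround documented upstream by kube-vip for kubeadm >= 1.29 is to point the static pod on the first control plane at `super-admin.conf` (which keeps full privileges during init) instead of `admin.conf`. In the generated manifest that amounts to something like:

```yaml
# Sketch of the upstream kube-vip workaround for kubeadm >= 1.29; not
# necessarily the exact change in the linked PR.
volumes:
  - name: kubeconfig
    hostPath:
      path: /etc/kubernetes/super-admin.conf   # instead of /etc/kubernetes/admin.conf
```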
Workaround:
1. Deploy a fresh cluster without kube-vip; the deployment succeeds.
2. Enable kube-vip and re-run cluster.yml; the deployment succeeds and kube-vip works as expected.
Thank you for saving my sanity!
I edited roles/kubernetes/node/templates/manifests/kube-vip.manifest.j2 in the kubespray Docker image as in the PR, and it worked fine.
Thanks!
Quoting https://github.com/kube-vip/kube-vip/issues/684#issuecomment-2303984047:

Without `ControlPlaneKubeletLocalMode` and when referencing `admin.conf` for kube-vip:
- The kubelet got started
- The kubelet wanted to bootstrap itself by using the control-plane IP
- This failed until kube-vip comes up
- The kubelet can't start kube-vip because the `admin.conf` does not yet exist

With `ControlPlaneKubeletLocalMode` and when referencing `admin.conf` for kube-vip:
- The kubelet got started
- The kubelet bootstraps itself using the local control-plane IP (not depending on kube-vip being up)
- The `admin.conf` gets created
- The kubelet should be able to start kube-vip now

So, a better solution is available now.
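For anyone who wants to try that: `ControlPlaneKubeletLocalMode` is a kubeadm feature gate (alpha since kubeadm v1.31, if I remember correctly) and is switched on via the kubeadm configuration, roughly:

```yaml
# Sketch, assuming a kubeadm version that ships the ControlPlaneKubeletLocalMode
# feature gate (v1.31+): the control plane kubelet then bootstraps against its
# local API server instead of the load-balanced VIP.
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
featureGates:
  ControlPlaneKubeletLocalMode: true
```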
Nope, this isn't working. kube-vip/kube-vip#684 is still an issue:
https://github.com/kube-vip/kube-vip/issues/684#issuecomment-2309781000
https://github.com/kube-vip/kube-vip/issues/684#issuecomment-2310284394