fix: use super-admin.conf for kube-vip on first CP
What type of PR is this?
/kind bug
What this PR does / why we need it:
Which issue(s) this PR fixes: Fixes #11229
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
NONE
Hi @sathieu. Thanks for your PR.
I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test label.
I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Hi @sathieu, I tried your PR and I get the following error. I think the reason is that "Kube-vip | Write static pod" is run before "Set fact first_kube_control_plane".
ansible.errors.AnsibleUndefinedVariable: 'first_kube_control_plane' is undefined. 'first_kube_control_plane' is undefined
fatal: [cp-1]: FAILED! => {
"changed": false,
"msg": "AnsibleUndefinedVariable: 'first_kube_control_plane' is undefined. 'first_kube_control_plane' is undefined"
}
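For reference, a minimal sketch of working around that ordering issue by deriving the value directly from the inventory instead of relying on the later-set fact (the task name and fact name here are hypothetical, not from the PR):

# Hypothetical sketch: compute the first control plane from the
# kube_control_plane inventory group so the manifest template does not
# depend on first_kube_control_plane being set earlier in the play.
- name: Kube-vip | Derive first control plane from inventory
  set_fact:
    kube_vip_is_first_cp: "{{ inventory_hostname == (groups['kube_control_plane'] | first) }}"

The workaround diff posted below takes essentially this approach.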
/ok-to-test
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: sathieu Once this PR has been reviewed and has the lgtm label, please assign floryut for approval. For more information see the Kubernetes Code Review Process.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
FYI, I tried the patch and hit the same issue of 'first_kube_control_plane' not defined. I did this "hack" and was able to get this to work in a bare-metal, off-prem configuration:
diff --git a/roles/kubernetes/node/tasks/loadbalancer/kube-vip.yml b/roles/kubernetes/node/tasks/loadbalancer/kube-vip.yml
index f7b04a624..b5acdac8c 100644
--- a/roles/kubernetes/node/tasks/loadbalancer/kube-vip.yml
+++ b/roles/kubernetes/node/tasks/loadbalancer/kube-vip.yml
@@ -6,6 +6,10 @@
- kube_proxy_mode == 'ipvs' and not kube_proxy_strict_arp
- kube_vip_arp_enabled
+- name: Kube-vip | Check if first control plane
+ set_fact:
+ is_first_control_plane: "{{ inventory_hostname == groups['kube_control_plane'] | first }}"
+
- name: Kube-vip | Write static pod
template:
src: manifests/kube-vip.manifest.j2
diff --git a/roles/kubernetes/node/templates/manifests/kube-vip.manifest.j2 b/roles/kubernetes/node/templates/manifests/kube-vip.manifest.j2
index 11a971e93..7b59bca4c 100644
--- a/roles/kubernetes/node/templates/manifests/kube-vip.manifest.j2
+++ b/roles/kubernetes/node/templates/manifests/kube-vip.manifest.j2
@@ -119,6 +119,6 @@ spec:
hostNetwork: true
volumes:
- hostPath:
- path: /etc/kubernetes/admin.conf
+ path: /etc/kubernetes/{% if is_first_control_plane %}super-{% endif %}admin.conf
name: kubeconfig
status: {}
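For illustration, with this patch the volume in the rendered static pod manifest would come out roughly like this (a sketch, assuming the default /etc/kubernetes config directory):

# On the first control plane (is_first_control_plane is true):
volumes:
- hostPath:
    path: /etc/kubernetes/super-admin.conf
  name: kubeconfig
# On every other node running kube-vip:
volumes:
- hostPath:
    path: /etc/kubernetes/admin.conf
  name: kubeconfig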
FYI, I tried the patch and hit the same issue of 'first_kube_control_plane' not defined. I did this "hack" and was able to get this to work in a bare-metal, off-prem configuration:
You are not testing the last version, see https://github.com/kubernetes-sigs/kubespray/commit/0d8323789aba03e04dc8afaf6d06be9bdc87d600.
0d83237 is based off of 4b82e90dc. My diff is based off the newer 351393e32 commit on the master branch. As mentioned by @andrewfung729, first_kube_control_plane is not defined when kube-vip.manifest.j2 is processed. My change defines the fact right before kube-vip.manifest.j2 is processed; I just used another name, is_first_control_plane.
CI is failing, probably for an unrelated reason; can someone take a look?
I think fixing this issue is easy, but adding a proper test is harder (because we don't have an external IP available).
I tried the fix on a 2-control-plane cluster with Kubespray v2.25.0 and it's failing for the second master:
TASK [kubernetes/control-plane : Wait for k8s apiserver] *****************************************************************************************
ok: [first_cp]
fatal: [second_cp]: FAILED! => {"changed": false, "elapsed": 181, "msg": "Timeout when waiting for lb-apiserver.kubernetes.local:6443"}
Am I missing anything?
EDIT: sorry, I was using the wrong interface.
It worked fine and the playbook finished successfully.
@sathieu FYI, I tried the latest patch on a 3 control plane / 4 worker node on-prem k8s 1.29.5 cluster using Kubespray (a 2-week-old pull of the master branch, post-2.25.0) and it worked fine! Thanks for working on this.
Can someone review and hopefully merge this PR?
@yankay @ErikJiang @floryut @tu1h @KubeKyrie (some of the reviewers of past kube-vip PRs) 🙏
Hi,
For the first installation it works fine, but when I want to test removal of the first control plane node, I change the order of the nodes in the inventory file and rerun cluster.yml. It ran without errors, but the kube-vip container on the new first master starts crashing with
panic: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory
The problem was that super-admin.conf gets created only on the first run on the first master; when I changed the order of the masters, the new first master's kube-vip manifest looked for super-admin.conf too and failed. When I changed it back to admin.conf it started working again.
I added a file check task prior to the "Kube-vip | Write static pod" task and changed the template accordingly; now it works when the inventory order of the first control plane node changes too.
diff --git a/roles/kubernetes/node/tasks/loadbalancer/kube-vip.yml b/roles/kubernetes/node/tasks/loadbalancer/kube-vip.yml
index 7e3471593..4e3b3478a 100644
--- a/roles/kubernetes/node/tasks/loadbalancer/kube-vip.yml
+++ b/roles/kubernetes/node/tasks/loadbalancer/kube-vip.yml
@@ -6,6 +6,13 @@
- kube_proxy_mode == 'ipvs' and not kube_proxy_strict_arp
- kube_vip_arp_enabled
+- name: Kube-vip | Check if super-admin.conf exists
+ stat:
+ path: "{{ kube_config_dir }}/super-admin.conf"
+ failed_when: false
+ changed_when: false
+ register: stat_kube_vip_super_admin
+
- name: Kube-vip | Write static pod
template:
src: manifests/kube-vip.manifest.j2
diff --git a/roles/kubernetes/node/templates/manifests/kube-vip.manifest.j2 b/roles/kubernetes/node/templates/manifests/kube-vip.manifest.j2
index 11a971e93..8276435a1 100644
--- a/roles/kubernetes/node/templates/manifests/kube-vip.manifest.j2
+++ b/roles/kubernetes/node/templates/manifests/kube-vip.manifest.j2
@@ -119,6 +119,6 @@ spec:
hostNetwork: true
volumes:
- hostPath:
- path: /etc/kubernetes/admin.conf
+ path: /etc/kubernetes/{% if inventory_hostname == (groups['kube_control_plane'] | first) and stat_kube_vip_super_admin.stat.exists and stat_kube_vip_super_admin.stat.isreg %}super-{% endif %}admin.conf
name: kubeconfig
status: {}
I hope you can add this to the PR before it is merged.
I updated the PR, inspired by @Seljuke's comment https://github.com/kubernetes-sigs/kubespray/pull/11242#issuecomment-2269450109.
Can someone review it? This is a blocker for us ....
@sathieu I found a bug in my solution. When running cluster setup for the first time on fresh nodes, the kube-vip static pod manifest is created before kubeadm runs, so super-admin.conf can't be found and the if condition in the Jinja template always falls back to admin.conf.
We need to check whether kubeadm has already run and apply the if logic based on that. Here is the solution covering this too:
diff --git a/roles/kubernetes/node/templates/manifests/kube-vip.manifest.j2 b/roles/kubernetes/node/templates/manifests/kube-vip.manifest.j2
index 11a971e93..b34588552 100644
--- a/roles/kubernetes/node/templates/manifests/kube-vip.manifest.j2
+++ b/roles/kubernetes/node/templates/manifests/kube-vip.manifest.j2
@@ -119,6 +119,6 @@ spec:
hostNetwork: true
volumes:
- hostPath:
- path: /etc/kubernetes/admin.conf
+ path: /etc/kubernetes/{% if inventory_hostname == (groups['kube_control_plane'] | first) and ((stat_kube_vip_super_admin.stat.exists and stat_kube_vip_super_admin.stat.isreg) or (not kubeadm_already_run.stat.exists )) %}super-{% endif %}admin.conf
name: kubeconfig
diff --git a/roles/kubernetes/node/tasks/loadbalancer/kube-vip.yml b/roles/kubernetes/node/tasks/loadbalancer/kube-vip.yml
index 7e3471593..392ee4e88 100644
--- a/roles/kubernetes/node/tasks/loadbalancer/kube-vip.yml
+++ b/roles/kubernetes/node/tasks/loadbalancer/kube-vip.yml
@@ -6,6 +6,21 @@
- kube_proxy_mode == 'ipvs' and not kube_proxy_strict_arp
- kube_vip_arp_enabled
+- name: Kube-vip | Check if super-admin.conf exists
+ stat:
+ path: "{{ kube_config_dir }}/super-admin.conf"
+ failed_when: false
+ changed_when: false
+ register: stat_kube_vip_super_admin
+
+- name: Kube-vip | Check if kubeadm has already run
+ stat:
+ path: "/var/lib/kubelet/config.yaml"
+ get_attributes: no
+ get_checksum: no
+ get_mime: no
+ register: kubeadm_already_run
+
- name: Kube-vip | Write static pod
template:
src: manifests/kube-vip.manifest.j2
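As a readability note (just a sketch, not part of the diff above, and the fact name is made up), the same decision could be folded into a single fact right before the static pod is written, reusing the registered stat results:

# Sketch only: precompute which kubeconfig the manifest should mount so the
# template can reference one variable instead of a long inline condition.
- name: Kube-vip | Pick kubeconfig for the static pod
  set_fact:
    kube_vip_kubeconfig_name: >-
      {{ 'super-admin.conf'
         if (inventory_hostname == (groups['kube_control_plane'] | first)
             and (stat_kube_vip_super_admin.stat.exists
                  or not kubeadm_already_run.stat.exists))
         else 'admin.conf' }}

The template could then simply use {{ kube_config_dir }}/{{ kube_vip_kubeconfig_name }} as the hostPath.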
@Seljuke Could you create a PR with your changes? You have done more testing than me.
Maybe the condition should be kubeadm not already run and super-admin.conf exists (because when kubeadm has run, admin.conf should work)?
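For concreteness, that alternative condition would render the template line roughly as follows (sketch only, using the variables registered in the diff above):

path: /etc/kubernetes/{% if inventory_hostname == (groups['kube_control_plane'] | first) and not kubeadm_already_run.stat.exists and stat_kube_vip_super_admin.stat.exists %}super-{% endif %}admin.conf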
PR #11422 created. super-admin.conf gets created in the kubeadm init phase kubeconfig super-admin phase, and for admin.conf to work, RBAC needs to be fully configured, which won't happen until the first control plane starts.
Closing in favor of https://github.com/kubernetes-sigs/kubespray/pull/11422. Thanks @Seljuke