kubespray
Adding more control nodes fails when kube_apiserver_bind_address is set
Hi,

Environment:
- bare-metal
- OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):
NAME="Ubuntu"
VERSION="20.04.3 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.3 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
- Version of Ansible (ansible --version):
ansible 2.10.11
config file = ~/kubespray/ansible.cfg
configured module search path = ['~/kubespray/library']
ansible python module location = ~/kubespray/env/lib/python3.9/site-packages/ansible
executable location = ~/kubespray/env/bin/ansible
python version = 3.9.5 (default, May 11 2021, 08:20:37) [GCC 10.3.0]
- Version of Python (python --version): Python 3.9.5

Kubespray version (commit) (git rev-parse --short HEAD): bcf69591 (2.16.0 release)
Network plugin used: Calico

Anything else we need to know: I'm hitting a situation where one master is already running and I want to add two additional masters. Each should have all control plane components bound to its own private IP address (it is a multi-homed scenario), with:

kube_apiserver_bind_address: 10.1.10.X

but provisioning fails on the following step (it simply times out):
TASK [kubernetes/control-plane : Wait for k8s apiserver
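For reference, the per-host layout that triggers this looks roughly like the following (file paths, hostnames and addresses below are illustrative, not the actual inventory):

# inventory/mycluster/host_vars/master-1.yml   (first, already running master)
kube_apiserver_bind_address: 10.1.10.11

# inventory/mycluster/host_vars/master-2.yml   (new master being added)
kube_apiserver_bind_address: 10.1.10.12

# inventory/mycluster/host_vars/master-3.yml   (new master being added)
kube_apiserver_bind_address: 10.1.10.13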
The reason for this is that kubeadm_discovery_address is derived from kube_apiserver_endpoint:
kube_apiserver_endpoint: |-
  {% if loadbalancer_apiserver is defined -%}
  https://{{ apiserver_loadbalancer_domain_name }}:{{ loadbalancer_apiserver.port|default(kube_apiserver_port) }}
  {%- elif not is_kube_master and loadbalancer_apiserver_localhost -%}
  https://localhost:{{ loadbalancer_apiserver_port|default(kube_apiserver_port) }}
  {%- elif is_kube_master -%}
  ### <-- It will end up here for each new master, evaluating to 10.1.10.X (the node's own IP)
  ### <-- at this point kubelet is not able to start and spin up containers, since /etc/kubernetes/ssl is empty, among other reasons
  https://{{ kube_apiserver_bind_address | regex_replace('0\.0\.0\.0','127.0.0.1') }}:{{ kube_apiserver_port }}
  {%- else -%}
  ### <-- it should end up here instead, so new masters properly fetch SSL certs from the already running master node
  ### <-- and start the TLS bootstrapping process via kubeadm
  https://{{ first_kube_master }}:{{ kube_apiserver_port }}
  {%- endif %}
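To make the failure concrete, assume (illustrative values) the first master is 10.1.10.11, the new master being added is 10.1.10.12, and kube_apiserver_port is the default 6443. On the new master the template above then evaluates as:

kube_apiserver_endpoint   -> https://10.1.10.12:6443   (the new master's own bind address)
kubeadm_discovery_address -> 10.1.10.12:6443           (derived from the endpoint)

so the join tries to discover the cluster from an apiserver that does not exist yet, instead of using 10.1.10.11:6443 (the first master).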
See the ### <-- comments in the snippet above. Also, since it doesn't make sense to override this variable, I simply hardcoded kubeadm_discovery_address for the first Kubespray pass.
TL;DR: fresh (future) masters are trying to connect to their own IP addresses instead of contacting the first master node.
@D3DeFi Thank you for your report. Did you use cluster.yml or scale.yml when facing this issue to add a master node?
cluster.yml, my nodes are untainted masters with etcd.
This is not the nicest solution, but after doing a manual kubeadm reset on the nodes that were incorrectly provisioned, and with the following patches, I was able to add nodes to the cluster without problems:
diff --git a/roles/kubernetes/control-plane/tasks/kubeadm-fix-apiserver.yml b/roles/kubernetes/control-plane/tasks/kubeadm-fix-apiserver.yml
index 5376aba8..79a5861b 100644
--- a/roles/kubernetes/control-plane/tasks/kubeadm-fix-apiserver.yml
+++ b/roles/kubernetes/control-plane/tasks/kubeadm-fix-apiserver.yml
@@ -11,6 +11,22 @@
- controller-manager.conf
- kubelet.conf
- scheduler.conf
+
+ notify:
+ - "Master | Restart kube-controller-manager"
+ - "Master | Restart kube-scheduler"
+ - "Master | reload kubelet"
+
+- name: Update bind address in component manifests
+ replace:
+ dest: "{{ kube_config_dir }}/manifests/{{ item }}"
+ regexp: "^(.*){{ kubeadm_discovery_address.split(':')[0] }}(.*)$"
+ replace: "\\g<1>{{ kube_apiserver_bind_address }}\\g<2>"
+ with_items:
+ - kube-apiserver.yaml
+ - kube-controller-manager.yaml
+ - kube-scheduler.yaml
+
notify:
- "Master | Restart kube-controller-manager"
- "Master | Restart kube-scheduler"
diff --git a/roles/kubernetes/control-plane/tasks/kubeadm-secondary.yml b/roles/kubernetes/control-plane/tasks/kubeadm-secondary.yml
index 1af7f0c6..4a7039f1 100644
--- a/roles/kubernetes/control-plane/tasks/kubeadm-secondary.yml
+++ b/roles/kubernetes/control-plane/tasks/kubeadm-secondary.yml
@@ -2,7 +2,7 @@
- name: Set kubeadm_discovery_address
set_fact:
kubeadm_discovery_address: >-
- {%- if "127.0.0.1" in kube_apiserver_endpoint or "localhost" in kube_apiserver_endpoint -%}
+ {%- if "127.0.0.1" in kube_apiserver_endpoint or "localhost" in kube_apiserver_endpoint or kube_apiserver_bind_address != "0.0.0.0" -%}
{{ first_kube_master }}:{{ kube_apiserver_port }}
{%- else -%}
{{ kube_apiserver_endpoint | regex_replace('https://', '') }}
diff --git a/roles/kubernetes/control-plane/tasks/main.yml b/roles/kubernetes/control-plane/tasks/main.yml
index a073b5de..d35213ca 100644
--- a/roles/kubernetes/control-plane/tasks/main.yml
+++ b/roles/kubernetes/control-plane/tasks/main.yml
@@ -96,3 +96,5 @@
state: started
daemon-reload: "{{ k8s_certs_units is changed }}"
when: auto_renew_certificates
+
+- meta: flush_handlers
Hi @oomichi,
We faced a similar issue again - we can't set up a cluster with more than one master when kube_apiserver_bind_address != "0.0.0.0".
I don't see the following changes - why was the issue closed?
--- a/roles/kubernetes/control-plane/tasks/kubeadm-secondary.yml
+++ b/roles/kubernetes/control-plane/tasks/kubeadm-secondary.yml
@@ -2,7 +2,7 @@
- name: Set kubeadm_discovery_address
set_fact:
kubeadm_discovery_address: >-
- {%- if "127.0.0.1" in kube_apiserver_endpoint or "localhost" in kube_apiserver_endpoint -%}
+ {%- if "127.0.0.1" in kube_apiserver_endpoint or "localhost" in kube_apiserver_endpoint or kube_apiserver_bind_address != "0.0.0.0" -%}
{{ first_kube_master }}:{{ kube_apiserver_port }}
{%- else -%}
{{ kube_apiserver_endpoint | regex_replace('https://', '') }}
@Bledai it was automatically closed by #7989, as @Alvaro-Campesino tagged his PR as a fix for this issue.
@floryut so the problem still exists, or maybe this is a new one with the same root cause. I'm testing with
+ {%- if "127.0.0.1" in kube_apiserver_endpoint or "localhost" in kube_apiserver_endpoint or kube_apiserver_bind_address != "0.0.0.0" -%}
and the current kubespray master (aa9ad1ed6094cf952dfb8e3320a5c360d2da99b6). If it helps, I'll create a PR.
If you want to create a PR for a fix on an issue, that's always welcome and a pleasure.
Maybe loop in @Alvaro-Campesino
Edit: Reopening in the meantime, while we assess the situation
@floryut @Alvaro-Campesino I added
{%- if "127.0.0.1" in kube_apiserver_endpoint or "localhost" in kube_apiserver_endpoint or kube_apiserver_bind_address != "0.0.0.0" -%}
and set kube_apiserver_bind_address in the inventory to 3 different IPs (one per master), but as a result, in manifests/kube-apiserver.yaml on each master the bind-address is the kube_apiserver_bind_address from the first master, while advertise-address is different and correct for each master. Any idea why?
What is the difference between kube_apiserver_endpoint and kubeadm_discovery_address?
Can you paste your inventory data? kube_apiserver_bind_address must be an IP address, not an array containing three different IP addresses.
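For example (values below are purely illustrative), one address per host, e.g. in host_vars, rather than a single group-level list:

# host_vars/master-2.yml -- one IP for this host
kube_apiserver_bind_address: 10.1.10.12

# group_vars -- NOT valid:
# kube_apiserver_bind_address: [10.1.10.11, 10.1.10.12, 10.1.10.13]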
@Alvaro-Campesino I know; kube_apiserver_bind_address is the same as the node's ip in my inventory. As I understand it, this is the reason for the changes proposed by @D3DeFi:
+ notify:
+ - "Master | Restart kube-controller-manager"
+ - "Master | Restart kube-scheduler"
+ - "Master | reload kubelet"
+
+- name: Update bind address in component manifests
+ replace:
+ dest: "{{ kube_config_dir }}/manifests/{{ item }}"
+ regexp: "^(.*){{ kubeadm_discovery_address.split(':')[0] }}(.*)$"
+ replace: "\\g<1>{{ kube_apiserver_bind_address }}\\g<2>"
+ with_items:
+ - kube-apiserver.yaml
+ - kube-controller-manager.yaml
+ - kube-scheduler.yaml
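For context, a rough before/after of what that replace task would do to a flag line in kube-apiserver.yaml, assuming (illustrative values) kubeadm_discovery_address is 10.1.10.11:6443 and this node's kube_apiserver_bind_address is 10.1.10.12:

- --bind-address=10.1.10.11   # before: the first master's address rendered into the manifest
- --bind-address=10.1.10.12   # after: rewritten to this node's own bind address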
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue or PR with /reopen
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close
@k8s-triage-robot: Closing this issue.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.