
Changing loadbalancer FQDN for apiserver

Open ccaillet1974 opened this issue 2 years ago • 5 comments

Hi all,

I've deployed a k8s cluster with kubespray v2.18.1, with 3 masters and 4 workers. Everything is working well, BUT the variable apiserver_loadbalancer_domain_name was not set on the first deployment, so the load balancer has the default FQDN, which is lb-apiserver.kubernetes.local.

I want to change this FQDN but I cannot find the correct method to do so.

Any help would be appreciated.

ccaillet1974 avatar Jun 22 '22 12:06 ccaillet1974

@ccaillet1974 the supported method to change the certificate domain name for the kubernetes apiserver is indeed to set apiserver_loadbalancer_domain_name, so please share more details about your setup and how you have set your ansible inventory variables, which may explain why it did not propagate correctly.
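For reference, a minimal sketch of how this is usually set, assuming the variable lives in inventory/<inventory_name>/group_vars/all/all.yml; the FQDN and address below are placeholders, not values from this thread:

# Placeholder FQDN, replace with your own load balancer name
apiserver_loadbalancer_domain_name: "lb-apiserver.example.internal"
# Usually set together with the external load balancer definition
# (illustrative values only):
loadbalancer_apiserver:
  address: 10.141.10.100
  port: 6443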

cristicalin avatar Jun 22 '22 19:06 cristicalin

@cristicalin I'll give you more details :)

I've run many tests to change this FQDN and so far none of them has worked.

My inventory file is the following (I don't hide the IPs because it's a test environment and all IPs are private, on a non-routed subnet without a gateway, so nobody can reach them from the internet :) ):

all:
  hosts:
    lyo0-k8s-testm00:
      ansible_host: 10.128.10.64
      ip: 10.141.10.64
      access_ip: 10.144.10.64
    lyo0-k8s-testm01:
      ansible_host: 10.128.10.65
      ip: 10.141.10.65
      access_ip: 10.144.10.65
    lyo0-k8s-testm02:
      ansible_host: 10.128.10.66
      ip: 10.141.10.66
      access_ip: 10.144.10.66
    lyo0-k8s-testw00:
      ansible_host: 10.128.10.70
      ip: 10.141.10.70
      access_ip: 10.144.10.70
    lyo0-k8s-testw01:
      ansible_host: 10.128.10.71
      ip: 10.141.10.71
      access_ip: 10.144.10.71
    lyo0-k8s-testw02:
      ansible_host: 10.128.10.72
      ip: 10.141.10.72
      access_ip: 10.144.10.72
    lyo0-k8s-testw03:
      ansible_host: 10.128.10.73
      ip: 10.141.10.73
      access_ip: 10.144.10.73
  children:
    kube_control_plane:
      hosts:
        lyo0-k8s-testm00:
        lyo0-k8s-testm01:
        lyo0-k8s-testm02:
    kube_node:
      hosts:
        lyo0-k8s-testw00:
        lyo0-k8s-testw01:
        lyo0-k8s-testw02:
        lyo0-k8s-testw03:
    etcd:
      hosts:
        lyo0-k8s-testm00:
          etcd_address: 10.141.10.64
          etcd_access_address: 10.141.10.64
          etcd_metrics_port: 2381
        lyo0-k8s-testm01:
          etcd_address: 10.141.10.65
          etcd_access_address: 10.141.10.65
          etcd_metrics_port: 2381
        lyo0-k8s-testm02:
          etcd_address: 10.141.10.66
          etcd_access_address: 10.141.10.66
          etcd_metrics_port: 2381
    k8s_cluster:
      children:
        kube_control_plane:
        kube_node:
    calico-rr:
      hosts: {}

I've changed only the specified var, i.e. apiserver_loadbalancer_domain_name. The cluster was deployed with Kubernetes v1.23.5 and Cilium v1.10.7, with containerd as the runtime. All hosts are behind proxies for accessing external resources.

The errors appear at different levels but they are always x509 authentication failures, because the certificates are issued for all nodes (masters, workers) and the default FQDN lb-apiserver.kubernetes.local, but not for the new FQDN. If you want, I can reproduce the different tests and paste here the errors and the step at which they occurred.

My tests for changing the FQDN have been:

  1. Using the upgrade-cluster.yml playbook with the changed var, with v2.18.1 and with the release-2.19 branch
  2. Using the cluster.yml playbook with the changed var, with v2.18.1 and with the release-2.19 branch
  3. Using the method described in https://github.com/kubernetes-sigs/kubespray/issues/5464 for renewing certs, with v2.18.1

In the meantime I've seen a strange behaviour with crictl and nerdctl in the versions installed by v2.18.1: on a master, for example, my containers appear with crictl ps but not with nerdctl ps. I have other clusters that were deployed with the right FQDN and upgraded with release-2.19 to Kubernetes 1.23.7 and Cilium 1.11.3 without any problem, and this behaviour does not occur there.

My target is to upgrade from 1.23.5 to 1.24.x with the release-2.19 branch and, of course, to change the apiserver FQDN :)

PS: sorry for my English ... or my French style :)

Christophe

ccaillet1974 avatar Jun 23 '22 07:06 ccaillet1974

@ccaillet1974 I don't see where in your inventory vars or group vars you have applied apiserver_loadbalancer_domain_name. Note that it should be applied either on the all group, the k8s_cluster group, or the kube_control_plane group.
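For example, a sketch of applying it directly in the inventory file at the k8s_cluster group level (the FQDN is a placeholder, not a value from this thread):

    k8s_cluster:
      vars:
        apiserver_loadbalancer_domain_name: "lb-apiserver.example.internal"
      children:
        kube_control_plane:
        kube_node: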

cristicalin avatar Jun 23 '22 14:06 cristicalin

This var is defined in inventory/<inventory_name>/group_vars/all/all.yml.

Do you need an extract of this file?

ccaillet1974 avatar Jun 23 '22 15:06 ccaillet1974

Hi all,

@cristicalin: I don't understand where I need to declare the apiserver_loadbalancer_domain_name var in the inventory file. Currently this var is only defined in inventory/<inventory_name>/group_vars/all/all.yml and nowhere else, unless I'm missing something.

Could you please help me with this?

What kind of information do you need to help me fix this behaviour?

Thanks in advance for your reply.

Christophe

ccaillet1974 avatar Aug 10 '22 08:08 ccaillet1974

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Nov 08 '22 08:11 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Dec 08 '22 08:12 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Jan 07 '23 09:01 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Jan 07 '23 09:01 k8s-ci-robot

I had the same issue and had to change the host name in the kubeconfig file on the master node to the new value of apiserver_loadbalancer_domain_name to get it working. The old host name is no longer valid because it is replaced in the kube-apiserver certificate.

File: /etc/kubernetes/admin.conf
Field: clusters.cluster.server
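For illustration, a minimal sketch of the relevant part of /etc/kubernetes/admin.conf after such a change; the FQDN, port and cluster name are placeholders, not values from this thread:

clusters:
- cluster:
    certificate-authority-data: <unchanged>
    # server now points at the new apiserver_loadbalancer_domain_name value
    server: https://lb-apiserver.example.internal:6443
  name: cluster.local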

seumasdunlop avatar Jul 26 '23 22:07 seumasdunlop