
DNS is not handled correctly

Open skyhhjmk opened this issue 8 months ago • 3 comments

What version of KubeKey has the issue?

kk version: &version.Info{Major:"3", Minor:"1", GitVersion:"v3.1.8", GitCommit:"dbb1ee4aa1ecf0586565ff3374427d8a7d9b327b", GitTreeState:"clean", BuildDate:"2025-03-26T04:49:07Z", GoVersion:"go1.19.2", Compiler:"gc", Platform:"linux/amd64"}

What is your os environment?

Ubuntu 22.04

KubeKey config file

apiVersion: kubekey.kubesphere.io/v1alpha2
kind: Cluster
metadata:
  name: xxx
spec:
  hosts:
  - {name: node1, address: 10.111.0.1, internalAddress: 10.111.0.1, privateKeyPath: "/root/pri-key"}
  - {name: node2, address: 10.111.0.2, internalAddress: 10.111.0.2, privateKeyPath: "/root/pri-key"}
  # - {name: node3, address: 10.111.0.3, internalAddress: 10.111.0.3, privateKeyPath: "/root/pri-key"}
  roleGroups:
    etcd:
    - node1
    control-plane: 
    - node1
    worker:
    - node1
    - node2
    # - node3
  controlPlaneEndpoint:
    ## Internal loadbalancer for apiservers 
    # internalLoadbalancer: haproxy

    domain: lb.kubesphere.local
    address: ""
    port: 6443
  system:
    ntpServers:
      - time1.cloud.tencent.com
      - ntp.aliyun.com
    timezone: "Asia/Shanghai"
  kubernetes:
    version: v1.28.15
    clusterName: xxx.com
    autoRenewCerts: true
    containerManager: containerd
  etcd:
    type: kubekey
  network:
    plugin: calico
    kubePodsCIDR: 10.233.64.0/18
    kubeServiceCIDR: 10.233.0.0/18
    ## multus support. https://github.com/k8snetworkplumbingwg/multus-cni
    multusCNI:
      enabled: false
  registry:
    privateRegistry: ""
    auths:
      "reg.xxx.com":
        username: "xxx"
        password: "xxx"
    namespaceOverride: ""
    registryMirrors: []
    insecureRegistries: []
  addons: []

A clear and concise description of what happened.

After installing the k8s cluster with kk, I performed no further operations, waited until all pods were ready and had been running stably for a while, then restarted the server with the reboot command. After the reboot, commands such as kubectl get nodes could no longer retrieve cluster information. Troubleshooting pointed to a DNS error: lb.kubesphere.local had not been written to the hosts file, and no matching local DNS record was configured. If I edit the kubeconfig file and replace the domain with the control-plane node's IP address, the node information can be retrieved normally.
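The manual fix described above can be sketched as a one-off script. The address and domain below are taken from this issue's config (node1 at 10.111.0.1, controlPlaneEndpoint.domain lb.kubesphere.local); HOSTS_FILE defaults to a temporary file purely for illustration, while on a real node it would be /etc/hosts (run as root):

```shell
# Re-add the control-plane endpoint mapping that cloud-init reverted.
# HOSTS_FILE defaults to a temp file here for safe illustration;
# point it at /etc/hosts on an actual node.
HOSTS_FILE="${HOSTS_FILE:-$(mktemp)}"
ENDPOINT_IP="10.111.0.1"              # control-plane address from this issue's config
ENDPOINT_DOMAIN="lb.kubesphere.local" # controlPlaneEndpoint.domain

# Append the entry only if it is not already present.
grep -q "$ENDPOINT_DOMAIN" "$HOSTS_FILE" 2>/dev/null || \
  printf '%s  %s\n' "$ENDPOINT_IP" "$ENDPOINT_DOMAIN" >> "$HOSTS_FILE"

grep "$ENDPOINT_DOMAIN" "$HOSTS_FILE"
# prints: 10.111.0.1  lb.kubesphere.local
```

This only restores name resolution until the next reboot; as the comments below conclude, the underlying cause is cloud-init rewriting the file.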

Relevant log output

root@node1:~# kubectl get nodes
Unable to connect to the server: dial tcp: lookup lb.kubesphere.local on 127.0.0.53:53: server misbehaving
root@node1:~# nslookup lb.kubesphere.local
;; Got SERVFAIL reply from 127.0.0.53
Server:         127.0.0.53
Address:        127.0.0.53#53

** server can't find lb.kubesphere.local: SERVFAIL

root@node1:~# dig lb.kubesphere.local

; <<>> DiG 9.18.30-0ubuntu0.22.04.2-Ubuntu <<>> lb.kubesphere.local
;; global options: +cmd
;; Got answer:
;; WARNING: .local is reserved for Multicast DNS
;; You are currently testing what happens when an mDNS query is leaked to DNS
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 42177
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;lb.kubesphere.local.           IN      A

;; Query time: 0 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
;; WHEN: Wed Apr 23 17:47:59 CST 2025
;; MSG SIZE  rcvd: 48

root@node1:~# cat /etc/hosts
# Your system has configured 'manage_etc_hosts' as True.
# As a result, if you wish for changes to this file to persist
# then you will need to either
# a.) make changes to the master file in /etc/cloud/templates/hosts.debian.tmpl
# b.) change or remove the value of 'manage_etc_hosts' in
#     /etc/cloud/cloud.cfg or cloud-config from user-data
#
127.0.1.1 kube-node1 kube-node1
127.0.0.1 localhost

# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

Additional information

No response

skyhhjmk avatar Apr 23 '25 09:04 skyhhjmk

I tried reinstalling, but this time I used a domain name I had registered and added the DNS record on Cloudflare. After the cluster installation finished, I checked the hosts file and found the following content:

# Your system has configured 'manage_etc_hosts' as True.
# As a result, if you wish for changes to this file to persist
# then you will need to either
# a.) make changes to the master file in /etc/cloud/templates/hosts.debian.tmpl
# b.) change or remove the value of 'manage_etc_hosts' in
#     /etc/cloud/cloud.cfg or cloud-config from user-data
#
127.0.1.1      node1
127.0.0.1 localhost

# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

# kubekey hosts BEGIN
10.111.0.1  node1.xxx.com node1
10.111.0.2  node2.xxx.com node2
10.111.0.1  lb.kube.xxx.com
# kubekey hosts END

In other words, the hosts file is reverted after a reboot. I'm not sure whether this is related to KubeKey, but I think KubeKey should take some measures to prevent the addresses in the hosts file from being removed.

skyhhjmk avatar Apr 23 '25 11:04 skyhhjmk

I hadn't previously paid attention to the comment at the top of the hosts file. On re-reading it I understood the cause: this is caused by cloud-init, which regenerates the hosts file on reboot and overwrites the existing configuration. So I suggest that when KubeKey detects cloud-init, it should, in addition to modifying /etc/hosts, also modify /etc/cloud/templates/hosts.debian.tmpl or /etc/cloud/cloud.cfg.
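One way to realize this suggestion (a sketch, not something KubeKey currently does; the drop-in file name is hypothetical) would be to disable cloud-init's hosts management with a drop-in, so the entries KubeKey writes between the `# kubekey hosts BEGIN/END` markers survive reboots:

```yaml
# /etc/cloud/cloud.cfg.d/99-kubekey-hosts.cfg (hypothetical drop-in name)
# Stop cloud-init from regenerating /etc/hosts from its template on boot,
# so KubeKey's entries are no longer overwritten.
manage_etc_hosts: false
```

The alternative the comment mentions, editing /etc/cloud/templates/hosts.debian.tmpl to include the same entries, keeps manage_etc_hosts enabled but ties the fix to the distro-specific template path.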

skyhhjmk avatar Apr 23 '25 11:04 skyhhjmk

Good idea. We'll consider making the target /etc/hosts file a configurable variable in a future release.

redscholar avatar Apr 24 '25 10:04 redscholar