DNS is handled incorrectly
What version of KubeKey has the issue?
kk version: &version.Info{Major:"3", Minor:"1", GitVersion:"v3.1.8", GitCommit:"dbb1ee4aa1ecf0586565ff3374427d8a7d9b327b", GitTreeState:"clean", BuildDate:"2025-03-26T04:49:07Z", GoVersion:"go1.19.2", Compiler:"gc", Platform:"linux/amd64"}
What is your os environment?
Ubuntu 22.04
KubeKey config file
apiVersion: kubekey.kubesphere.io/v1alpha2
kind: Cluster
metadata:
  name: xxx
spec:
  hosts:
  - {name: node1, address: 10.111.0.1, internalAddress: 10.111.0.1, privateKeyPath: "/root/pri-key"}
  - {name: node2, address: 10.111.0.2, internalAddress: 10.111.0.2, privateKeyPath: "/root/pri-key"}
  # - {name: node3, address: 10.111.0.3, internalAddress: 10.111.0.3, privateKeyPath: "/root/pri-key"}
  roleGroups:
    etcd:
    - node1
    control-plane:
    - node1
    worker:
    - node1
    - node2
    # - node3
  controlPlaneEndpoint:
    ## Internal loadbalancer for apiservers
    # internalLoadbalancer: haproxy
    domain: lb.kubesphere.local
    address: ""
    port: 6443
  system:
    ntpServers:
      - time1.cloud.tencent.com
      - ntp.aliyun.com
    timezone: "Asia/Shanghai"
  kubernetes:
    version: v1.28.15
    clusterName: xxx.com
    autoRenewCerts: true
    containerManager: containerd
  etcd:
    type: kubekey
  network:
    plugin: calico
    kubePodsCIDR: 10.233.64.0/18
    kubeServiceCIDR: 10.233.0.0/18
    ## multus support. https://github.com/k8snetworkplumbingwg/multus-cni
    multusCNI:
      enabled: false
  registry:
    privateRegistry: ""
    auths:
      "reg.xxx.com":
        username: "xxx"
        password: "xxx"
    namespaceOverride: ""
    registryMirrors: []
    insecureRegistries: []
  addons: []
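A clear and concise description of what happened.
After installing the Kubernetes cluster with kk, I performed no further operations, waited until all pods were ready and had run stably for a while, and then rebooted the server with the reboot command. After the reboot, commands such as kubectl get nodes could no longer retrieve cluster information. I traced the problem to a DNS error: lb.kubesphere.local was not written to /etc/hosts, and no local DNS was configured to resolve it. If I edit the kubeconfig file and replace the domain with the control-plane node's IP address, node information can be retrieved normally.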
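As a stopgap for the missing name, the hosts entry can be restored by hand. The following is only a minimal sketch, not something KubeKey provides: `ensure_hosts_entry` is a hypothetical helper name, and the IP and domain in the usage line are taken from the config above. It appends the mapping only when no non-comment line already lists the name, so it is safe to run repeatedly.

```shell
#!/bin/sh
# ensure_hosts_entry HOSTS_FILE IP NAME   (hypothetical helper, for illustration)
# Appends "IP NAME" to HOSTS_FILE unless a non-comment line already lists NAME.
ensure_hosts_entry() {
    hosts_file=$1
    ip=$2
    name=$3
    if awk -v n="$name" '
        /^[[:space:]]*#/ { next }              # skip full-line comments
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break           # stop at a trailing comment
                if ($i == n) found = 1         # name already mapped
            }
        }
        END { exit !found }
    ' "$hosts_file"; then
        return 0
    fi
    printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
}
```

Usage on the control-plane node would look like `ensure_hosts_entry /etc/hosts 10.111.0.1 lb.kubesphere.local`. If something regenerates /etc/hosts at boot, the entry would have to be re-applied after each reboot.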
Relevant log output
root@node1:~# kubectl get nodes
Unable to connect to the server: dial tcp: lookup lb.kubesphere.local on 127.0.0.53:53: server misbehaving
root@node1:~# nslookup lb.kubesphere.local
;; Got SERVFAIL reply from 127.0.0.53
Server: 127.0.0.53
Address: 127.0.0.53#53
** server can't find lb.kubesphere.local: SERVFAIL
root@node1:~# dig lb.kubesphere.local
; <<>> DiG 9.18.30-0ubuntu0.22.04.2-Ubuntu <<>> lb.kubesphere.local
;; global options: +cmd
;; Got answer:
;; WARNING: .local is reserved for Multicast DNS
;; You are currently testing what happens when an mDNS query is leaked to DNS
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 42177
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;lb.kubesphere.local. IN A
;; Query time: 0 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
;; WHEN: Wed Apr 23 17:47:59 CST 2025
;; MSG SIZE rcvd: 48
root@node1:~# cat /etc/hosts
# Your system has configured 'manage_etc_hosts' as True.
# As a result, if you wish for changes to this file to persist
# then you will need to either
# a.) make changes to the master file in /etc/cloud/templates/hosts.debian.tmpl
# b.) change or remove the value of 'manage_etc_hosts' in
# /etc/cloud/cloud.cfg or cloud-config from user-data
#
127.0.1.1 kube-node1 kube-node1
127.0.0.1 localhost
# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
Additional information
No response
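I tried reinstalling, this time using a domain I have registered, with a DNS record added on Cloudflare. After the cluster installation finished, I checked the hosts file and found the following content: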
# Your system has configured 'manage_etc_hosts' as True.
# As a result, if you wish for changes to this file to persist
# then you will need to either
# a.) make changes to the master file in /etc/cloud/templates/hosts.debian.tmpl
# b.) change or remove the value of 'manage_etc_hosts' in
# /etc/cloud/cloud.cfg or cloud-config from user-data
#
127.0.1.1 node1
127.0.0.1 localhost
# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
# kubekey hosts BEGIN
10.111.0.1 node1.xxx.com node1
10.111.0.2 node2.xxx.com node2
10.111.0.1 lb.kube.xxx.com
# kubekey hosts END
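In other words, the hosts file was reverted after the reboot. I am not sure whether this is related to KubeKey, but I think KubeKey should take some measure to prevent the addresses in the hosts file from being modified.
I had not noticed the header comments in the hosts file before. When I checked, I understood their meaning: the problem is caused by cloud-init, which regenerates the hosts file on reboot and overwrites the existing configuration. I therefore suggest that when KubeKey detects cloud-init, it should modify /etc/cloud/templates/hosts.debian.tmpl or /etc/cloud/cloud.cfg at the same time as it modifies hosts.
Good idea. We will consider making the target /etc/hosts file a configurable variable later on.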
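The template-side change suggested above could be sketched roughly as follows. This is an illustration under stated assumptions, not KubeKey's behavior: `extract_block` and `sync_kubekey_block` are hypothetical names, and the block markers are the ones shown in the hosts file above. The function copies the marker-delimited block from the hosts file into the template, replacing any stale copy so that repeated runs do not duplicate it.

```shell
#!/bin/sh
BEGIN_MARK='# kubekey hosts BEGIN'
END_MARK='# kubekey hosts END'

# Prints the marker-delimited block (markers included) found in file $1.
extract_block() {
    sed -n "/^${BEGIN_MARK}\$/,/^${END_MARK}\$/p" "$1"
}

# sync_kubekey_block HOSTS_FILE TEMPLATE_FILE   (hypothetical helper)
# Copies the kubekey block from HOSTS_FILE into TEMPLATE_FILE.
sync_kubekey_block() {
    hosts_file=$1
    template=$2
    block=$(extract_block "$hosts_file")
    [ -n "$block" ] || return 1          # nothing to persist
    tmp=$(mktemp) || return 1
    # Drop any stale copy of the block, then append the current one.
    if ! sed "/^${BEGIN_MARK}\$/,/^${END_MARK}\$/d" "$template" > "$tmp"; then
        rm -f "$tmp"
        return 1
    fi
    printf '%s\n' "$block" >> "$tmp"
    # Write through the existing file so its owner and mode are kept.
    cat "$tmp" > "$template"
    rm -f "$tmp"
}
```

On an affected node this might be run as `sync_kubekey_block /etc/hosts /etc/cloud/templates/hosts.debian.tmpl`. The other route named in the hosts file header is to change or remove `manage_etc_hosts` in /etc/cloud/cloud.cfg, which stops cloud-init from rewriting the file at all.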