Anolis OS 8.9: installation fails when using kube-vip mode
What version of KubeKey has the issue?
kk version: &version.Info{Major:"3", Minor:"1", GitVersion:"v3.1.7", GitCommit:"da475c670813fc8a4dd3b1312aaa36e96ff01a1f", GitTreeState:"clean", BuildDate:"2024-10-30T09:41:20Z", GoVersion:"go1.19.2", Compiler:"gc", Platform:"linux/amd64"}
What is your os environment?
Anolis OS 8.9
KubeKey config file
apiVersion: kubekey.kubesphere.io/v1alpha2
kind: Cluster
metadata:
  name: qianmo
spec:
  hosts:
  ## You should complete the ssh information of the hosts
  - {name: node1, address: 192.168.154.189, password: "rootadmin"}
  - {name: node2, address: 192.168.154.188, password: "rootadmin"}
  - {name: node3, address: 192.168.154.187, password: "rootadmin"}
  roleGroups:
    etcd:
    - node1
    master:
    - node1
    worker:
    - node[1:3]
  controlPlaneEndpoint:
    ## Internal loadbalancer for apiservers
    internalLoadbalancer: kube-vip
    externalDNS: false
    address: "192.168.154.186"
    port: 6443
  system:
    ntpServers:
    - time1.cloud.tencent.com
    - ntp.aliyun.com
    - node1 # Set the node name in `hosts` as ntp server if no public ntp servers access.
    timezone: "Asia/Shanghai"
  kubernetes:
    version: v1.31.2
    containerManager: containerd
    clusterName: cluster.local
    apiserverArgs:
    - service-node-port-range=80-65535
  network:
    plugin: calico
    kubePodsCIDR: 10.233.64.0/18
    kubeServiceCIDR: 10.233.0.0/18
  registry:
    privateRegistry: ""
    registryMirrors: ['http://192.168.154.128:8081']
    insecureRegistries:
    - http://192.168.154.128:8081
A clear and concise description of what happened.
Installation fails when using internalLoadbalancer: kube-vip; installing with haproxy instead succeeds.
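Roughly speaking, with the haproxy internal load balancer each node reaches the apiservers through a local proxy, while kube-vip has to advertise the VIP 192.168.154.186 itself; if the kube-vip pod fails, that address is simply unreachable. A minimal check on node1 (the address is the controlPlaneEndpoint from the config above) to see whether the VIP was ever brought up:

# run on node1; prints the address line if kube-vip advertised the VIP
ip addr show | grep 192.168.154.186 || echo "VIP 192.168.154.186 not assigned on this node"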
Relevant log output
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests"
[kubelet-check] Waiting for a healthy kubelet at http://127.0.0.1:10248/healthz. This can take up to 4m0s
[kubelet-check] The kubelet is healthy after 1.002366938s
[api-check] Waiting for a healthy API server. This can take up to 4m0s
[api-check] The API server is not healthy after 4m0.001034363s
Unfortunately, an error has occurred:
context deadline exceeded
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all running Kubernetes containers by using crictl:
- 'crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'crictl --runtime-endpoint unix:///run/containerd/containerd.sock logs CONTAINERID'
error execution phase wait-control-plane: could not initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher
15:13:46 CST stdout: [node1]
[reset] Reading configuration from the cluster...
[reset] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
W1206 15:13:46.410906 20266 reset.go:123] [reset] Unable to fetch the kubeadm-config ConfigMap from cluster: failed to get config map: Get "https://lb.kubesphere.local:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config?timeout=10s": dial tcp 192.168.154.186:6443: connect: no route to host
[preflight] Running pre-flight checks
W1206 15:13:46.410973 20266 removeetcdmember.go:106] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] Deleted contents of the etcd data directory: /var/lib/etcd
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of directories: [/etc/kubernetes/manifests /var/lib/kubelet /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/super-admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d
The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.
If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.
The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.
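The "no route to host" for 192.168.154.186 above suggests the kube-vip static pod never advertised the VIP. Following the crictl hints in the kubeadm output, the kube-vip container logs on node1 should show the underlying error; a sketch using the same runtime endpoint as in the log:

# list the kube-vip container, running or exited
crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps -a | grep kube-vip
# inspect its logs (replace CONTAINERID with the ID found above)
crictl --runtime-endpoint unix:///run/containerd/containerd.sock logs CONTAINERID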
Additional information
No response
I tested this on my side and it works without issue.
Base environment
[root@master2 ~]# uname -a
Linux master2 5.10.134-17.2.an8.x86_64 #1 SMP Fri Aug 9 15:52:23 CST 2024 x86_64 x86_64 x86_64 GNU/Linux
[root@master2 ~]# cat /etc/os-release
NAME="Anolis OS"
VERSION="8.9"
ID="anolis"
ID_LIKE="rhel fedora centos"
VERSION_ID="8.9"
PLATFORM_ID="platform:an8"
PRETTY_NAME="Anolis OS 8.9"
ANSI_COLOR="0;31"
HOME_URL="https://openanolis.cn/"
[root@master2 ~]#
[root@master2 ~]# kk version
kk version: &version.Info{Major:"3", Minor:"1", GitVersion:"v3.1.7", GitCommit:"da475c670813fc8a4dd3b1312aaa36e96ff01a1f", GitTreeState:"clean", BuildDate:"2024-10-30T09:41:20Z", GoVersion:"go1.19.2", Compiler:"gc", Platform:"linux/amd64"}
Config file
apiVersion: kubekey.kubesphere.io/v1alpha2
kind: Cluster
metadata:
  name: sample
spec:
  hosts:
  - {name: master1, address: 192.168.5.124, internalAddress: 192.168.5.124, user: root, privateKeyPath: "~/.ssh/id_rsa"}
  - {name: master2, address: 192.168.5.125, internalAddress: 192.168.5.125, user: root, privateKeyPath: "~/.ssh/id_rsa"}
  - {name: master3, address: 192.168.5.127, internalAddress: 192.168.5.127, user: root, privateKeyPath: "~/.ssh/id_rsa"}
  roleGroups:
    etcd:
    - master[1:3]
    control-plane:
    - master[1:3]
    worker:
    - master[1:3]
  controlPlaneEndpoint:
    ## Internal loadbalancer for apiservers
    internalLoadbalancer: kube-vip
    externalDNS: false
    domain: lb.kubesphere.local
    address: "192.168.5.123"
    port: 6443
  system:
    # The ntp servers of chrony.
    ntpServers:
    - ntp.aliyun.com
    - master1 # Set the node name in `hosts` as ntp server if no public ntp servers access.
    timezone: "Asia/Shanghai"
  kubernetes:
    version: v1.26.15
    clusterName: cluster.local
    autoRenewCerts: true
    containerManager: containerd
    apiserverArgs:
    - service-node-port-range=10000-65535
    # maxPods is the number of Pods that can run on this Kubelet. [Default: 110]
    maxPods: 110
    # Specify which proxy mode to use. [Default: ipvs]
    proxyMode: ipvs
  etcd:
    type: kubekey
  network:
    plugin: calico
    kubePodsCIDR: 10.233.64.0/18
    kubeServiceCIDR: 10.233.0.0/18
    ## multus support. https://github.com/k8snetworkplumbingwg/multus-cni
    multusCNI:
      enabled: false
  registry:
    privateRegistry: ""
    namespaceOverride: ""
    registryMirrors: []
    insecureRegistries: []
  addons: []
Result
[root@master2 ~]# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
master1 Ready control-plane,worker 15h v1.26.15 192.168.5.124 <none> Anolis OS 8.9 5.10.134-16.2.an8.x86_64 containerd://1.7.13
master2 Ready control-plane,worker 15h v1.26.15 192.168.5.125 <none> Anolis OS 8.9 5.10.134-17.2.an8.x86_64 containerd://1.7.13
master3 Ready control-plane,worker 15h v1.26.15 192.168.5.127 <none> Anolis OS 8.9 5.10.134-17.2.an8.x86_64 containerd://1.7.13
[root@master2 ~]# kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-kube-controllers-57db949bd8-6gbdf 1/1 Running 0 15h
kube-system calico-node-47kxm 1/1 Running 0 15h
kube-system calico-node-6gf9c 1/1 Running 0 15h
kube-system calico-node-g2q68 1/1 Running 1 (3m10s ago) 15h
kube-system coredns-5b486d6f8b-jf9zv 1/1 Running 0 15h
kube-system coredns-5b486d6f8b-z85ct 1/1 Running 0 15h
kube-system kube-apiserver-master1 1/1 Running 0 15h
kube-system kube-apiserver-master2 1/1 Running 0 15h
kube-system kube-apiserver-master3 1/1 Running 1 (3m10s ago) 15h
kube-system kube-controller-manager-master1 1/1 Running 0 15h
kube-system kube-controller-manager-master2 1/1 Running 0 15h
kube-system kube-controller-manager-master3 1/1 Running 1 (3m10s ago) 15h
kube-system kube-proxy-cpfcr 1/1 Running 0 15h
kube-system kube-proxy-m6x9r 1/1 Running 1 (3m10s ago) 15h
kube-system kube-proxy-ng7vc 1/1 Running 0 15h
kube-system kube-scheduler-master1 1/1 Running 0 15h
kube-system kube-scheduler-master2 1/1 Running 0 15h
kube-system kube-scheduler-master3 1/1 Running 1 (3m10s ago) 15h
kube-system kube-vip-master1 1/1 Running 3 (15h ago) 15h
kube-system kube-vip-master2 1/1 Running 3 (14h ago) 15h
kube-system kube-vip-master3 1/1 Running 1 (3m10s ago) 15h
kube-system nodelocaldns-4j5kv 1/1 Running 0 15h
kube-system nodelocaldns-v25hm 1/1 Running 0 15h
kube-system nodelocaldns-xhffm 1/1 Running 2 (3m10s ago) 15h
[root@master2 ~]#
[root@master1 ~]# ip ad
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: enp6s18: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether bc:24:11:b2:e0:0e brd ff:ff:ff:ff:ff:ff
inet 192.168.5.124/24 brd 192.168.5.255 scope global noprefixroute enp6s18
valid_lft forever preferred_lft forever
inet 192.168.5.123/32 scope global enp6s18
valid_lft forever preferred_lft forever
inet6 fe80::be24:11ff:feb2:e00e/64 scope link noprefixroute
valid_lft forever preferred_lft forever
I looked into the issues on the kube-vip side. It appears to work on Kubernetes v1.28.x and earlier, but breaks starting from v1.29.x.
OK, thanks.
See https://github.com/kubernetes/kubeadm/issues/2414: kubeadm 1.29 restricted the permissions of the admin.conf file, but kube-vip still uses admin.conf, so it fails when fetching the lease resource. There is a workaround here: https://github.com/kube-vip/kube-vip/issues/684#issuecomment-1864855405
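For reference, the workaround in that comment amounts to pointing the kube-vip static pod at super-admin.conf (which keeps full privileges during bootstrap on kubeadm >= 1.29) while kubeadm init runs on the first control-plane node, then switching back to admin.conf once the cluster is up. A rough sketch of the manual steps, assuming the generated manifest sits at /etc/kubernetes/manifests/kube-vip.yaml:

# on the first control-plane node, before the API health check runs:
sed -i 's#path: /etc/kubernetes/admin.conf#path: /etc/kubernetes/super-admin.conf#' /etc/kubernetes/manifests/kube-vip.yaml
# after the cluster is initialized, revert to admin.conf:
sed -i 's#path: /etc/kubernetes/super-admin.conf#path: /etc/kubernetes/admin.conf#' /etc/kubernetes/manifests/kube-vip.yaml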
Same issue as #2375.
Could the maintainers fix this?
Has this been resolved?
Let's first see whether the kube-vip community comes up with a good solution. If not, we will add the workaround from https://github.com/kube-vip/kube-vip/issues/684#issuecomment-1864855405 to the 4.x releases.