
Anolis OS 8.9: installation fails when using kube-vip mode

Open emf1002 opened this issue 1 year ago • 8 comments

What version of KubeKey has the issue?

kk version: &version.Info{Major:"3", Minor:"1", GitVersion:"v3.1.7", GitCommit:"da475c670813fc8a4dd3b1312aaa36e96ff01a1f", GitTreeState:"clean", BuildDate:"2024-10-30T09:41:20Z", GoVersion:"go1.19.2", Compiler:"gc", Platform:"linux/amd64"}

What is your OS environment?

Anolis OS 8.9

KubeKey config file

apiVersion: kubekey.kubesphere.io/v1alpha2
kind: Cluster
metadata:
  name: qianmo
spec:
  hosts: 
  ##You should complete the ssh information of the hosts
  - {name: node1, address: 192.168.154.189, password: "rootadmin"}
  - {name: node2, address: 192.168.154.188, password: "rootadmin"}
  - {name: node3, address: 192.168.154.187, password: "rootadmin"}
  roleGroups:
    etcd:
    - node1
    master:
    - node1
    worker:
    - node[1:3]
  controlPlaneEndpoint:
    ##Internal loadbalancer for apiservers
    internalLoadbalancer: kube-vip
    externalDNS: false
    address: "192.168.154.186"
    port: 6443
  system:
    ntpServers:
      - time1.cloud.tencent.com
      - ntp.aliyun.com
      - node1 # Set a node name from `hosts` as the NTP server if there is no access to public NTP servers.
    timezone: "Asia/Shanghai"
  kubernetes:
    version: v1.31.2
    containerManager: containerd
    clusterName: cluster.local
    apiserverArgs:
    - service-node-port-range=80-65535
  network:
    plugin: calico
    kubePodsCIDR: 10.233.64.0/18
    kubeServiceCIDR: 10.233.0.0/18
  registry:
    privateRegistry: ""
    registryMirrors: ['http://192.168.154.128:8081']
    insecureRegistries:
    - http://192.168.154.128:8081

A clear and concise description of what happened.

Installation fails with internalLoadbalancer: kube-vip, but succeeds with haproxy.

Relevant log output

[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests"
[kubelet-check] Waiting for a healthy kubelet at http://127.0.0.1:10248/healthz. This can take up to 4m0s
[kubelet-check] The kubelet is healthy after 1.002366938s
[api-check] Waiting for a healthy API server. This can take up to 4m0s
[api-check] The API server is not healthy after 4m0.001034363s

Unfortunately, an error has occurred:
        context deadline exceeded

This error is likely caused by:
        - The kubelet is not running
        - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
        - 'systemctl status kubelet'
        - 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all running Kubernetes containers by using crictl:
        - 'crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
        Once you have found the failing container, you can inspect its logs with:
        - 'crictl --runtime-endpoint unix:///run/containerd/containerd.sock logs CONTAINERID'
error execution phase wait-control-plane: could not initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher
15:13:46 CST stdout: [node1]
[reset] Reading configuration from the cluster...
[reset] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
W1206 15:13:46.410906   20266 reset.go:123] [reset] Unable to fetch the kubeadm-config ConfigMap from cluster: failed to get config map: Get "https://lb.kubesphere.local:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config?timeout=10s": dial tcp 192.168.154.186:6443: connect: no route to host
[preflight] Running pre-flight checks
W1206 15:13:46.410973   20266 removeetcdmember.go:106] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] Deleted contents of the etcd data directory: /var/lib/etcd
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of directories: [/etc/kubernetes/manifests /var/lib/kubelet /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/super-admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]

The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d

The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.

If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.

The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.

Additional information

No response
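
For anyone debugging the same failure: the "dial tcp 192.168.154.186:6443: connect: no route to host" in the reset output above suggests kube-vip never brought the VIP up on node1. A quick check sketch to run on node1, reusing the crictl endpoint shown in the kubeadm output (CONTAINERID is a placeholder for the kube-vip container ID):

# is the VIP bound to any interface on node1?
ip addr show | grep 192.168.154.186
# is the kube-vip static pod container running, and what does it log?
crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps -a | grep kube-vip
crictl --runtime-endpoint unix:///run/containerd/containerd.sock logs CONTAINERID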

emf1002 · Dec 06 '24 07:12

It works in my tests here, no issues.

Base environment

[root@master2 ~]# uname -a
Linux master2 5.10.134-17.2.an8.x86_64 #1 SMP Fri Aug 9 15:52:23 CST 2024 x86_64 x86_64 x86_64 GNU/Linux

[root@master2 ~]# cat /etc/os-release
NAME="Anolis OS"
VERSION="8.9"
ID="anolis"
ID_LIKE="rhel fedora centos"
VERSION_ID="8.9"
PLATFORM_ID="platform:an8"
PRETTY_NAME="Anolis OS 8.9"
ANSI_COLOR="0;31"
HOME_URL="https://openanolis.cn/"

[root@master2 ~]#
[root@master2 ~]# kk version
kk version: &version.Info{Major:"3", Minor:"1", GitVersion:"v3.1.7", GitCommit:"da475c670813fc8a4dd3b1312aaa36e96ff01a1f", GitTreeState:"clean", BuildDate:"2024-10-30T09:41:20Z", GoVersion:"go1.19.2", Compiler:"gc", Platform:"linux/amd64"}

Config file

apiVersion: kubekey.kubesphere.io/v1alpha2
kind: Cluster
metadata:
  name: sample
spec:
  hosts:
  - {name: master1, address: 192.168.5.124, internalAddress: 192.168.5.124, user: root, privateKeyPath: "~/.ssh/id_rsa"}
  - {name: master2, address: 192.168.5.125, internalAddress: 192.168.5.125, user: root, privateKeyPath: "~/.ssh/id_rsa"}
  - {name: master3, address: 192.168.5.127, internalAddress: 192.168.5.127, user: root, privateKeyPath: "~/.ssh/id_rsa"}
  roleGroups:
    etcd:
    - master[1:3]
    control-plane:
    - master[1:3]
    worker:
    - master[1:3]
  controlPlaneEndpoint:
    ## Internal loadbalancer for apiservers
    internalLoadbalancer: kube-vip
    externalDNS: false
    domain: lb.kubesphere.local
    address: "192.168.5.123"
    port: 6443
  system:
    # The ntp servers of chrony.
    ntpServers:
      - ntp.aliyun.com
      - master1 # Set a node name from `hosts` as the NTP server if there is no access to public NTP servers.
    timezone: "Asia/Shanghai"
  kubernetes:
    version: v1.26.15
    clusterName: cluster.local
    autoRenewCerts: true
    containerManager: containerd
    apiserverArgs:
    - service-node-port-range=10000-65535
    # maxPods is the number of Pods that can run on this Kubelet. [Default: 110]
    maxPods: 110
    # Specify which proxy mode to use. [Default: ipvs]
    proxyMode: ipvs
  etcd:
    type: kubekey
  network:
    plugin: calico
    kubePodsCIDR: 10.233.64.0/18
    kubeServiceCIDR: 10.233.0.0/18
    ## multus support. https://github.com/k8snetworkplumbingwg/multus-cni
    multusCNI:
      enabled: false
  registry:
    privateRegistry: ""
    namespaceOverride: ""
    registryMirrors: []
    insecureRegistries: []
  addons: []

Result

[root@master2 ~]# kubectl get nodes -o wide
NAME      STATUS   ROLES                  AGE   VERSION    INTERNAL-IP     EXTERNAL-IP   OS-IMAGE        KERNEL-VERSION             CONTAINER-RUNTIME
master1   Ready    control-plane,worker   15h   v1.26.15   192.168.5.124   <none>        Anolis OS 8.9   5.10.134-16.2.an8.x86_64   containerd://1.7.13
master2   Ready    control-plane,worker   15h   v1.26.15   192.168.5.125   <none>        Anolis OS 8.9   5.10.134-17.2.an8.x86_64   containerd://1.7.13
master3   Ready    control-plane,worker   15h   v1.26.15   192.168.5.127   <none>        Anolis OS 8.9   5.10.134-17.2.an8.x86_64   containerd://1.7.13

[root@master2 ~]# kubectl get pods -A
NAMESPACE     NAME                                       READY   STATUS    RESTARTS        AGE
kube-system   calico-kube-controllers-57db949bd8-6gbdf   1/1     Running   0               15h
kube-system   calico-node-47kxm                          1/1     Running   0               15h
kube-system   calico-node-6gf9c                          1/1     Running   0               15h
kube-system   calico-node-g2q68                          1/1     Running   1 (3m10s ago)   15h
kube-system   coredns-5b486d6f8b-jf9zv                   1/1     Running   0               15h
kube-system   coredns-5b486d6f8b-z85ct                   1/1     Running   0               15h
kube-system   kube-apiserver-master1                     1/1     Running   0               15h
kube-system   kube-apiserver-master2                     1/1     Running   0               15h
kube-system   kube-apiserver-master3                     1/1     Running   1 (3m10s ago)   15h
kube-system   kube-controller-manager-master1            1/1     Running   0               15h
kube-system   kube-controller-manager-master2            1/1     Running   0               15h
kube-system   kube-controller-manager-master3            1/1     Running   1 (3m10s ago)   15h
kube-system   kube-proxy-cpfcr                           1/1     Running   0               15h
kube-system   kube-proxy-m6x9r                           1/1     Running   1 (3m10s ago)   15h
kube-system   kube-proxy-ng7vc                           1/1     Running   0               15h
kube-system   kube-scheduler-master1                     1/1     Running   0               15h
kube-system   kube-scheduler-master2                     1/1     Running   0               15h
kube-system   kube-scheduler-master3                     1/1     Running   1 (3m10s ago)   15h
kube-system   kube-vip-master1                           1/1     Running   3 (15h ago)     15h
kube-system   kube-vip-master2                           1/1     Running   3 (14h ago)     15h
kube-system   kube-vip-master3                           1/1     Running   1 (3m10s ago)   15h
kube-system   nodelocaldns-4j5kv                         1/1     Running   0               15h
kube-system   nodelocaldns-v25hm                         1/1     Running   0               15h
kube-system   nodelocaldns-xhffm                         1/1     Running   2 (3m10s ago)   15h
[root@master2 ~]#

[root@master1 ~]# ip ad
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp6s18: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether bc:24:11:b2:e0:0e brd ff:ff:ff:ff:ff:ff
    inet 192.168.5.124/24 brd 192.168.5.255 scope global noprefixroute enp6s18
       valid_lft forever preferred_lft forever
    inet 192.168.5.123/32 scope global enp6s18
       valid_lft forever preferred_lft forever
    inet6 fe80::be24:11ff:feb2:e00e/64 scope link noprefixroute
       valid_lft forever preferred_lft forever

I looked into the issues reported on the kube-vip side: it appears to work on k8s v1.28.x and earlier, but stops working from v1.29.x onwards.

OK, thanks.

emf1002 · Dec 19 '24 01:12

In kubeadm 1.29 the permissions of the admin.conf file were restricted (https://github.com/kubernetes/kubeadm/issues/2414), but kube-vip still uses admin.conf, so kube-vip fails with an error when trying to acquire the lease resource. There is a workaround here: https://github.com/kube-vip/kube-vip/issues/684#issuecomment-1864855405
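
A minimal sketch of that workaround on the first control-plane node, assuming the kube-vip static pod manifest is /etc/kubernetes/manifests/kube-vip.yaml and references /etc/kubernetes/admin.conf as described in the linked kube-vip issue (paths are not verified against KubeKey's templates):

# while kubeadm init bootstraps the first control plane:
# let kube-vip use super-admin.conf, which keeps full privileges on kubeadm >= 1.29
sed -i 's#/etc/kubernetes/admin.conf#/etc/kubernetes/super-admin.conf#g' /etc/kubernetes/manifests/kube-vip.yaml

# after the control plane is up and admin.conf has its cluster-admin binding again: revert
sed -i 's#/etc/kubernetes/super-admin.conf#/etc/kubernetes/admin.conf#g' /etc/kubernetes/manifests/kube-vip.yaml

The second sed reverts the change once the cluster is healthy, since super-admin.conf is only intended for the bootstrap phase.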

redscholar · Dec 20 '24 08:12

Same issue as #2375

graphenn · Dec 24 '24 05:12

Could the maintainers fix this?

dailai · Feb 24 '25 03:02

Has this been resolved?

graphenn · Mar 25 '25 07:03

We'll first see whether the kube-vip community comes up with a good solution. If not, we'll add the workaround from https://github.com/kube-vip/kube-vip/issues/684#issuecomment-1864855405 to the 4.x release.

redscholar · Mar 25 '25 08:03