[Help] Manually adding a control node: the command hangs
Hello, I'd like to ask about a problem: manually adding a control node has been stuck for more than an hour without reporting any error.
ocadm version
ocadm version: version.Info{Major:"0", Minor:"0", GitVersion:"v3.11.3-20240423.1", GitBranch:"tags/v3.11.3-20240423.1", GitCommit:"f8d30d14", GitTreeState:"clean", BuildDate:"2024-04-23T10:57:25Z", GoVersion:"go1.18.3", Compiler:"gc", Platform:"linux/amd64"}
Add the -v 10 flag to the ocadm join command to see the detailed error. Also check that the system time on the two nodes is consistent.
The log is below. The system times have been adjusted to match.
I1011 15:53:38.785817 49176 join.go:367] [preflight] found NodeName empty; using OS hostname as NodeName
I1011 15:53:38.785896 49176 initconfiguration.go:105] detected and using CRI socket: /var/run/dockershim.sock
[preflight] Running pre-flight checks
I1011 15:53:38.785944 49176 preflight.go:91] [preflight] Running general checks
I1011 15:53:38.785965 49176 checks.go:254] validating the existence and emptiness of directory /etc/kubernetes/manifests
I1011 15:53:38.785986 49176 checks.go:292] validating the existence of file /etc/kubernetes/kubelet.conf
I1011 15:53:38.785992 49176 checks.go:292] validating the existence of file /etc/kubernetes/bootstrap-kubelet.conf
I1011 15:53:38.785998 49176 checks.go:105] validating the container runtime
I1011 15:53:38.802360 49176 checks.go:131] validating if the service is enabled and active
I1011 15:53:38.829150 49176 checks.go:341] validating the contents of file /proc/sys/net/bridge/bridge-nf-call-iptables
I1011 15:53:38.829183 49176 checks.go:341] validating the contents of file /proc/sys/net/ipv4/ip_forward
I1011 15:53:38.829196 49176 checks.go:653] validating whether swap is enabled or not
I1011 15:53:38.829216 49176 checks.go:382] validating the presence of executable ip
I1011 15:53:38.829231 49176 checks.go:382] validating the presence of executable iptables
I1011 15:53:38.829242 49176 checks.go:382] validating the presence of executable mount
I1011 15:53:38.829251 49176 checks.go:382] validating the presence of executable nsenter
I1011 15:53:38.829260 49176 checks.go:382] validating the presence of executable ebtables
I1011 15:53:38.829270 49176 checks.go:382] validating the presence of executable ethtool
I1011 15:53:38.829279 49176 checks.go:382] validating the presence of executable socat
I1011 15:53:38.829288 49176 checks.go:382] validating the presence of executable tc
[WARNING FileExisting-tc]: tc not found in system path
I1011 15:53:38.829320 49176 checks.go:382] validating the presence of executable touch
I1011 15:53:38.829331 49176 checks.go:524] running all checks
[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 20.10.24. Latest validated version: 18.09
I1011 15:53:38.839290 49176 checks.go:412] checking whether the given node name is reachable using net.LookupHost
I1011 15:53:38.839411 49176 checks.go:622] validating kubelet version
I1011 15:53:38.888095 49176 checks.go:131] validating if the service is enabled and active
I1011 15:53:38.896103 49176 checks.go:209] validating availability of port 10250
I1011 15:53:38.896172 49176 checks.go:439] validating if the connectivity type is via proxy or direct
I1011 15:53:38.896185 49176 join.go:460] [preflight] Discovering cluster-info
I1011 15:53:38.896211 49176 token.go:199] [discovery] Trying to connect to API Server "10.64.25.150:6443"
I1011 15:53:38.896486 49176 token.go:74] [discovery] Created cluster-info discovery client, requesting info from "https://10.64.25.150:6443"
I1011 15:53:38.896517 49176 round_trippers.go:419] curl -k -v -XGET -H "Accept: application/json, /" -H "User-Agent: ocadm/v0.0.0 (linux/amd64) kubernetes/$Format" 'https://10.64.25.150:6443/api/v1/namespaces/kube-public/configmaps/cluster-info'
I1011 15:53:38.896763 49176 round_trippers.go:438] GET https://10.64.25.150:6443/api/v1/namespaces/kube-public/configmaps/cluster-info in 0 milliseconds
I1011 15:53:38.896775 49176 round_trippers.go:444] Response Headers:
I1011 15:53:38.896799 49176 token.go:82] [discovery] Failed to request cluster info, will try again: [Get "https://10.64.25.150:6443/api/v1/namespaces/kube-public/configmaps/cluster-info": dial tcp 10.64.25.150:6443: connect: connection refused]
I1011 15:53:43.897594 49176 round_trippers.go:419] curl -k -v -XGET -H "Accept: application/json, /" -H "User-Agent: ocadm/v0.0.0 (linux/amd64) kubernetes/$Format" 'https://10.64.25.150:6443/api/v1/namespaces/kube-public/configmaps/cluster-info'
I1011 15:53:43.897874 49176 round_trippers.go:438] GET https://10.64.25.150:6443/api/v1/namespaces/kube-public/configmaps/cluster-info in 0 milliseconds
I1011 15:53:43.897884 49176 round_trippers.go:444] Response Headers:
I1011 15:53:43.897904 49176 token.go:82] [discovery] Failed to request cluster info, will try again: [Get "https://10.64.25.150:6443/api/v1/namespaces/kube-public/configmaps/cluster-info": dial tcp 10.64.25.150:6443: connect: connection refused]
dial tcp 10.64.25.150:6443: connect: connection refused
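With -v 10 the output is long, and the line that matters is easy to miss. As a rough sketch (not part of ocadm; the function name `refused_endpoints` is made up for illustration), a saved log could be scanned for every address that answered with "connection refused", assuming the Go-style `dial tcp IP:PORT: connect: connection refused` wording seen above:

```python
import re
from typing import List, Tuple

# Matches the Go net error text seen in the log above (IPv4 only, an assumption).
REFUSED = re.compile(
    r"dial tcp (\d{1,3}(?:\.\d{1,3}){3}):(\d+): connect: connection refused"
)


def refused_endpoints(log_text: str) -> List[Tuple[str, int]]:
    """Return each distinct (ip, port) that refused a connection, in first-seen order."""
    seen: List[Tuple[str, int]] = []
    for host, port in REFUSED.findall(log_text):
        endpoint = (host, int(port))
        if endpoint not in seen:
            seen.append(endpoint)
    return seen


# Usage (hypothetical file name):
#   with open("join.log") as f:
#       print(refused_endpoints(f.read()))
```

Listing the endpoints this way makes it clear which address and port to test next, here the API server VIP on 6443.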
Is the network unreachable?
ping 10.64.25.150
PING 10.64.25.150 (10.64.25.150) 56(84) bytes of data.
64 bytes from 10.64.25.150: icmp_seq=1 ttl=64 time=0.166 ms
64 bytes from 10.64.25.150: icmp_seq=2 ttl=64 time=0.121 ms
64 bytes from 10.64.25.150: icmp_seq=3 ttl=64 time=0.115 ms
--- 10.64.25.150 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2043ms
rtt min/avg/max/mdev = 0.115/0.134/0.166/0.022 ms
telnet: connect to address 10.64.25.150: Connection refused
telnet 10.64.25.150 6443
Trying 10.64.25.150...
telnet: connect to address 10.64.25.150: Connection refused
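The ping succeeds while telnet is refused, which means the host behind 10.64.25.150 is reachable but nothing is accepting TCP connections on port 6443. ping only exercises ICMP, so it says nothing about whether a given port is open. A minimal sketch of the same TCP check as the telnet test, using only the Python standard library (the helper name `tcp_port_open` is hypothetical):

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts, and unreachable or unresolvable hosts.
        return False


# Usage, with the address and port from this thread:
#   tcp_port_open("10.64.25.150", 6443)
```

A False result together with a working ping points at the service or a firewall on that port rather than at basic network connectivity.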
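The network is reachable now, but adding the control node fails with an error saying it needs to access the etcd port on this machine. Why isn't it accessing the cluster's etcd?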
ocadm join --control-plane 10.64.25.150:6443 --token uzcxd0.qt3csimnlx2emc13 --certificate-key 3a8039ff2670ba91fff0d598e16a4a60b86cb0a789ff6748e7f49ea4e9b5f6b1 --discovery-token-unsafe-skip-ca-verification --apiserver-advertise-address 10.64.25.96 --node-ip 10.64.25.96 --as-onecloud-controller --host-networks 'bond0/br0/10.64.25.96' --high-availability-vip 10.64.25.150 --keepalived-version-tag v2.0.25 --ignore-preflight-errors=all
[preflight] Running pre-flight checks
[WARNING FileExisting-tc]: tc not found in system path
[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 20.10.24. Latest validated version: 18.09
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm ocadm-config -oyaml'
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
W1012 15:01:12.023894 52795 proxier.go:513] Failed to load kernel module nf_conntrack_ipv4 with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
[preflight] Running pre-flight checks before initializing the new control plane instance
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
got Keepalived version tag from commandline: v2.0.25
[PASS] Installing Keepalived:v2.0.25 as BACKUP, nodeIP[10.64.25.96], interface: bond0
[PASS] keepalived path created.
106425150
[download-certs] Downloading the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [sym206-cpu-b1211-node096 localhost] and IPs [10.64.25.96 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [sym206-cpu-b1211-node096 localhost] and IPs [10.64.25.96 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [sym206-cpu-b1211-node096 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.64.25.96 10.64.25.150]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[certs] Using the existing "sa" key
[kubeconfig] Generating kubeconfig files
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[check-etcd] Checking that the etcd cluster is healthy
{"level":"warn","ts":"2024-10-12T15:01:18.377+0800","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"passthrough:///https://10.64.25.96:2379/","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: connection error: desc = "transport: Error while dialing dial tcp 10.64.25.96:2379: connect: connection refused""}
error execution phase check-etcd: etcd cluster is not healthy: context deadline exceeded
You are adding a control node, so it needs to reach etcd on port 2379.
I followed the official documentation and got this error. Is some step missing, or is it something else?
@ChaoHsin-fang This is most likely still caused by the etcd port being unreachable.
kubectl get pods -A -o wide | grep etcd
kube-system etcd-node095 1/1 Running 10 120d 10.64.25.95 node095
In the current deployment model, etcd and the control-node services are deployed together; deploying them separately is not supported.
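Because etcd runs on the control nodes themselves, the pod listing above already tells you where the existing etcd member lives. As an illustrative sketch only, the snippet below turns that `kubectl get pods -A -o wide | grep etcd` output into client URLs that can then be tested for reachability. It assumes the eight whitespace-separated columns shown in this thread (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE, IP, NODE) and the client port 2379 mentioned in the error; the function name `etcd_endpoints` is hypothetical.

```python
from typing import List


def etcd_endpoints(pod_lines: str, client_port: int = 2379) -> List[str]:
    """Build etcd client URLs from `kubectl get pods -A -o wide` lines.

    Assumes columns: NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE,
    with no spaces inside any column (as in the listing above).
    """
    endpoints: List[str] = []
    for line in pod_lines.splitlines():
        cols = line.split()
        # Skip short lines and pods that are not etcd static pods.
        if len(cols) < 8 or not cols[1].startswith("etcd-"):
            continue
        endpoints.append(f"https://{cols[6]}:{client_port}")
    return endpoints


# Usage, with the line from this thread:
#   etcd_endpoints("kube-system etcd-node095 1/1 Running 10 120d 10.64.25.95 node095")
```

With the listing in this thread that yields a single endpoint on 10.64.25.95, whereas the failing health check was dialing 10.64.25.96:2379, the node being joined.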
If you do not provide feedback for more than 37 days, we will close the issue and you can either reopen it or submit a new issue.