Running ./kk add nodes -f ksp-v3013-v12310-offline.yaml fails with "lb.kubesphere.local:6443 was refused - did you specify the right host or port?"
What version of KubeKey has the issue?
kk version: &version.Info{Major:"3", Minor:"0", GitVersion:"v3.0.13", GitCommit:"ac75d3ef3c22e6a9d999dcea201234d6651b3e72", GitTreeState:"clean", BuildDate:"2023-10-30T11:15:14Z", GoVersion:"go1.19.2", Compiler:"gc", Platform:"linux/amd64"}
What is your OS environment?
CentOS 7.9
Problem description
All of the following operations were performed with the root account on master1.
I ran ./kk add nodes -f ksp-v3013-v12310-offline.yaml to add a new worker9 node, but the run failed with the error below (a sketch of the relevant config section follows the error output):
15:40:34 CST [KubernetesStatusModule] Get kubernetes cluster status
15:40:34 CST stdout: [master1]
v1.23.10
15:40:34 CST stdout: [master1]
The connection to the server lb.kubesphere.local:6443 was refused - did you specify the right host or port?
15:40:34 CST message: [master1]
get kubernetes cluster info failed: Failed to exec command: sudo -E /bin/bash -c "/usr/local/bin/kubectl --no-headers=true get nodes -o custom-columns=:metadata.name,:status.nodeInfo.kubeletVersion,:status.addresses"
The connection to the server lb.kubesphere.local:6443 was refused - did you specify the right host or port?: Process exited with status 1
15:40:34 CST retry: [master1]
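For context, the parts of a KubeKey config that matter when adding a node generally look like the snippet below. This is only a sketch with placeholder values (the worker9 address, user, and password are made up, not taken from ksp-v3013-v12310-offline.yaml):

spec:
  hosts:
  # ...existing master/worker entries stay as they are...
  - {name: worker9, address: 10.8.43.x, internalAddress: 10.8.43.x, user: root, password: "placeholder"}  # placeholder entry for the new node
  roleGroups:
    worker:
    # ...existing workers...
    - worker9                          # the node being added
  controlPlaneEndpoint:
    internalLoadbalancer: haproxy      # matches the haproxy-worker* pods listed below
    domain: lb.kubesphere.local
    port: 6443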
Checks done so far:
1. DNS was checked and looks fine; running "/usr/local/bin/kubectl --no-headers=true get nodes -o custom-columns=:metadata.name,:status.nodeInfo.kubeletVersion,:status.addresses" by hand also returns results.
2. curl -k https://127.0.0.1:6443/healthz
The output is also ok. (A couple of extra checks against lb.kubesphere.local itself are sketched after item 4 below.)
3. Node and pod checks all look normal:
[root@master1 offline]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
master1 Ready control-plane,master 390d v1.23.10
master2 Ready control-plane,master 390d v1.23.10
master3 Ready control-plane,master 390d v1.23.10
worker1 Ready worker 390d v1.23.10
worker2 Ready worker 390d v1.23.10
worker3 Ready worker 390d v1.23.10
worker4 Ready worker 377d v1.23.10
worker5 Ready worker 377d v1.23.10
worker6 Ready <none> 377d v1.23.10
worker7 Ready worker 323d v1.23.10
worker8 Ready worker 226d v1.23.10
[root@master1 offline]# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-5568fcfc56-z7hs4 1/1 Running 1 (99m ago) 207d
calico-node-22lxz 1/1 Running 1 252d
calico-node-2xs7c 1/1 Running 2 (99m ago) 395d
calico-node-4qqjz 1/1 Running 1 395d
calico-node-6r25b 1/1 Running 0 230d
calico-node-krdfb 1/1 Running 1 252d
calico-node-lckhs 1/1 Running 1 252d
calico-node-nth8x 1/1 Running 1 252d
calico-node-p58tm 1/1 Running 1 395d
calico-node-qpkqd 1/1 Running 1 252d
calico-node-v2777 1/1 Running 1 252d
calico-node-zbbs6 1/1 Running 2 (5d6h ago) 252d
coredns-57b586455-8p9vj 1/1 Running 2 (99m ago) 395d
coredns-57b586455-kzgt5 1/1 Running 2 (99m ago) 395d
haproxy-worker1 1/1 Running 1 395d
haproxy-worker2 1/1 Running 5 (5d6h ago) 395d
haproxy-worker3 1/1 Running 1 395d
haproxy-worker4 1/1 Running 2 382d
haproxy-worker5 1/1 Running 2 382d
haproxy-worker6 1/1 Running 2 382d
haproxy-worker7 1/1 Running 1 328d
haproxy-worker8 1/1 Running 0 230d
kube-apiserver-master1 1/1 Running 23 (43m ago) 126m
kube-apiserver-master2 1/1 Running 2 (57d ago) 395d
kube-apiserver-master3 1/1 Running 2 (57d ago) 395d
kube-controller-manager-master1 1/1 Running 7 (99m ago) 395d
kube-controller-manager-master2 1/1 Running 3 (57d ago) 395d
kube-controller-manager-master3 1/1 Running 3 (57d ago) 395d
kube-proxy-42hdh 1/1 Running 1 252d
kube-proxy-45b7c 1/1 Running 1 252d
kube-proxy-d8jqc 1/1 Running 1 252d
kube-proxy-fqkkr 1/1 Running 1 395d
kube-proxy-kwg5d 1/1 Running 2 (5d6h ago) 252d
kube-proxy-nd2fg 1/1 Running 0 230d
kube-proxy-rj2vh 1/1 Running 1 252d
kube-proxy-rtcxm 1/1 Running 2 (99m ago) 395d
kube-proxy-vr9t5 1/1 Running 1 252d
kube-proxy-xbc2p 1/1 Running 1 395d
kube-proxy-zbck6 1/1 Running 1 252d
kube-scheduler-master1 1/1 Running 6 (99m ago) 395d
kube-scheduler-master2 1/1 Running 3 (57d ago) 395d
kube-scheduler-master3 1/1 Running 3 (57d ago) 395d
metrics-server-dcc48455d-fdkhw 1/1 Running 4 (80d ago) 395d
nodelocaldns-2gkjp 1/1 Running 1 382d
nodelocaldns-57dpv 1/1 Running 2 (99m ago) 395d
nodelocaldns-684bh 1/1 Running 1 395d
nodelocaldns-9blhv 1/1 Running 1 328d
nodelocaldns-b6vg2 1/1 Running 0 230d
nodelocaldns-br5jd 1/1 Running 1 395d
nodelocaldns-g2fsk 1/1 Running 2 382d
nodelocaldns-hjrnj 1/1 Running 2 (5d6h ago) 395d
nodelocaldns-mr7gx 1/1 Running 2 382d
nodelocaldns-qs6p9 1/1 Running 1 395d
nodelocaldns-tftpl 1/1 Running 1 395d
openebs-localpv-provisioner-754596f596-84cwq 1/1 Running 4 395d
snapshot-controller-0 1/1 Running 0 207d
4. kubelet status shows errors:
[root@master1 offline]# systemctl status -l kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since 四 2025-07-10 16:55:43 CST; 3 days ago
Docs: http://kubernetes.io/docs/
Main PID: 56968 (kubelet)
CGroup: /system.slice/kubelet.service
└─56968 /usr/local/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=systemd --network-plugin=cni --pod-infra-container-image=10.8.3.236/kubesphereio/pause:3.6 --node-ip=10.8.43.1 --hostname-override=master1
7月 14 08:44:10 master1 kubelet[56968]: E0714 08:44:10.951994 56968 kubelet_node_status.go:460] "Error updating node status, will retry" err="error getting node \"master1\": Get \"https://127.0.0.1:6443/api/v1/nodes/master1?resourceVersion=0&timeout=10s\": dial tcp 127.0.0.1:6443: connect: connection refused"
7月 14 08:44:10 master1 kubelet[56968]: E0714 08:44:10.953604 56968 kubelet_node_status.go:460] "Error updating node status, will retry" err="error getting node \"master1\": Get \"https://127.0.0.1:6443/api/v1/nodes/master1?timeout=10s\": dial tcp 127.0.0.1:6443: connect: connection refused"
7月 14 08:44:10 master1 kubelet[56968]: E0714 08:44:10.955137 56968 kubelet_node_status.go:460] "Error updating node status, will retry" err="error getting node \"master1\": Get \"https://127.0.0.1:6443/api/v1/nodes/master1?timeout=10s\": dial tcp 127.0.0.1:6443: connect: connection refused"
7月 14 08:44:10 master1 kubelet[56968]: E0714 08:44:10.956563 56968 kubelet_node_status.go:460] "Error updating node status, will retry" err="error getting node \"master1\": Get \"https://127.0.0.1:6443/api/v1/nodes/master1?timeout=10s\": dial tcp 127.0.0.1:6443: connect: connection refused"
7月 14 08:44:10 master1 kubelet[56968]: E0714 08:44:10.957842 56968 kubelet_node_status.go:460] "Error updating node status, will retry" err="error getting node \"master1\": Get \"https://127.0.0.1:6443/api/v1/nodes/master1?timeout=10s\": dial tcp 127.0.0.1:6443: connect: connection refused"
7月 14 08:44:10 master1 kubelet[56968]: E0714 08:44:10.957857 56968 kubelet_node_status.go:447] "Unable to update node status" err="update node status exceeds retry count"
7月 14 08:44:11 master1 kubelet[56968]: E0714 08:44:11.360885 56968 event.go:276] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"kube-apiserver-master1.1850d8a7aff27a33", GenerateName:"", Namespace:"kube-system", SelfLink:"", UID:"", ResourceVersion:"171589920", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"kube-apiserver-master1", UID:"f5b4d2cb38bdb33e484f494815280414", APIVersion:"v1", ResourceVersion:"", FieldPath:""}, Reason:"SandboxChanged", Message:"Pod sandbox changed, it will be killed and re-created.", Source:v1.EventSource{Component:"kubelet", Host:"master1"}, FirstTimestamp:time.Date(2025, time.July, 10, 17, 7, 49, 0, time.Local), LastTimestamp:time.Date(2025, time.July, 14, 8, 43, 47, 767110131, time.Local), Count:8, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Patch "https://127.0.0.1:6443/api/v1/namespaces/kube-system/events/kube-apiserver-master1.1850d8a7aff27a33": dial tcp 127.0.0.1:6443: connect: connection refused'(may retry after sleeping)
7月 14 08:44:13 master1 kubelet[56968]: I0714 08:44:13.445530 56968 status_manager.go:664] "Failed to get status for pod" podUID=f5b4d2cb38bdb33e484f494815280414 pod="kube-system/kube-apiserver-master1" err="Get \"https://127.0.0.1:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-master1\": dial tcp 127.0.0.1:6443: connect: connection refused"
7月 14 08:44:15 master1 kubelet[56968]: I0714 08:44:15.444705 56968 scope.go:110] "RemoveContainer" containerID="1aba3f6984d2f56e6eb9748a50913c0e434d8d8e91b62b3d862a2b4c39cedf3f"
7月 14 08:44:17 master1 kubelet[56968]: E0714 08:44:17.916442 56968 reflector.go:138] object-"kube-system"/"kube-proxy": Failed to watch *v1.ConfigMap: unknown (get configmaps)
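Since the failing call goes through lb.kubesphere.local:6443 rather than 127.0.0.1:6443, a few extra checks on master1 might help narrow this down. This is only a sketch; on KubeKey clusters the name is normally resolved via /etc/hosts, and the hostname and port below are taken from the error output:

getent hosts lb.kubesphere.local                   # where does the name actually resolve to?
grep lb.kubesphere.local /etc/hosts
curl -k https://lb.kubesphere.local:6443/healthz   # the endpoint the failing kubectl call goes through
ss -lntp | grep 6443                               # is kube-apiserver actually listening right now?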
I have added worker nodes three times before without any problems; they were always added quickly and successfully. This time the add-node run fails and I really cannot find the cause. Any help appreciated.
Was the check in step 1 done on master1? And was it the same user that ran the kubekey command?
Both were done with the root account on master1.
The kubelet is reporting errors, so your docker service may have problems too. Use journalctl to look at the logs of the kubelet and docker services.
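For reference, a minimal way to pull those logs (unit names taken from the systemctl output above; the time window is just an example):

journalctl -u kubelet --since "2025-07-14" --no-pager | tail -n 200   # recent kubelet entries
journalctl -u docker --since "2025-07-14" --no-pager | tail -n 200    # docker daemon entries over the same window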
It's really strange. I haven't touched the cluster for half a year, and it seems to have broken on its own.
kubelet logs:
[root@master1 offline]# systemctl status kubelet.service -l
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since 四 2025-07-10 16:55:43 CST; 4 days ago
Docs: http://kubernetes.io/docs/
Main PID: 56968 (kubelet)
CGroup: /system.slice/kubelet.service
└─56968 /usr/local/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=systemd --network-plugin=cni --pod-infra-container-image=10.8.3.236/kubesphereio/pause:3.6 --node-ip=10.8.43.1 --hostname-override=master1
7月 15 10:13:21 master1 kubelet[56968]: I0715 10:13:21.565388 56968 reconciler.go:221] "operationExecutor.VerifyControllerAttachedVolume started for volume \"k8s-certs\" (UniqueName: \"kubernetes.io/host-path/f5b4d2cb38bdb33e484f494815280414-k8s-certs\") pod \"kube-apiserver-master1\" (UID: \"f5b4d2cb38bdb33e484f494815280414\") " pod="kube-system/kube-apiserver-master1"
7月 15 10:13:21 master1 kubelet[56968]: I0715 10:13:21.565405 56968 reconciler.go:221] "operationExecutor.VerifyControllerAttachedVolume started for volume \"ca-certs\" (UniqueName: \"kubernetes.io/host-path/f5b4d2cb38bdb33e484f494815280414-ca-certs\") pod \"kube-apiserver-master1\" (UID: \"f5b4d2cb38bdb33e484f494815280414\") " pod="kube-system/kube-apiserver-master1"
7月 15 10:13:21 master1 kubelet[56968]: I0715 10:13:21.565421 56968 reconciler.go:221] "operationExecutor.VerifyControllerAttachedVolume started for volume \"etcd-certs-0\" (UniqueName: \"kubernetes.io/host-path/f5b4d2cb38bdb33e484f494815280414-etcd-certs-0\") pod \"kube-apiserver-master1\" (UID: \"f5b4d2cb38bdb33e484f494815280414\") " pod="kube-system/kube-apiserver-master1"
7月 15 10:13:23 master1 kubelet[56968]: E0715 10:13:23.734523 56968 reflector.go:138] object-"kubesphere-logging-system"/"fluent-bit-config": Failed to watch *v1.Secret: unknown (get secrets)
7月 15 10:13:23 master1 kubelet[56968]: E0715 10:13:23.757814 56968 reflector.go:138] object-"kube-system"/"kube-root-ca.crt": Failed to watch *v1.ConfigMap: unknown (get configmaps)
7月 15 10:13:23 master1 kubelet[56968]: E0715 10:13:23.757836 56968 reflector.go:138] object-"kube-system"/"nodelocaldns": Failed to watch *v1.ConfigMap: unknown (get configmaps)
7月 15 10:13:23 master1 kubelet[56968]: E0715 10:13:23.757909 56968 reflector.go:138] object-"kubesphere-logging-system"/"kube-root-ca.crt": Failed to watch *v1.ConfigMap: unknown (get configmaps)
7月 15 10:13:23 master1 kubelet[56968]: E0715 10:13:23.757922 56968 reflector.go:138] object-"kubesphere-monitoring-system"/"kube-root-ca.crt": Failed to watch *v1.ConfigMap: unknown (get configmaps)
7月 15 10:13:23 master1 kubelet[56968]: E0715 10:13:23.757938 56968 reflector.go:138] object-"kube-system"/"kube-proxy": Failed to watch *v1.ConfigMap: unknown (get configmaps)
7月 15 10:13:23 master1 kubelet[56968]: E0715 10:13:23.757946 56968 reflector.go:138] object-"weave"/"kube-root-ca.crt": Failed to watch *v1.ConfigMap: unknown (get configmaps)
docker logs:
[root@master1 ~]# journalctl -u docker
-- Logs begin at 四 2024-12-26 10:20:38 CST, end at 一 2025-07-14 16:20:53 CST. --
2月 21 09:24:21 master1 dockerd[4254]: time="2025-02-21T09:24:21.235192067+08:00" level=error msg="Not continuing with pull after error: context canceled"
2月 21 09:28:32 master1 dockerd[4254]: time="2025-02-21T09:28:32.883766286+08:00" level=warning msg="error aborting content ingest" digest="sha256:49b31365e2747d3a5f0fb4f33daf55569fd33e3bc63d3f3861b10b6af59f4fee" error="context can
2月 21 09:28:32 master1 dockerd[4254]: time="2025-02-21T09:28:32.883821887+08:00" level=warning msg="Error persisting manifest" digest="sha256:49b31365e2747d3a5f0fb4f33daf55569fd33e3bc63d3f3861b10b6af59f4fee" error="error writing m
2月 21 09:28:57 master1 dockerd[4254]: time="2025-02-21T09:28:57.400679460+08:00" level=error msg="Not continuing with pull after error: context canceled"
2月 21 09:32:02 master1 dockerd[4254]: time="2025-02-21T09:32:02.557311323+08:00" level=error msg="Not continuing with pull after error: context canceled"
2月 21 09:32:59 master1 dockerd[4254]: time="2025-02-21T09:32:59.654663091+08:00" level=error msg="Not continuing with pull after error: context canceled"
2月 21 12:18:26 master1 dockerd[4254]: time="2025-02-21T12:18:26.416513009+08:00" level=error msg="Not continuing with pull after error: context canceled"
2月 21 13:04:43 master1 dockerd[4254]: time="2025-02-21T13:04:43.567692258+08:00" level=warning msg="Error getting v2 registry: Get \"https://dockerproxy.com/v2/\": read tcp 10.8.43.1:52698->144.24.81.189:443: read: connection rese
2月 21 13:04:43 master1 dockerd[4254]: time="2025-02-21T13:04:43.569620961+08:00" level=warning msg="Error getting v2 registry: Get \"https://docker.mirrors.ustc.edu.cn/v2/\": dial tcp: lookup docker.mirrors.ustc.edu.cn on 10.8.3.3
2月 21 13:04:59 master1 dockerd[4254]: time="2025-02-21T13:04:59.078599869+08:00" level=warning msg="Error getting v2 registry: Get \"https://registry-1.docker.io/v2/\": net/http: request canceled while waiting for connection (Clie
2月 21 13:04:59 master1 dockerd[4254]: time="2025-02-21T13:04:59.080406758+08:00" level=error msg="Handler for POST /v1.43/images/create returned error: Get \"https://registry-1.docker.io/v2/\": net/http: request canceled while wai
2月 21 13:39:22 master1 dockerd[4254]: time="2025-02-21T13:39:22.445614179+08:00" level=error msg="Not continuing with pull after error: context canceled"
2月 21 14:16:05 master1 dockerd[4254]: time="2025-02-21T14:16:05.422558698+08:00" level=error msg="Upload failed: unauthorized: unauthorized to access repository: dify/dify-plugin-daemon, action: push: unauthorized to access reposi
2月 21 14:16:05 master1 dockerd[4254]: time="2025-02-21T14:16:05.432582548+08:00" level=error msg="Upload failed: unauthorized: unauthorized to access repository: dify/dify-plugin-daemon, action: push: unauthorized to access reposi
2月 21 14:16:05 master1 dockerd[4254]: time="2025-02-21T14:16:05.439152692+08:00" level=error msg="Upload failed: unauthorized: unauthorized to access repository: dify/dify-plugin-daemon, action: push: unauthorized to access reposi
3月 13 21:06:45 master1 dockerd[4254]: time="2025-03-13T21:06:45.813442730+08:00" level=warning msg="Error getting v2 registry: Get \"https://dockerproxy.com/v2/\": read tcp 10.8.43.1:38418->144.24.81.189:443: read: connection rese
3月 13 21:06:45 master1 dockerd[4254]: time="2025-03-13T21:06:45.817541784+08:00" level=warning msg="Error getting v2 registry: Get \"https://docker.mirrors.ustc.edu.cn/v2/\": dial tcp: lookup docker.mirrors.ustc.edu.cn on 10.8.3.3
3月 13 21:07:01 master1 dockerd[4254]: time="2025-03-13T21:07:01.338637220+08:00" level=warning msg="Error getting v2 registry: Get \"https://registry-1.docker.io/v2/\": net/http: request canceled while waiting for connection (Clie
3月 13 21:07:01 master1 dockerd[4254]: time="2025-03-13T21:07:01.340993565+08:00" level=error msg="Handler for POST /v1.43/images/create returned error: Get \"https://registry-1.docker.io/v2/\": net/http: request canceled while wai
4月 05 21:20:23 master1 dockerd[4254]: time="2025-04-05T21:20:23.483390843+08:00" level=warning msg="Error getting v2 registry: Get \"https://harbor.tw-solar.com/v2/\": dial tcp 10.8.3.236:443: connect: connection refused"
7月 01 10:56:54 master1 dockerd[4254]: time="2025-07-01T10:56:54.641432872+08:00" level=warning msg="Error getting v2 registry: Get \"https://harbor.tw-solar.com/v2/\": dial tcp 10.8.3.236:443: connect: connection refused"
7月 01 10:57:58 master1 dockerd[4254]: time="2025-07-01T10:57:58.526812553+08:00" level=warning msg="Error getting v2 registry: Get \"https://harbor.tw-solar.com/v2/\": dial tcp 10.8.3.236:443: connect: connection refused"
7月 10 15:14:46 master1 dockerd[4254]: time="2025-07-10T15:14:46.971456273+08:00" level=warning msg="Error getting v2 registry: Get \"https://harbor.tw-solar.com/v2/\": dial tcp 10.8.3.236:443: connect: connection refused"
7月 14 16:02:53 master1 systemd[1]: Stopping Docker Application Container Engine...
7月 14 16:02:53 master1 systemd[1]: Stopped Docker Application Container Engine.
7月 14 16:02:53 master1 systemd[1]: Starting Docker Application Container Engine...
7月 14 16:02:53 master1 dockerd[61639]: time="2025-07-14T16:02:53.817841274+08:00" level=warning msg="could not change group /var/run/docker.sock to docker: group docker not found"
7月 14 16:02:54 master1 systemd[1]: Started Docker Application Container Engine.
I've checked everything I can think of and still can't see what the cause is.