
Running the command ./kk add nodes -f ksp-v3013-v12310-offline.yaml fails with: lb.kubesphere.local:6443 was refused - did you specify the right host or port?

Open yqwang930907 opened this issue 5 months ago • 4 comments

Which version of KubeKey has the issue?

kk version: &version.Info{Major:"3", Minor:"0", GitVersion:"v3.0.13", GitCommit:"ac75d3ef3c22e6a9d999dcea201234d6651b3e72", GitTreeState:"clean", BuildDate:"2023-10-30T11:15:14Z", GoVersion:"go1.19.2", Compiler:"gc", Platform:"linux/amd64"}

What is your OS environment?

centos7.9

Problem description

All of the operations below were performed as the root account on master1.

I ran the command ./kk add nodes -f ksp-v3013-v12310-offline.yaml to add a worker9 node, but the following error occurred during the run:

15:40:34 CST [KubernetesStatusModule] Get kubernetes cluster status

15:40:34 CST stdout: [master1]

v1.23.10

15:40:34 CST stdout: [master1]

The connection to the server lb.kubesphere.local:6443 was refused - did you specify the right host or port?

15:40:34 CST message: [master1]

get kubernetes cluster info failed: Failed to exec command: sudo -E /bin/bash -c "/usr/local/bin/kubectl --no-headers=true get nodes -o custom-columns=:metadata.name,:status.nodeInfo.kubeletVersion,:status.addresses"

The connection to the server lb.kubesphere.local:6443 was refused - did you specify the right host or port?: Process exited with status 1

15:40:34 CST retry: [master1]
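
The call that fails is kubectl going through lb.kubesphere.local:6443. A minimal way to exercise exactly that path directly on master1 (a sketch; assumes standard tools are present and that lb.kubesphere.local is the control-plane endpoint KubeKey configured):

# re-run the exact command kk executes, with the same sudo wrapper
sudo -E /bin/bash -c "/usr/local/bin/kubectl --no-headers=true get nodes -o custom-columns=:metadata.name,:status.nodeInfo.kubeletVersion,:status.addresses"

# what does the failing endpoint resolve to, and does it answer on 6443 right now?
getent hosts lb.kubesphere.local
curl -k https://lb.kubesphere.local:6443/healthz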

The checks I have done so far:

1. DNS looks fine, and manually running "/usr/local/bin/kubectl --no-headers=true get nodes -o custom-columns=:metadata.name,:status.nodeInfo.kubeletVersion,:status.addresses" also returns results.

2. curl -k https://127.0.0.1:6443/healthz

The output is also ok.

3. Node and pod checks... everything looks normal.

[root@master1 offline]# kubectl get nodes
NAME      STATUS   ROLES                  AGE    VERSION
master1   Ready    control-plane,master   390d   v1.23.10
master2   Ready    control-plane,master   390d   v1.23.10
master3   Ready    control-plane,master   390d   v1.23.10
worker1   Ready    worker                 390d   v1.23.10
worker2   Ready    worker                 390d   v1.23.10
worker3   Ready    worker                 390d   v1.23.10
worker4   Ready    worker                 377d   v1.23.10
worker5   Ready    worker                 377d   v1.23.10
worker6   Ready    <none>                 377d   v1.23.10
worker7   Ready    worker                 323d   v1.23.10
worker8   Ready    worker                 226d   v1.23.10
[root@master1 offline]# kubectl get pods -n kube-system
NAME                                           READY   STATUS    RESTARTS       AGE
calico-kube-controllers-5568fcfc56-z7hs4       1/1     Running   1 (99m ago)    207d
calico-node-22lxz                              1/1     Running   1              252d
calico-node-2xs7c                              1/1     Running   2 (99m ago)    395d
calico-node-4qqjz                              1/1     Running   1              395d
calico-node-6r25b                              1/1     Running   0              230d
calico-node-krdfb                              1/1     Running   1              252d
calico-node-lckhs                              1/1     Running   1              252d
calico-node-nth8x                              1/1     Running   1              252d
calico-node-p58tm                              1/1     Running   1              395d
calico-node-qpkqd                              1/1     Running   1              252d
calico-node-v2777                              1/1     Running   1              252d
calico-node-zbbs6                              1/1     Running   2 (5d6h ago)   252d
coredns-57b586455-8p9vj                        1/1     Running   2 (99m ago)    395d
coredns-57b586455-kzgt5                        1/1     Running   2 (99m ago)    395d
haproxy-worker1                                1/1     Running   1              395d
haproxy-worker2                                1/1     Running   5 (5d6h ago)   395d
haproxy-worker3                                1/1     Running   1              395d
haproxy-worker4                                1/1     Running   2              382d
haproxy-worker5                                1/1     Running   2              382d
haproxy-worker6                                1/1     Running   2              382d
haproxy-worker7                                1/1     Running   1              328d
haproxy-worker8                                1/1     Running   0              230d
kube-apiserver-master1                         1/1     Running   23 (43m ago)   126m
kube-apiserver-master2                         1/1     Running   2 (57d ago)    395d
kube-apiserver-master3                         1/1     Running   2 (57d ago)    395d
kube-controller-manager-master1                1/1     Running   7 (99m ago)    395d
kube-controller-manager-master2                1/1     Running   3 (57d ago)    395d
kube-controller-manager-master3                1/1     Running   3 (57d ago)    395d
kube-proxy-42hdh                               1/1     Running   1              252d
kube-proxy-45b7c                               1/1     Running   1              252d
kube-proxy-d8jqc                               1/1     Running   1              252d
kube-proxy-fqkkr                               1/1     Running   1              395d
kube-proxy-kwg5d                               1/1     Running   2 (5d6h ago)   252d
kube-proxy-nd2fg                               1/1     Running   0              230d
kube-proxy-rj2vh                               1/1     Running   1              252d
kube-proxy-rtcxm                               1/1     Running   2 (99m ago)    395d
kube-proxy-vr9t5                               1/1     Running   1              252d
kube-proxy-xbc2p                               1/1     Running   1              395d
kube-proxy-zbck6                               1/1     Running   1              252d
kube-scheduler-master1                         1/1     Running   6 (99m ago)    395d
kube-scheduler-master2                         1/1     Running   3 (57d ago)    395d
kube-scheduler-master3                         1/1     Running   3 (57d ago)    395d
metrics-server-dcc48455d-fdkhw                 1/1     Running   4 (80d ago)    395d
nodelocaldns-2gkjp                             1/1     Running   1              382d
nodelocaldns-57dpv                             1/1     Running   2 (99m ago)    395d
nodelocaldns-684bh                             1/1     Running   1              395d
nodelocaldns-9blhv                             1/1     Running   1              328d
nodelocaldns-b6vg2                             1/1     Running   0              230d
nodelocaldns-br5jd                             1/1     Running   1              395d
nodelocaldns-g2fsk                             1/1     Running   2              382d
nodelocaldns-hjrnj                             1/1     Running   2 (5d6h ago)   395d
nodelocaldns-mr7gx                             1/1     Running   2              382d
nodelocaldns-qs6p9                             1/1     Running   1              395d
nodelocaldns-tftpl                             1/1     Running   1              395d
openebs-localpv-provisioner-754596f596-84cwq   1/1     Running   4              395d
snapshot-controller-0                          1/1     Running   0              207d
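
One thing that stands out in the pod list above: kube-apiserver-master1 shows 23 restarts (the last one 43m ago) and an AGE of only 126m, while the other control-plane pods show 395d, so the API server on master1 appears to be restarting repeatedly. A minimal sketch for digging into why (plain kubectl; --previous only returns output if the logs of the last crashed instance are still available):

kubectl -n kube-system describe pod kube-apiserver-master1 | tail -n 30     # recent events and last exit reason
kubectl -n kube-system logs kube-apiserver-master1 --previous | tail -n 50  # logs from the previous (crashed) instance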

4. The kubelet status does show errors

[root@master1 offline]# systemctl status -l kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since 四 2025-07-10 16:55:43 CST; 3 days ago
     Docs: http://kubernetes.io/docs/
 Main PID: 56968 (kubelet)
   CGroup: /system.slice/kubelet.service
           └─56968 /usr/local/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=systemd --network-plugin=cni --pod-infra-container-image=10.8.3.236/kubesphereio/pause:3.6 --node-ip=10.8.43.1 --hostname-override=master1

7月 14 08:44:10 master1 kubelet[56968]: E0714 08:44:10.951994   56968 kubelet_node_status.go:460] "Error updating node status, will retry" err="error getting node \"master1\": Get \"https://127.0.0.1:6443/api/v1/nodes/master1?resourceVersion=0&timeout=10s\": dial tcp 127.0.0.1:6443: connect: connection refused"
7月 14 08:44:10 master1 kubelet[56968]: E0714 08:44:10.953604   56968 kubelet_node_status.go:460] "Error updating node status, will retry" err="error getting node \"master1\": Get \"https://127.0.0.1:6443/api/v1/nodes/master1?timeout=10s\": dial tcp 127.0.0.1:6443: connect: connection refused"
7月 14 08:44:10 master1 kubelet[56968]: E0714 08:44:10.955137   56968 kubelet_node_status.go:460] "Error updating node status, will retry" err="error getting node \"master1\": Get \"https://127.0.0.1:6443/api/v1/nodes/master1?timeout=10s\": dial tcp 127.0.0.1:6443: connect: connection refused"
7月 14 08:44:10 master1 kubelet[56968]: E0714 08:44:10.956563   56968 kubelet_node_status.go:460] "Error updating node status, will retry" err="error getting node \"master1\": Get \"https://127.0.0.1:6443/api/v1/nodes/master1?timeout=10s\": dial tcp 127.0.0.1:6443: connect: connection refused"
7月 14 08:44:10 master1 kubelet[56968]: E0714 08:44:10.957842   56968 kubelet_node_status.go:460] "Error updating node status, will retry" err="error getting node \"master1\": Get \"https://127.0.0.1:6443/api/v1/nodes/master1?timeout=10s\": dial tcp 127.0.0.1:6443: connect: connection refused"
7月 14 08:44:10 master1 kubelet[56968]: E0714 08:44:10.957857   56968 kubelet_node_status.go:447] "Unable to update node status" err="update node status exceeds retry count"
7月 14 08:44:11 master1 kubelet[56968]: E0714 08:44:11.360885   56968 event.go:276] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"kube-apiserver-master1.1850d8a7aff27a33", GenerateName:"", Namespace:"kube-system", SelfLink:"", UID:"", ResourceVersion:"171589920", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"kube-apiserver-master1", UID:"f5b4d2cb38bdb33e484f494815280414", APIVersion:"v1", ResourceVersion:"", FieldPath:""}, Reason:"SandboxChanged", Message:"Pod sandbox changed, it will be killed and re-created.", Source:v1.EventSource{Component:"kubelet", Host:"master1"}, FirstTimestamp:time.Date(2025, time.July, 10, 17, 7, 49, 0, time.Local), LastTimestamp:time.Date(2025, time.July, 14, 8, 43, 47, 767110131, time.Local), Count:8, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Patch "https://127.0.0.1:6443/api/v1/namespaces/kube-system/events/kube-apiserver-master1.1850d8a7aff27a33": dial tcp 127.0.0.1:6443: connect: connection refused'(may retry after sleeping)
7月 14 08:44:13 master1 kubelet[56968]: I0714 08:44:13.445530   56968 status_manager.go:664] "Failed to get status for pod" podUID=f5b4d2cb38bdb33e484f494815280414 pod="kube-system/kube-apiserver-master1" err="Get \"https://127.0.0.1:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-master1\": dial tcp 127.0.0.1:6443: connect: connection refused"
7月 14 08:44:15 master1 kubelet[56968]: I0714 08:44:15.444705   56968 scope.go:110] "RemoveContainer" containerID="1aba3f6984d2f56e6eb9748a50913c0e434d8d8e91b62b3d862a2b4c39cedf3f"
7月 14 08:44:17 master1 kubelet[56968]: E0714 08:44:17.916442   56968 reflector.go:138] object-"kube-system"/"kube-proxy": Failed to watch *v1.ConfigMap: unknown (get configmaps)
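
Since every kubelet error above is a connection refused on 127.0.0.1:6443, a quick way to see whether the API server is actually listening at that moment, and whether its container keeps exiting, would be something like the following (a sketch; the k8s_kube-apiserver container name pattern assumes the default dockershim naming):

ss -lntp | grep 6443                             # is anything listening on 6443 right now?
docker ps -a --filter name=k8s_kube-apiserver    # exited apiserver containers and their exit times
docker logs --tail 50 $(docker ps -aq --filter name=k8s_kube-apiserver | head -n 1)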


I have added worker nodes three times before, and each time it went smoothly and completed quickly. But this time the node add fails and I really cannot find the cause. Any help would be appreciated~

yqwang930907 avatar Jul 10 '25 09:07 yqwang930907

Was the check in step 1 run on master1? And was it run by the same user that ran the kubekey command?

redscholar avatar Jul 11 '25 02:07 redscholar

It was run as the root account on master1.

yqwang930907 avatar Jul 11 '25 02:07 yqwang930907

The kubelet is reporting errors, so your docker service may have a problem as well. Use journalctl to look at the logs of the kubelet and docker services.
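
For example, something along these lines (adjust -n, or add --since/--until to cover the time window when kk failed):

journalctl -u kubelet -n 300 --no-pager
journalctl -u docker -n 300 --no-pager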

redscholar avatar Jul 14 '25 07:07 redscholar

The kubelet is reporting errors, so your docker service may have a problem as well. Use journalctl to look at the logs of the kubelet and docker services.

It's really strange. I haven't touched it for half a year, and it seems to have broken by itself.

kubelet log:

[root@master1 offline]# systemctl status kubelet.service -l
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since 四 2025-07-10 16:55:43 CST; 4 days ago
     Docs: http://kubernetes.io/docs/
 Main PID: 56968 (kubelet)
   CGroup: /system.slice/kubelet.service
           └─56968 /usr/local/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=systemd --network-plugin=cni --pod-infra-container-image=10.8.3.236/kubesphereio/pause:3.6 --node-ip=10.8.43.1 --hostname-override=master1

7月 15 10:13:21 master1 kubelet[56968]: I0715 10:13:21.565388   56968 reconciler.go:221] "operationExecutor.VerifyControllerAttachedVolume started for volume \"k8s-certs\" (UniqueName: \"kubernetes.io/host-path/f5b4d2cb38bdb33e484f494815280414-k8s-certs\") pod \"kube-apiserver-master1\" (UID: \"f5b4d2cb38bdb33e484f494815280414\") " pod="kube-system/kube-apiserver-master1"
7月 15 10:13:21 master1 kubelet[56968]: I0715 10:13:21.565405   56968 reconciler.go:221] "operationExecutor.VerifyControllerAttachedVolume started for volume \"ca-certs\" (UniqueName: \"kubernetes.io/host-path/f5b4d2cb38bdb33e484f494815280414-ca-certs\") pod \"kube-apiserver-master1\" (UID: \"f5b4d2cb38bdb33e484f494815280414\") " pod="kube-system/kube-apiserver-master1"
7月 15 10:13:21 master1 kubelet[56968]: I0715 10:13:21.565421   56968 reconciler.go:221] "operationExecutor.VerifyControllerAttachedVolume started for volume \"etcd-certs-0\" (UniqueName: \"kubernetes.io/host-path/f5b4d2cb38bdb33e484f494815280414-etcd-certs-0\") pod \"kube-apiserver-master1\" (UID: \"f5b4d2cb38bdb33e484f494815280414\") " pod="kube-system/kube-apiserver-master1"
7月 15 10:13:23 master1 kubelet[56968]: E0715 10:13:23.734523   56968 reflector.go:138] object-"kubesphere-logging-system"/"fluent-bit-config": Failed to watch *v1.Secret: unknown (get secrets)
7月 15 10:13:23 master1 kubelet[56968]: E0715 10:13:23.757814   56968 reflector.go:138] object-"kube-system"/"kube-root-ca.crt": Failed to watch *v1.ConfigMap: unknown (get configmaps)
7月 15 10:13:23 master1 kubelet[56968]: E0715 10:13:23.757836   56968 reflector.go:138] object-"kube-system"/"nodelocaldns": Failed to watch *v1.ConfigMap: unknown (get configmaps)
7月 15 10:13:23 master1 kubelet[56968]: E0715 10:13:23.757909   56968 reflector.go:138] object-"kubesphere-logging-system"/"kube-root-ca.crt": Failed to watch *v1.ConfigMap: unknown (get configmaps)
7月 15 10:13:23 master1 kubelet[56968]: E0715 10:13:23.757922   56968 reflector.go:138] object-"kubesphere-monitoring-system"/"kube-root-ca.crt": Failed to watch *v1.ConfigMap: unknown (get configmaps)
7月 15 10:13:23 master1 kubelet[56968]: E0715 10:13:23.757938   56968 reflector.go:138] object-"kube-system"/"kube-proxy": Failed to watch *v1.ConfigMap: unknown (get configmaps)
7月 15 10:13:23 master1 kubelet[56968]: E0715 10:13:23.757946   56968 reflector.go:138] object-"weave"/"kube-root-ca.crt": Failed to watch *v1.ConfigMap: unknown (get configmaps)

docker log:

[root@master1 ~]# journalctl -u docker
-- Logs begin at 四 2024-12-26 10:20:38 CST, end at 一 2025-07-14 16:20:53 CST. --
2月 21 09:24:21 master1 dockerd[4254]: time="2025-02-21T09:24:21.235192067+08:00" level=error msg="Not continuing with pull after error: context canceled"
2月 21 09:28:32 master1 dockerd[4254]: time="2025-02-21T09:28:32.883766286+08:00" level=warning msg="error aborting content ingest" digest="sha256:49b31365e2747d3a5f0fb4f33daf55569fd33e3bc63d3f3861b10b6af59f4fee" error="context can
2月 21 09:28:32 master1 dockerd[4254]: time="2025-02-21T09:28:32.883821887+08:00" level=warning msg="Error persisting manifest" digest="sha256:49b31365e2747d3a5f0fb4f33daf55569fd33e3bc63d3f3861b10b6af59f4fee" error="error writing m
2月 21 09:28:57 master1 dockerd[4254]: time="2025-02-21T09:28:57.400679460+08:00" level=error msg="Not continuing with pull after error: context canceled"
2月 21 09:32:02 master1 dockerd[4254]: time="2025-02-21T09:32:02.557311323+08:00" level=error msg="Not continuing with pull after error: context canceled"
2月 21 09:32:59 master1 dockerd[4254]: time="2025-02-21T09:32:59.654663091+08:00" level=error msg="Not continuing with pull after error: context canceled"
2月 21 12:18:26 master1 dockerd[4254]: time="2025-02-21T12:18:26.416513009+08:00" level=error msg="Not continuing with pull after error: context canceled"
2月 21 13:04:43 master1 dockerd[4254]: time="2025-02-21T13:04:43.567692258+08:00" level=warning msg="Error getting v2 registry: Get \"https://dockerproxy.com/v2/\": read tcp 10.8.43.1:52698->144.24.81.189:443: read: connection rese
2月 21 13:04:43 master1 dockerd[4254]: time="2025-02-21T13:04:43.569620961+08:00" level=warning msg="Error getting v2 registry: Get \"https://docker.mirrors.ustc.edu.cn/v2/\": dial tcp: lookup docker.mirrors.ustc.edu.cn on 10.8.3.3
2月 21 13:04:59 master1 dockerd[4254]: time="2025-02-21T13:04:59.078599869+08:00" level=warning msg="Error getting v2 registry: Get \"https://registry-1.docker.io/v2/\": net/http: request canceled while waiting for connection (Clie
2月 21 13:04:59 master1 dockerd[4254]: time="2025-02-21T13:04:59.080406758+08:00" level=error msg="Handler for POST /v1.43/images/create returned error: Get \"https://registry-1.docker.io/v2/\": net/http: request canceled while wai
2月 21 13:39:22 master1 dockerd[4254]: time="2025-02-21T13:39:22.445614179+08:00" level=error msg="Not continuing with pull after error: context canceled"
2月 21 14:16:05 master1 dockerd[4254]: time="2025-02-21T14:16:05.422558698+08:00" level=error msg="Upload failed: unauthorized: unauthorized to access repository: dify/dify-plugin-daemon, action: push: unauthorized to access reposi
2月 21 14:16:05 master1 dockerd[4254]: time="2025-02-21T14:16:05.432582548+08:00" level=error msg="Upload failed: unauthorized: unauthorized to access repository: dify/dify-plugin-daemon, action: push: unauthorized to access reposi
2月 21 14:16:05 master1 dockerd[4254]: time="2025-02-21T14:16:05.439152692+08:00" level=error msg="Upload failed: unauthorized: unauthorized to access repository: dify/dify-plugin-daemon, action: push: unauthorized to access reposi
3月 13 21:06:45 master1 dockerd[4254]: time="2025-03-13T21:06:45.813442730+08:00" level=warning msg="Error getting v2 registry: Get \"https://dockerproxy.com/v2/\": read tcp 10.8.43.1:38418->144.24.81.189:443: read: connection rese
3月 13 21:06:45 master1 dockerd[4254]: time="2025-03-13T21:06:45.817541784+08:00" level=warning msg="Error getting v2 registry: Get \"https://docker.mirrors.ustc.edu.cn/v2/\": dial tcp: lookup docker.mirrors.ustc.edu.cn on 10.8.3.3
3月 13 21:07:01 master1 dockerd[4254]: time="2025-03-13T21:07:01.338637220+08:00" level=warning msg="Error getting v2 registry: Get \"https://registry-1.docker.io/v2/\": net/http: request canceled while waiting for connection (Clie
3月 13 21:07:01 master1 dockerd[4254]: time="2025-03-13T21:07:01.340993565+08:00" level=error msg="Handler for POST /v1.43/images/create returned error: Get \"https://registry-1.docker.io/v2/\": net/http: request canceled while wai
4月 05 21:20:23 master1 dockerd[4254]: time="2025-04-05T21:20:23.483390843+08:00" level=warning msg="Error getting v2 registry: Get \"https://harbor.tw-solar.com/v2/\": dial tcp 10.8.3.236:443: connect: connection refused"
7月 01 10:56:54 master1 dockerd[4254]: time="2025-07-01T10:56:54.641432872+08:00" level=warning msg="Error getting v2 registry: Get \"https://harbor.tw-solar.com/v2/\": dial tcp 10.8.3.236:443: connect: connection refused"
7月 01 10:57:58 master1 dockerd[4254]: time="2025-07-01T10:57:58.526812553+08:00" level=warning msg="Error getting v2 registry: Get \"https://harbor.tw-solar.com/v2/\": dial tcp 10.8.3.236:443: connect: connection refused"
7月 10 15:14:46 master1 dockerd[4254]: time="2025-07-10T15:14:46.971456273+08:00" level=warning msg="Error getting v2 registry: Get \"https://harbor.tw-solar.com/v2/\": dial tcp 10.8.3.236:443: connect: connection refused"
7月 14 16:02:53 master1 systemd[1]: Stopping Docker Application Container Engine...
7月 14 16:02:53 master1 systemd[1]: Stopped Docker Application Container Engine.
7月 14 16:02:53 master1 systemd[1]: Starting Docker Application Container Engine...
7月 14 16:02:53 master1 dockerd[61639]: time="2025-07-14T16:02:53.817841274+08:00" level=warning msg="could not change group /var/run/docker.sock to docker: group docker not found"
7月 14 16:02:54 master1 systemd[1]: Started Docker Application Container Engine.

I have checked everything I can think of and really cannot see what the cause is.

yqwang930907 avatar Jul 14 '25 08:07 yqwang930907