
Hang when adding a node

Open fitme96 opened this issue 6 months ago • 3 comments

Which version of KubeKey has the issue?

kk version: &version.Info{Major:"3", Minor:"1", GitVersion:"v3.1.9", GitCommit:"f7f74890ec51db1e4c35b54af8ecc87d7f807deb", GitTreeState:"clean", BuildDate:"2025-04-25T03:16:36Z", GoVersion:"go1.19.2", Compiler:"gc", Platform:"linux/amd64"}

What is your os environment?

Ubuntu 22.04.5 LTS

KubeKey config file

apiVersion: kubekey.kubesphere.io/v1alpha2
kind: Cluster
metadata:
  name: sample
spec:
  hosts:
  - {name: test-k8smaster-1-199, address: xx, internalAddress: xx, user: root, password: ""}
  - {name: test-k8snode-1-200, address: xx, internalAddress: xx, user: root, password: ""}
  - {name: test-k8snode-2-203, address: xx, internalAddress: xx, user: root, password: ""}
  - {name: test-k8snode-5-205, address: xx, internalAddress: xx, user: root, password: ""}
  - {name: test-k8snode-1-206, address: xx, internalAddress: xx, user: root, password: ""}
  roleGroups:
    etcd:
    - test-k8smaster-1-199
    control-plane: 
    - test-k8smaster-1-199
    worker:
    - test-k8snode-1-200
    - test-k8snode-2-203
    - test-k8snode-5-205
    - test-k8snode-1-206
  controlPlaneEndpoint:
    ## Internal loadbalancer for apiservers 
    # internalLoadbalancer: haproxy

    domain: lb.kubesphere.local
    address: ""
    port: 6443
  kubernetes:
    version: v1.30.8
    clusterName: cluster.local
    autoRenewCerts: true
    containerManager: containerd
  etcd:
    type: kubekey
  network:
    plugin: calico
    kubePodsCIDR: 10.133.64.0/18
    kubeServiceCIDR: 10.133.0.0/18
    ## multus support. https://github.com/k8snetworkplumbingwg/multus-cni
    multusCNI:
      enabled: false
  registry:
    privateRegistry: "harborinternal.xxxcom:8012"
    namespaceOverride: "kubesphereio"
    registryMirrors: []
    insecureRegistries: []
  addons: []

A clear and concise description of what happened.

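Hello. The previous two times I added nodes, everything went smoothly. Today I bought a new machine and tried to add it as a node in the same way, but kk hung at the ConfigureOSModule step. In the KubeSphere community thread https://ask.kubesphere.com.cn/forum/d/5172-arm64/12 I read that this may be caused by echo 3 > /proc/sys/vm/drop_caches. I cloned the code locally intending to comment out that line, but found that it had already been commented out a month ago. So I tried adding set -x to initOS.sh instead, with the following result: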

+ sync
+ update-alternatives --set iptables /usr/sbin/iptables-legacy
+ update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
+ update-alternatives --set arptables /usr/sbin/arptables-legacy
+ true
+ update-alternatives --set ebtables /usr/sbin/ebtables-legacy
^C

The script appears to have run to completion, but kk did not continue with the subsequent steps.

Adding echo 'test' to the script

I tried adding echo 'test' after update-alternatives --set ebtables /usr/sbin/ebtables-legacy, and the script ran successfully through to echo 'test'.
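One general way a caller can keep waiting after a script's last command has run is that something started by the script still holds the caller's output pipe open. Whether this applies to kk here is not established in this thread; the sketch below only illustrates the mechanism locally, using bash command substitution as the "caller". The function names `held_pipe_seconds` and `detached_seconds` and the 3-second `sleep` are made up for the demonstration and are not part of initOS.sh or KubeKey.

```shell
#!/usr/bin/env bash
# Illustration only: nothing here is taken from KubeKey or initOS.sh.

# Prints how many seconds a command substitution takes when a background
# child inherits the captured stdout pipe. The echo runs at once, but the
# substitution only returns when the pipe reaches EOF, i.e. when sleep exits.
held_pipe_seconds() {
  local start end
  start=$(date +%s)
  : "$(sleep 3 & echo "script finished")"
  end=$(date +%s)
  echo $(( end - start ))
}

# Same command, but the background child's stdout and stderr are redirected
# away from the pipe, so the substitution returns as soon as echo is done.
detached_seconds() {
  local start end
  start=$(date +%s)
  : "$(sleep 3 >/dev/null 2>&1 & echo "script finished")"
  end=$(date +%s)
  echo $(( end - start ))
}

# Usage:
#   held_pipe_seconds   # roughly 3
#   detached_seconds    # roughly 0
```

If the two numbers differ like this on a machine, it shows only that "the last line executed" and "the caller saw the output stream close" are separate events; it does not show which one kk is waiting on.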

Commenting out ExecScript

As a last resort, I commented out the ExecScript step, and kk then added the node successfully.

// ExecScript := &task.RemoteTask{
// 	Name:     "ExecScript",
// 	Desc:     "Exec init os script",
// 	Hosts:    c.Runtime.GetAllHosts(),
// 	Action:   new(NodeExecScript),
// 	Parallel: true,
// }

ConfigureNtpServer := &task.RemoteTask{
	Name:     "ConfigureNtpServer",
	Desc:     "configure the ntp server for each node",
	Hosts:    c.Runtime.GetAllHosts(),
	Prepare:  new(NodeConfigureNtpCheck),
	Action:   new(NodeConfigureNtpServer),
	Parallel: true,
}

c.Tasks = []task.Interface{
	getOSData,
	initOS,
	GenerateScript,
	// ExecScript,
	ConfigureNtpServer,
}

}

Relevant log output


Additional information

No response

fitme96 avatar Jun 12 '25 10:06 fitme96

@fitme96 Could you help test which step it hangs at? You can take the script out and run it on the machine by itself.

redscholar avatar Jun 12 '25 10:06 redscholar

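> @fitme96 Could you help test which step it hangs at? You can take the script out and run it on the machine by itself.

I ran bash -x /usr/local/bin/kube-scripts/initOS.sh on both the newly added node and the master; the exit status was 0 on each.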
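A manual run like the one above can be wrapped so that a hang and a normal exit are easy to tell apart. The following is a minimal sketch, assuming GNU coreutils `timeout` is installed on the node; `run_traced` is a hypothetical helper name, and the script path is the one quoted above.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of KubeKey.
# Usage: run_traced SCRIPT [LIMIT_SECONDS]
# Runs SCRIPT under `bash -x` with a time limit and reports the exit status.
run_traced() {
  local script="$1" limit="${2:-120}" status
  timeout "$limit" bash -x "$script"
  status=$?
  if [ "$status" -eq 124 ]; then
    # GNU timeout exits with 124 when the time limit is reached.
    echo "run_traced: no exit within ${limit}s (possible hang)" >&2
  else
    echo "run_traced: exit status ${status}" >&2
  fi
  return "$status"
}

# Usage on a node:
#   run_traced /usr/local/bin/kube-scripts/initOS.sh 120
```

An exit status of 0 well inside the limit, as reported above, suggests the script itself completes when run by hand; it does not by itself show what happens when kk runs the same script remotely.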

fitme96 avatar Jun 12 '25 10:06 fitme96

I have added my KubeKey config to the issue.

fitme96 avatar Jun 12 '25 10:06 fitme96