
kk 4.0 offline installation error

Open · JYT59421 opened this issue 1 month ago · 12 comments

What version of KubeKey has the issue?

v4.0.1

What is your OS environment?

4.19.90-17.ky10.x86_64

KubeKey config file


A clear and concise description of what happened.

Using the web install, after filling in the basic information and clicking Next, an error is reported:

Image

Relevant log output

kubernetes config: task [ETCD | Assert disk fsync latency meets requirements](default/precheck-kubernetes-xrfc2-w5rrt) run failed:
[master1]: The 90th percentile fsync latency is 612ns, which exceeds the maximum allowed: 10000000ns.
[master2]: The 90th percentile fsync latency is 620ns, which exceeds the maximum allowed: 10000000ns.
[master3]: The 90th percentile fsync latency is 620ns, which exceeds the maximum allowed: 10000000ns.
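For reference, this precheck number comes from an fio run (the tasks quoted later in this thread read `.fio_result.stdout.jobs`). A rough manual equivalent, with the fio flags assumed from etcd's commonly documented disk benchmark rather than taken from KubeKey itself:

# Hedged manual version of the etcd WAL fsync check. These flags follow the
# commonly documented etcd disk benchmark and are an assumption, not the
# exact fio invocation KubeKey runs; the /var/lib/etcd path is also assumed.
fio --name=etcd-fsync --directory=/var/lib/etcd \
    --rw=write --ioengine=sync --fdatasync=1 \
    --bs=2300 --size=22m --output-format=json \
  | jq '.jobs[0].sync.lat_ns.percentile."90.000000"'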

Additional information

kubernetes config:

kubernetes:
  cluster_name: cluster1
  kube_version: v1.33.4
  control_plane_endpoint:
    host: kubesphere.ordos12345
    port: 6443
    type: haproxy
  cri:
    container_manager: containerd
  cni:
    type: calico
    max_pods: 110
    service_cidr: 10.233.0.0/18
    pod_cidr: 10.233.64.0/18
    ipv4_mask_size: 24
    ipv6_mask_size: 64
image_registry:
  auth:
    registry: harbor.ordos.local
    username: admin
    password: DR@drhy@123456
    insecure: true
storage_class:
  local:
    enabled: true
    default: true
    path: /data/storage/openebs/local

kubesphere core config:

ks-core:
  global:
    imageRegistry: harbor.ordos.local
  extension:
    imageRegistry: harbor.ordos.local

JYT59421 · Nov 18 '25 08:11

The master nodes' I/O performance is insufficient. etcd is installed on the master nodes and has fairly high I/O performance requirements.

redscholar · Nov 18 '25 09:11

> The master nodes' I/O performance is insufficient. etcd is installed on the master nodes and has fairly high I/O performance requirements.

But the error message reports measured values of 612ns and 620ns against a maximum allowed of 10,000,000ns. Doesn't that satisfy the requirement?

JYT59421 · Nov 18 '25 09:11

You're right, the comparison is indeed written backwards. As a temporary workaround you can edit the tasks file:

vi $(pwd)/kubernetes/roles/precheck/etcd/tasks/main.yaml 

Change

that: (index (.fio_result.stdout.jobs | first) "sync" "lat_ns" "percentile" "90.000000") | le .cluster_require.etcd_disk_wal_fysnc_duration_seconds

to

that: (index (.fio_result.stdout.jobs | first) "sync" "lat_ns" "percentile" "90.000000") | ge .cluster_require.etcd_disk_wal_fysnc_duration_seconds

(In Go templates a pipeline passes its value as the last argument, so `measured | le limit` evaluates `le limit measured`, i.e. limit <= measured; `ge` is what actually asserts that the limit is at least the measured latency.)
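The same one-line change applied non-interactively, assuming the file content matches the line quoted above:

# Flip the reversed comparison without opening an editor; the path is the
# same one used in the vi command above.
sed -i 's/ | le .cluster_require.etcd_disk_wal_fysnc_duration_seconds/ | ge .cluster_require.etcd_disk_wal_fysnc_duration_seconds/' \
  "$(pwd)/kubernetes/roles/precheck/etcd/tasks/main.yaml"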

redscholar · Nov 18 '25 12:11

> You're right, the comparison is indeed written backwards. As a temporary workaround you can edit the tasks file:
>
> vi $(pwd)/kubernetes/roles/precheck/etcd/tasks/main.yaml and change that: (index (.fio_result.stdout.jobs | first) "sync" "lat_ns" "percentile" "90.000000") | le .cluster_require.etcd_disk_wal_fysnc_duration_seconds to that: (index (.fio_result.stdout.jobs | first) "sync" "lat_ns" "percentile" "90.000000") | ge .cluster_require.etcd_disk_wal_fysnc_duration_seconds

Now there is a new problem. Error message: task OS | Fail if operating system is not supported run failed: [node1]: The operating system "kylin" is not recognized or supported. Doesn't the documentation say Kylin V10 is supported?

Image

JYT59421 · Nov 19 '25 01:11

Add this parameter:

Image

redscholar · Nov 19 '25 02:11

> Add this parameter:
>
> Image

That gets me to the next step, but there is a new error when installing Kubernetes:

Image

It seems the matching ISO was not found locally, but I manually built kylin-v10-sp1-rpms-amd64.iso and put it under repository, and it still had no effect. The officially provided ISO is kylin-v10SP3-rpms-amd64.iso.

System info:

Image

Script info:

- name: Repository | Check system version when use Kylin
  set_fact:
    sp_version: >-
      {{- if .os.release.VERSION | contains "Tercel" }}
      SP1
      {{- else if .os.release.VERSION | contains "Sword" }}
      SP2
      {{- else if .os.release.VERSION | contains "Lance" }}
      SP3
      {{- end -}}
  when: .os.release.ID | unquote | eq "kylin"

- name: Repository | Define the system string based on distribution
  set_fact:
    system_string: >-
      {{- if .os.release.ID | unquote | eq "kylin" }}
      kylin-{{ .os.release.VERSION_ID }}-{{ .sp_version }}
      {{- else if .os.release.ID_LIKE | unquote | eq "rhel fedora" }}
      {{ .os.release.ID }}{{ .os.release.VERSION_ID }}
      {{- else }}
      {{ .os.release.ID }}-{{ .os.release.VERSION_ID }}
      {{- end -}}

- name: Repository | Define the package file type by system info
  set_fact:
    iso_type: >-
      {{- if .os.release.ID_LIKE | eq "debian" }}
      debs
      {{- else  }}
      rpms
      {{- end -}}

- name: Repository | Set iso file name
  when:
    - .iso_name | empty
  set_fact:
    iso_name: "{{ .system_string | replace \"\\\"\" \"\" | unquote | trim | lower }}-{{ .iso_type | trim }}-{{ .binary_type }}.iso"
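Given the tasks above, a quick way to see which values a Kylin node actually reports (VERSION must contain "Tercel", "Sword", or "Lance" for sp_version to be set, and ID/VERSION_ID feed into system_string):

# Show the os-release fields the Repository tasks key on.
grep -E '^(ID|ID_LIKE|VERSION|VERSION_ID)=' /etc/os-release

On a V10 SP1 (Tercel) amd64 host this should yield an iso_name of kylin-v10-sp1-rpms-amd64.iso, matching the file built by hand above.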

node7 succeeded, probably because it is a machine with internet access.

JYT59421 · Nov 19 '25 06:11

At the moment, offline packages have only been built for Kylin V10-SP3 (Lance); SP2 and SP1 packages will follow. If you need SP2 or SP1, for now the required software can indeed only be installed over the network.

zuoxuesong-worker · Nov 19 '25 07:11

> At the moment, offline packages have only been built for Kylin V10-SP3 (Lance); SP2 and SP1 packages will follow. If you need SP2 or SP1, for now the required software can indeed only be installed over the network.

With that, the installation succeeded, but installing KubeSphere then fails with no obvious logs:

Image Image

JYT59421 · Nov 19 '25 11:11

Checking the pods shows that calico has not started. Error info:

[root@master3 ~]# kubectl get pods -A
NAMESPACE           NAME                                           READY   STATUS             RESTARTS      AGE
kube-system         coredns-74955675b8-2vvtg                       0/1     Pending            0             2m26s
kube-system         coredns-74955675b8-pgt22                       0/1     Pending            0             2m26s
kube-system         coredns-7f59984797-2cpc8                       0/1     Pending            0             2m28s
kube-system         haproxy-node1                                  0/1     InvalidImageName   0             2m16s
kube-system         haproxy-node2                                  0/1     InvalidImageName   0             2m16s
kube-system         haproxy-node3                                  0/1     InvalidImageName   0             2m16s
kube-system         haproxy-node4                                  0/1     InvalidImageName   0             2m15s
kube-system         haproxy-node7                                  0/1     InvalidImageName   0             2m15s
kube-system         kube-apiserver-master1                         1/1     Running            0             2m33s
kube-system         kube-apiserver-master2                         1/1     Running            0             117s
kube-system         kube-apiserver-master3                         1/1     Running            0             117s
kube-system         kube-controller-manager-master1                1/1     Running            0             2m33s
kube-system         kube-controller-manager-master2                1/1     Running            0             117s
kube-system         kube-controller-manager-master3                1/1     Running            0             117s
kube-system         kube-proxy-2c9rh                               1/1     Running            0             2m16s
kube-system         kube-proxy-7fp8v                               1/1     Running            0             2m17s
kube-system         kube-proxy-bftng                               1/1     Running            0             2m17s
kube-system         kube-proxy-c5vkp                               1/1     Running            0             118s
kube-system         kube-proxy-hltgh                               1/1     Running            0             118s
kube-system         kube-proxy-pz4tf                               1/1     Running            0             2m17s
kube-system         kube-proxy-q6bsc                               1/1     Running            0             2m17s
kube-system         kube-proxy-q92j6                               1/1     Running            0             2m28s
kube-system         kube-scheduler-master1                         1/1     Running            0             2m33s
kube-system         kube-scheduler-master2                         1/1     Running            0             117s
kube-system         kube-scheduler-master3                         1/1     Running            0             117s
kube-system         nodelocaldns-84c4k                             1/1     Running            0             2m25s
kube-system         nodelocaldns-85dlx                             1/1     Running            0             118s
kube-system         nodelocaldns-8wcrx                             1/1     Running            0             2m17s
kube-system         nodelocaldns-b9k7g                             1/1     Running            0             118s
kube-system         nodelocaldns-kl5vl                             1/1     Running            0             2m17s
kube-system         nodelocaldns-mpfb4                             1/1     Running            0             2m17s
kube-system         nodelocaldns-sfvzb                             1/1     Running            0             2m17s
kube-system         nodelocaldns-xmb4n                             1/1     Running            0             2m16s
kube-system         openebs-localpv-provisioner-65fb7dc667-2bkp9   0/1     Pending            0             89s
kubesphere-system   extensions-museum-67765f974c-6pdhs             0/1     Pending            0             67s
kubesphere-system   ks-apiserver-7b879dbfb4-4d9fs                  0/1     Pending            0             67s
kubesphere-system   ks-console-58c84ffcd-k9ppj                     0/1     Pending            0             67s
kubesphere-system   ks-controller-manager-5ff77cd5bf-xsrfz         0/1     Pending            0             67s
kubesphere-system   ks-posthog-5c8986bb4d-tj5jp                    0/1     Pending            0             67s
tigera-operator     tigera-operator-6fb866c57f-ht8bt               0/1     Error              4 (50s ago)   97s

[root@master3 ~]# kubectl logs -n tigera-operator -l k8s-app=tigera-operator
r11    0x0
r12    0x5439078
r13    0x1a
r14    0x7ffde8d15958
r15    0x0
rip    0x7fef4ed07f64
rflags 0x10207
cs     0x33
fs     0x0
gs     0x0

[root@master3 ~]# kubectl logs -n tigera-operator tigera-operator-6fb866c57f-ht8bt
SIGSEGV: segmentation violation
PC=0x7f5cb8e9df64 m=0 sigcode=1 addr=0x5a09000
signal arrived during cgo execution

goroutine 1 gp=0xc0000061c0 m=0 mp=0x42ea1c0 [syscall, locked to thread]:
runtime.cgocall(0x401480, 0xc0000d5e00)
    /usr/local/go/src/runtime/cgocall.go:157 +0x4b fp=0xc0000d5dd8 sp=0xc0000d5da0 pc=0x58312b
crypto/internal/boring._Cfunc__goboringcrypto_BORINGSSL_bcm_power_on_self_test()
    _cgo_gotypes.go:428 +0x3f fp=0xc0000d5e00 sp=0xc0000d5dd8 pc=0x7fe75f
crypto/internal/boring.init.0()
    /usr/local/go/src/crypto/internal/boring/boring.go:26 +0x13 fp=0xc0000d5e20 sp=0xc0000d5e00 pc=0x803f33
runtime.doInit1(0x4191c50)
    /usr/local/go/src/runtime/proc.go:7176 +0xe8 fp=0xc0000d5f50 sp=0xc0000d5e20 pc=0x5ca508
runtime.doInit(...)
    /usr/local/go/src/runtime/proc.go:7143
runtime.main()
    /usr/local/go/src/runtime/proc.go:253 +0x350 fp=0xc0000d5fe0 sp=0xc0000d5f50 pc=0x5bbb70
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0000d5fe8 sp=0xc0000d5fe0 pc=0x5f3b41

goroutine 2 gp=0xc000006c40 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
    /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc0000c0fa8 sp=0xc0000c0f88 pc=0x5bbf0e
runtime.goparkunlock(...)
    /usr/local/go/src/runtime/proc.go:408
runtime.forcegchelper()
    /usr/local/go/src/runtime/proc.go:326 +0xb3 fp=0xc0000c0fe0 sp=0xc0000c0fa8 pc=0x5bbd73
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0000c0fe8 sp=0xc0000c0fe0 pc=0x5f3b41
created by runtime.init.6 in goroutine 1
    /usr/local/go/src/runtime/proc.go:314 +0x1a

goroutine 3 gp=0xc000007180 m=nil [GC sweep wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
    /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc0000c1780 sp=0xc0000c1760 pc=0x5bbf0e
runtime.goparkunlock(...)
    /usr/local/go/src/runtime/proc.go:408
runtime.bgsweep(0xc0000ea000)
    /usr/local/go/src/runtime/mgcsweep.go:278 +0x94 fp=0xc0000c17c8 sp=0xc0000c1780 pc=0x5a55f4
runtime.gcenable.gowrap1()
    /usr/local/go/src/runtime/mgc.go:203 +0x25 fp=0xc0000c17e0 sp=0xc0000c17c8 pc=0x599f45
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0000c17e8 sp=0xc0000c17e0 pc=0x5f3b41
created by runtime.gcenable in goroutine 1
    /usr/local/go/src/runtime/mgc.go:203 +0x66

goroutine 4 gp=0xc000007340 m=nil [GC scavenge wait]:
runtime.gopark(0xc0000ea000?, 0x2ef8b88?, 0x1?, 0x0?, 0xc000007340?)
    /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc0000c1f78 sp=0xc0000c1f58 pc=0x5bbf0e
runtime.goparkunlock(...)
    /usr/local/go/src/runtime/proc.go:408
runtime.(*scavengerState).park(0x42e4860)
    /usr/local/go/src/runtime/mgcscavenge.go:425 +0x49 fp=0xc0000c1fa8 sp=0xc0000c1f78 pc=0x5a2fe9
runtime.bgscavenge(0xc0000ea000)
    /usr/local/go/src/runtime/mgcscavenge.go:653 +0x3c fp=0xc0000c1fc8 sp=0xc0000c1fa8 pc=0x5a357c
runtime.gcenable.gowrap2()
    /usr/local/go/src/runtime/mgc.go:204 +0x25 fp=0xc0000c1fe0 sp=0xc0000c1fc8 pc=0x599ee5
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0000c1fe8 sp=0xc0000c1fe0 pc=0x5f3b41
created by runtime.gcenable in goroutine 1
    /usr/local/go/src/runtime/mgc.go:204 +0xa5

goroutine 5 gp=0xc000007c00 m=nil [finalizer wait]:
runtime.gopark(0xc0000c0648?, 0x58d065?, 0xa8?, 0x1?, 0xc0000061c0?)
    /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc0000c0620 sp=0xc0000c0600 pc=0x5bbf0e
runtime.runfinq()
    /usr/local/go/src/runtime/mfinal.go:194 +0x107 fp=0xc0000c07e0 sp=0xc0000c0620 pc=0x598f07
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0000c07e8 sp=0xc0000c07e0 pc=0x5f3b41
created by runtime.createfing in goroutine 1
    /usr/local/go/src/runtime/mfinal.go:164 +0x3d

rax    0x59e9078
rbx    0x7fff06cd3c68
rcx    0x2
rdx    0xd0
rdi    0x5a06000
rsi    0x5a05e60
rbp    0xc0000d5d90
rsp    0x7fff06cd3ad8
r8     0xfffffffffffffff8
r9     0x1e0
r10    0xfffffffffffffff9
r11    0x0
r12    0x59e9078
r13    0x1a
r14    0x7fff06cd3c48
r15    0x0
rip    0x7f5cb8e9df64
rflags 0x10207
cs     0x33
fs     0x0
gs     0x0

JYT59421 · Nov 19 '25 14:11

Take a look at this issue; it may be a CPU incompatibility: https://github.com/projectcalico/calico/issues/10479. As a workaround, try an older calico version: https://github.com/projectcalico/calico/issues/9962#issuecomment-2712852460
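A quick way to check that hypothesis on an affected node; the instruction-set list below is an assumption based on the linked issues (BoringSSL-based builds crashing on older CPUs), not a confirmed diagnosis:

# Print the CPU model, then which of the commonly implicated extensions
# it advertises (empty output from the second command suggests an old CPU).
grep -m1 'model name' /proc/cpuinfo
grep -m1 -o -E 'sse4_2|avx2?' /proc/cpuinfo | sort -u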

redscholar · Nov 20 '25 02:11

One more question.

kube-system haproxy-node1 0/1 InvalidImageName 0 2m16s 

Please provide the image name used by the haproxy-node1 pod:

kubectl get pod -n kube-system haproxy-node1 -o yaml | grep image:

redscholar · Nov 20 '25 02:11

> One more question.
>
> kube-system haproxy-node1 0/1 InvalidImageName 0 2m16s
>
> Please provide the image name used by the haproxy-node1 pod:
>
> kubectl get pod -n kube-system haproxy-node1 -o yaml | grep image:

It has now been installed successfully. Attaching system info:

Image

Offline package configuration file:

Image

All extension components were selected.

To summarize the issues encountered after that:

1. tigera/operator fails to start

Image

Cause: incompatible image version. Fix: kept downgrading tigera/operator from v1.34.5 until reaching v1.34.0, which is compatible; versions newer than v1.34.5 were not tested.
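A minimal sketch of that downgrade, assuming the Deployment and its container are both named tigera-operator (the upstream defaults) and that the v1.34.0 image has already been mirrored into the private registry:

# Point the operator Deployment at the older image; the names and the
# harbor repository path are assumptions, not taken from this thread.
kubectl set image -n tigera-operator deployment/tigera-operator \
  tigera-operator=harbor.ordos.local/tigera/operator:v1.34.0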

2. The calico pods pull an image version that differs from the offline package configuration; by default they all pull v3.28.0, and the offline package contains no image files for that version. Fix: pull v3.28.0 again and push it to the image registry (see the sketch after the screenshot below).

Image
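A hedged sketch of that re-mirroring; the image list and the target repository path are assumptions (check what the pods actually reference), while the v3.28.0 tag and the harbor.ordos.local registry come from this thread:

# Mirror the calico v3.28.0 images into the private registry.
for img in cni node kube-controllers; do
  docker pull docker.io/calico/${img}:v3.28.0
  docker tag  docker.io/calico/${img}:v3.28.0 harbor.ordos.local/calico/${img}:v3.28.0
  docker push harbor.ordos.local/calico/${img}:v3.28.0
done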

3. haproxy fails to start. Error info: Image: /library/haproxy:2.9.6-alpine Error: InvalidImageName. I don't know why the offline package did not pull this image, and the tag was never applied, so the pull fails (the reference starts with "/" and has no registry host, which is what makes the image name invalid). Fix: pull docker.io/kubesphere/haproxy:2.9.6-alpine again and push it to the private registry.
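A sketch of that re-mirroring; the /library target path is inferred from the error message, so verify it matches what the pod spec expects:

# Re-mirror haproxy under the /library path seen in the error message.
docker pull docker.io/kubesphere/haproxy:2.9.6-alpine
docker tag  docker.io/kubesphere/haproxy:2.9.6-alpine harbor.ordos.local/library/haproxy:2.9.6-alpine
docker push harbor.ordos.local/library/haproxy:2.9.6-alpine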

JYT59421 · Nov 21 '25 01:11