
Offline installation failed: get manifest list failed by module cache

1247776995 opened this issue • 9 comments

Which version of KubeKey has the issue?

v3.0.12

What is your OS environment?

centos 7.9

KubeKey config file

apiVersion: kubekey.kubesphere.io/v1alpha2
kind: Cluster
metadata:
  name: sample
spec:
  hosts:
  - {name: k8s-master02, address: 192.168.1.245, internalAddress: 192.168.1.245, user: root, password: "xxx"}
  - {name: k8s-node02, address: 192.168.1.244, internalAddress: 192.168.1.244, user: root, password: "xxx"}
  roleGroups:
    etcd:
    - k8s-master02
    control-plane: 
    - k8s-master02
    worker:
    - k8s-master02
    - k8s-node02
  controlPlaneEndpoint:
    ## Internal loadbalancer for apiservers 
    # internalLoadbalancer: haproxy

    domain: lb.kubesphere.local
    address: ""
    port: 6443
  kubernetes:
    version: v1.27.2
    clusterName: cluster.local
    autoRenewCerts: true
    containerManager: containerd
  etcd:
    type: kubekey
  network:
    plugin: calico
    kubePodsCIDR: 10.233.64.0/18
    kubeServiceCIDR: 10.233.0.0/18
    ## multus support. https://github.com/k8snetworkplumbingwg/multus-cni
    multusCNI:
      enabled: false
  registry:
    type: harbor
    auths:
      "https://harbor.xxxx.com:8443":
        username: admin
        password: xxxx
    privateRegistry: "https://harbor.xxxx.com:8443"
    namespaceOverride: "kubesphereio"
    registryMirrors: []
    insecureRegistries: ["https://harbor.xxxx.com:8443"]
  addons: []



---
apiVersion: installer.kubesphere.io/v1alpha1
kind: ClusterConfiguration
metadata:
  name: ks-installer
  namespace: kubesphere-system
  labels:
    version: v3.4.0
spec:
  persistence:
    storageClass: ""
  authentication:
    jwtSecret: ""
  zone: ""
  local_registry: ""
  namespace_override: ""
  # dev_tag: ""
  etcd:
    monitoring: false
    endpointIps: localhost
    port: 2379
    tlsEnable: true
  common:
    core:
      console:
        enableMultiLogin: true
        port: 30880
        type: NodePort
    # apiserver:
    #  resources: {}
    # controllerManager:
    #  resources: {}
    redis:
      enabled: false
      enableHA: false
      volumeSize: 2Gi
    openldap:
      enabled: false
      volumeSize: 2Gi
    minio:
      volumeSize: 20Gi
    monitoring:
      # type: external
      endpoint: http://prometheus-operated.kubesphere-monitoring-system.svc:9090
      GPUMonitoring:
        enabled: false
    gpu:
      kinds:
      - resourceName: "nvidia.com/gpu"
        resourceType: "GPU"
        default: true
    es:
      # master:
      #   volumeSize: 4Gi
      #   replicas: 1
      #   resources: {}
      # data:
      #   volumeSize: 20Gi
      #   replicas: 1
      #   resources: {}
      logMaxAge: 7
      elkPrefix: logstash
      basicAuth:
        enabled: false
        username: ""
        password: ""
      externalElasticsearchHost: ""
      externalElasticsearchPort: ""
    opensearch:
      # master:
      #   volumeSize: 4Gi
      #   replicas: 1
      #   resources: {}
      # data:
      #   volumeSize: 20Gi
      #   replicas: 1
      #   resources: {}
      enabled: true
      logMaxAge: 7
      opensearchPrefix: whizard
      basicAuth:
        enabled: true
        username: "admin"
        password: "admin"
      externalOpensearchHost: ""
      externalOpensearchPort: ""
      dashboard:
        enabled: false
  alerting:
    enabled: false
    # thanosruler:
    #   replicas: 1
    #   resources: {}
  auditing:
    enabled: false
    # operator:
    #   resources: {}
    # webhook:
    #   resources: {}
  devops:
    enabled: false
    jenkinsCpuReq: 0.5
    jenkinsCpuLim: 1
    jenkinsMemoryReq: 4Gi
    jenkinsMemoryLim: 4Gi
    jenkinsVolumeSize: 16Gi
  events:
    enabled: false
    # operator:
    #   resources: {}
    # exporter:
    #   resources: {}
    # ruler:
    #   enabled: true
    #   replicas: 2
    #   resources: {}
  logging:
    enabled: false
    logsidecar:
      enabled: true
      replicas: 2
      # resources: {}
  metrics_server:
    enabled: false
  monitoring:
    storageClass: ""
    node_exporter:
      port: 9100
      # resources: {}
    # kube_rbac_proxy:
    #   resources: {}
    # kube_state_metrics:
    #   resources: {}
    # prometheus:
    #   replicas: 1
    #   volumeSize: 20Gi
    #   resources: {}
    #   operator:
    #     resources: {}
    # alertmanager:
    #   replicas: 1
    #   resources: {}
    # notification_manager:
    #   resources: {}
    #   operator:
    #     resources: {}
    #   proxy:
    #     resources: {}
    gpu:
      nvidia_dcgm_exporter:
        enabled: false
        # resources: {}
  multicluster:
    clusterRole: none
  network:
    networkpolicy:
      enabled: false
    ippool:
      type: none
    topology:
      type: none
  openpitrix:
    store:
      enabled: false
  servicemesh:
    enabled: false
    istio:
      components:
        ingressGateways:
        - name: istio-ingressgateway
          enabled: false
        cni:
          enabled: false
  edgeruntime:
    enabled: false
    kubeedge:
      enabled: false
      cloudCore:
        cloudHub:
          advertiseAddress:
            - ""
        service:
          cloudhubNodePort: "30000"
          cloudhubQuicNodePort: "30001"
          cloudhubHttpsNodePort: "30002"
          cloudstreamNodePort: "30003"
          tunnelNodePort: "30004"
        # resources: {}
        # hostNetWork: false
      iptables-manager:
        enabled: true
        mode: "external"
        # resources: {}
      # edgeService:
      #   resources: {}
  gatekeeper:
    enabled: false
    # controller_manager:
    #   resources: {}
    # audit:
    #   resources: {}
  terminal:
    timeout: 600

A clear and concise description of what happened.

The offline installation fails when running ./kk create cluster -f config-sample.yaml -a kubesphere.tar.gz --with-packages. Harbor is installed locally (the HTTPS certificate has been distributed, docker can log in successfully, and pull/push work normally; the Kubernetes version is v1.27.2).

Relevant log output

[root@k8s-node02 kubekey]# ./kk artifact image push -f config-sample.yaml -a kubesphere-v3.4.0-artifact.tar.gz 


 _   __      _          _   __           
| | / /     | |        | | / /           
| |/ / _   _| |__   ___| |/ /  ___ _   _ 
|    \| | | | '_ \ / _ \    \ / _ \ | | |
| |\  \ |_| | |_) |  __/ |\  \  __/ |_| |
\_| \_/\__,_|_.__/ \___\_| \_/\___|\__, |
                                    __/ |
                                   |___/

11:14:27 CST [UnArchiveArtifactModule] Check the KubeKey artifact md5 value
11:15:38 CST success: [LocalHost]
11:15:38 CST [UnArchiveArtifactModule] UnArchive the KubeKey artifact
11:15:38 CST skipped: [LocalHost]
11:15:38 CST [UnArchiveArtifactModule] Create the KubeKey artifact Md5 file
11:15:38 CST skipped: [LocalHost]
11:15:38 CST [CopyImagesToRegistryModule] Copy images to a private registry from an artifact OCI Path
11:15:38 CST Source: oci:/data/kubekey/kubekey/images:kubesphereio:kubectl:v1.22.0-amd64
11:15:38 CST Destination: docker://https://harbor.gch.com:8443/kubesphereio/kubectl:v1.22.0-amd64
11:15:38 CST success: [LocalHost]
11:15:38 CST [CopyImagesToRegistryModule] Push multi-arch manifest to private registry
11:15:38 CST message: [LocalHost]
get manifest list failed by module cache
11:15:38 CST failed: [LocalHost]
error: Pipeline[ArtifactImagesPushPipeline] execute failed: Module[CopyImagesToRegistryModule] exec failed: 
failed: [LocalHost] [PushManifest] exec failed after 1 retries: get manifest list failed by module cache

Additional information

No response

1247776995 · Nov 27 '23

This happens because the official create_project_harbor.sh script does not create the kubesphereio project by default. When KubeKey creates a cluster it first runs CopyImagesToRegistry; if the kubesphereio project does not exist, the pushes fail, so the c.ModuleCache.Set("manifestList", manifestList) call in CopyImagesToRegistry's Execute method is never reached, and PushManifest's Execute later cannot find manifestList in the module cache. The fix is to add kubesphereio to the script, create the kubesphereio project manually in Harbor, or change namespaceOverride in config-sample.yaml to "kubesphere" (although I have not tried that last option).

Also, although KubeKey retries when it calls CopyImageOptions' Copy(), it never prints the error from the failed push, which makes the error that surfaces later very hard to understand. Hope this helps!
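
For reference, a minimal sketch of the first two workarounds (Harbor's v2.0 project API is assumed, and the registry URL and credentials below are placeholders): either append kubesphereio to the project list in create_project_harbor.sh, or create the project directly, e.g.:

#!/usr/bin/env bash
# Sketch only: create the missing "kubesphereio" project in Harbor.
# Placeholder URL and credentials; Harbor's v2.0 API is assumed.
url="https://harbor.xxxx.com:8443"
user="admin"
passwd="xxxx"

curl -k -u "${user}:${passwd}" \
  -X POST "${url}/api/v2.0/projects" \
  -H "Content-Type: application/json" \
  -d '{ "project_name": "kubesphereio", "public": true }'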

frakes-zou · Nov 28 '23

Thanks for the reply, but I think there may be another cause. I checked my Harbor project names and they match what you described. In my case the failure happens after I copy the packaged artifact to another offline machine running the same OS version and run kk there; the same setup works fine in my online environment, and the error only appears in the offline one. I am puzzled and cannot find the reason. Thanks again for your reply.

1247776995 · Dec 12 '23

This is also easy to troubleshoot. If you can, build your own kk binary with extra log statements added at the relevant code and run it in the failing environment to debug. The rough cause of this error is that an image from the images directory extracted from the offline artifact could not be found on your Harbor.
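
A quick way to check that (a sketch, assuming Harbor's v2.0 API; URL and credentials are placeholders) is to compare the repositories that actually exist under the kubesphereio project with the images unpacked from the artifact:

#!/usr/bin/env bash
# Sketch: list what is in the kubesphereio project and what the artifact contains.
url="https://harbor.xxxx.com:8443"   # placeholder
user="admin"                         # placeholder
passwd="xxxx"                        # placeholder

# Repositories currently present under the kubesphereio project:
curl -k -s -u "${user}:${passwd}" \
  "${url}/api/v2.0/projects/kubesphereio/repositories?page_size=100" \
  | grep -o '"name": *"[^"]*"'

# Images unpacked from the artifact (the OCI path shown in the log above):
ls /data/kubekey/kubekey/images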

frakes-zou · Dec 15 '23

The solution is here 👉 https://github.com/kubesphere/kubekey/issues/2025#issuecomment-1859417230

1247776995 · Dec 19 '23

How was this finally resolved? I have the same problem: the kubesphereio project has already been created, but I still get the same error.

dainingwaps · Dec 29 '23

Did anyone solve this in the end? I ran into the same problem; creating kubesphereio on Harbor still does not help.

super-zhang · Apr 01 '24

You can try pushing the images with the new version of kk; if the push fails, it now returns the reason for the failure. https://github.com/kubesphere/kubekey/releases/tag/v3.1.0-rc.1
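
A minimal sketch of trying that (the get-kk install script needs internet access, so on a fully offline host the new binary would have to be downloaded elsewhere and copied over):

# Download the new kk on a machine with internet access, then copy it to the offline host:
curl -sfL https://get-kk.kubesphere.io | VERSION=v3.1.0-rc.1 sh -

# Re-run the failing push with the new binary to see the underlying push error:
./kk artifact image push -f config-sample.yaml -a kubesphere-v3.4.0-artifact.tar.gz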

pixiake · Apr 01 '24

I have solved it. Try again with a newer version of kk, and pay special attention to whether the docker certificate path is configured in sample.conf.
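
As an illustration only (sample.conf itself is not shown here; this assumes Docker's conventional /etc/docker/certs.d trust layout, and the registry host and certificate path are placeholders), the certificate check usually looks like:

# Illustration: the conventional Docker trust path for a private registry.
# Replace the registry host:port and certificate file with your own values.
mkdir -p /etc/docker/certs.d/harbor.xxxx.com:8443
cp /path/to/harbor-ca.crt /etc/docker/certs.d/harbor.xxxx.com:8443/ca.crt
docker login harbor.xxxx.com:8443   # should now succeed without a TLS error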

1247776995 · Apr 02 '24

Got it, I will try the latest RC version.

super-zhang · Apr 02 '24