
Zarf-Seed-Registry Installation Fails on Init with Deployment is not ready: zarf/zarf-docker-registry error

Open erikschlegel opened this issue 2 years ago • 33 comments

Environment

Device and OS: Azure AKS, Linux Ubuntu 20.04
App version: 0.19.6
Kubernetes distro being used: AKS, Kubernetes v1.22.6
Other:

Steps to reproduce

  1. Create an AKS Cluster
  2. Run zarf init --components git-server (both steps are scripted with the Azure CLI below).
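
For reference, these repro steps scripted with the Azure CLI; a sketch only, with resource names, region, and node count as illustrative placeholders:

# Provision a default AKS cluster
az group create --name zarf-test --location eastus
az aks create --resource-group zarf-test --name zarf-aks --node-count 2 --kubernetes-version 1.22.6
az aks get-credentials --resource-group zarf-test --name zarf-aks

# Initialize Zarf with the git-server component
zarf init --components git-server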

Expected result

Command succeeds and Zarf is initialized in the cluster.

Actual Result

The following message repeats until the init run times out:
Deployment is not ready: zarf/zarf-docker-registry. 0 out of 1 expected pods are ready

Output of kubectl -n zarf get events:

LAST SEEN   TYPE      REASON              OBJECT                                       MESSAGE
15m         Normal    Scheduled           pod/injector                                 Successfully assigned zarf/injector to aks-agentpool-40722291-vmss000001
15m         Warning   FailedMount         pod/injector                                 MountVolume.SetUp failed for volume "zarf-payload-018" : failed to sync configmap cache: timed out waiting for the condition
15m         Warning   FailedMount         pod/injector                                 MountVolume.SetUp failed for volume "zarf-payload-023" : failed to sync configmap cache: timed out waiting for the condition
15m         Warning   FailedMount         pod/injector                                 MountVolume.SetUp failed for volume "zarf-payload-013" : failed to sync configmap cache: timed out waiting for the condition
15m         Warning   FailedMount         pod/injector                                 MountVolume.SetUp failed for volume "zarf-payload-008" : failed to sync configmap cache: timed out waiting for the condition
15m         Warning   FailedMount         pod/injector                                 MountVolume.SetUp failed for volume "zarf-payload-009" : failed to sync configmap cache: timed out waiting for the condition
15m         Warning   FailedMount         pod/injector                                 MountVolume.SetUp failed for volume "zarf-payload-019" : failed to sync configmap cache: timed out waiting for the condition
15m         Warning   FailedMount         pod/injector                                 MountVolume.SetUp failed for volume "stage1" : failed to sync configmap cache: timed out waiting for the condition
15m         Warning   FailedMount         pod/injector                                 MountVolume.SetUp failed for volume "zarf-payload-027" : failed to sync configmap cache: timed out waiting for the condition
15m         Warning   FailedMount         pod/injector                                 MountVolume.SetUp failed for volume "zarf-payload-007" : failed to sync configmap cache: timed out waiting for the condition
15m         Warning   FailedMount         pod/injector                                 (combined from similar events): MountVolume.SetUp failed for volume "zarf-payload-027" : failed to sync configmap cache: timed out waiting for the condition
15m         Normal    Scheduled           pod/zarf-docker-registry-789d8ddfb8-4pfgj    Successfully assigned zarf/zarf-docker-registry-789d8ddfb8-4pfgj to aks-agentpool-40722291-vmss000000
14m         Normal    Pulling             pod/zarf-docker-registry-789d8ddfb8-4pfgj    Pulling image "127.0.0.1:32178/library/registry:2.7.1"
14m         Warning   Failed              pod/zarf-docker-registry-789d8ddfb8-4pfgj    Failed to pull image "127.0.0.1:32178/library/registry:2.7.1": rpc error: code = Unknown desc = failed to pull and unpack image "127.0.0.1:32178/library/registry:2.7.1": failed to resolve reference "127.0.0.1:32178/library/registry:2.7.1": failed to do request: Head "https://127.0.0.1:32178/v2/library/registry/manifests/2.7.1": http: server gave HTTP response to HTTPS client
14m         Warning   Failed              pod/zarf-docker-registry-789d8ddfb8-4pfgj    Error: ErrImagePull
34s         Normal    BackOff             pod/zarf-docker-registry-789d8ddfb8-4pfgj    Back-off pulling image "127.0.0.1:32178/library/registry:2.7.1"
13m         Warning   Failed              pod/zarf-docker-registry-789d8ddfb8-4pfgj    Error: ImagePullBackOff
15m         Normal    SuccessfulCreate    replicaset/zarf-docker-registry-789d8ddfb8   Created pod: zarf-docker-registry-789d8ddfb8-4pfgj
15m         Normal    ScalingReplicaSet   deployment/zarf-docker-registry              Scaled up replica set zarf-docker-registry-789d8ddfb8 to 1

Output of kubectl -n zarf get all:

NAME                                        READY   STATUS             RESTARTS   AGE
pod/injector                                1/1     Running            0          40m
pod/zarf-docker-registry-789d8ddfb8-4pfgj   0/1     ImagePullBackOff   0          40m

NAME                           TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE
service/zarf-docker-registry   NodePort   10.0.16.59     <none>        5000:31999/TCP   40m
service/zarf-injector          NodePort   10.0.144.122   <none>        5000:32178/TCP   40m

NAME                                   READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/zarf-docker-registry   0/1     1            0           40m

NAME                                              DESIRED   CURRENT   READY   AGE
replicaset.apps/zarf-docker-registry-789d8ddfb8   1         1         0       40m

Severity/Priority

😕 Blocked on deploying zarf packages to Azure AKS

erikschlegel avatar Jul 07 '22 01:07 erikschlegel

We haven't run into this issue on AKS before, but it looks like the node container runtime is trying to call localhost via HTTPS instead of HTTP, which is the standard for containerd and CRI-O. Is there any special config or other detail about this provisioning that might change the container runtime, by chance?

Would be helpful to run 'zarf destroy --confirm --remove-components' and then 'zarf init -l=trace'.

Sorry if markdown is weird, using the GitHub app right now.

jeff-mccoy avatar Jul 07 '22 02:07 jeff-mccoy
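
For copy-paste, the two commands suggested above as a block:

zarf destroy --confirm --remove-components
zarf init -l=trace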

Thanks for the response @jeff-mccoy. The strange thing is I provisioned a standard AKS cluster (v1.22.6) using the default settings from Azure, so nothing custom.

Here's the output from zarf init -l=trace

  DEBUG   Processing k8s manifest files /var/folders/bd/tlpzcs0s0dvgv7b89l96fn3r0000gn/T/zarf-1393582458/chart.yaml                                                                                                                
└ (/home/runner/work/zarf/zarf/src/internal/k8s/common.go:64)

  DEBUG   template.Apply({zarf-seed-registry  false true [] [{docker-registry  https://github.com/defenseunicorns/docker-registry.helm.git 2.1.0-zarf zarf [packages/zarf-registry/registry-values.yaml packages/zarf-registry/registry-values-seed.yaml] }] [] [] [] [] {false 0 false [] []} { }  map[] }, /var/folders/bd/tlpzcs0s0dvgv7b89l96fn3r0000gn/T/zarf-1393582458/chart.yaml)
└ (/home/runner/work/zarf/zarf/src/internal/template/template.go:73)

  DEBUG   map[GIT_AUTH_PULL:185e3dd5b35d633d615354ee GIT_AUTH_PUSH:f1d8c60b3ec4f99466124dab HTPASSWD:zarf-push:xxxx\nzarf-pull:$2a$10$NYDmZc5aDnU9EwSTD2SEz.7rWzO4xwAKgy0QPqD58nynmtrl5pTSu NODEPORT:31999 REGISTRY:127.0.0.1:31999 REGISTRY_AUTH_PULL:f278dfc43affbee63753e364d86c70f8aab81ee51f45c02b REGISTRY_AUTH_PUSH:6009132b6c664291254fbe76e305ed948defad0f7aafe9de REGISTRY_SECRET:xxx SEED_REGISTRY:127.0.0.1:31917 STORAGE_CLASS:]
└ (/home/runner/work/zarf/zarf/src/internal/template/template.go:112)
  DEBUG   [{/var/folders/bd/tlpzcs0s0dvgv7b89l96fn3r0000gn/T/zarf-1393582458/chart.yaml # Source: docker-registry/templates/secret.yaml                                                                                            
          apiVersion: v1
          kind: Secret
          metadata:
            name: zarf-docker-registry-secret
            namespace: zarf
            labels:
              app: docker-registry
              chart: docker-registry-2.1.0-zarf
              heritage: Helm
              release: zarf-docker-registry
          type: Opaque
          data:
            validateSecretValue: "xxxxx"
            configData: "xxxxx"
            htpasswd: xxxx/T/zarf-1393582458/chart.yaml # Source: docker-registry/templates/service.yaml
          apiVersion: v1
          kind: Service
          metadata:
            name: zarf-docker-registry
            namespace: zarf
            labels:
              app: docker-registry
              chart: docker-registry-2.1.0-zarf
              release: zarf-docker-registry
              heritage: Helm
          spec:
            type: NodePort
            ports:
              - port: 5000
                protocol: TCP
                name: http-5000
                targetPort: 5000
                nodePort: 31999
            selector:
              app: docker-registry
              release: zarf-docker-registry 0xc002547ad0} {/var/folders/bd/tlpzcs0s0dvgv7b89l96fn3r0000gn/T/zarf-1393582458/chart.yaml # Source: docker-registry/templates/deployment.yaml
          apiVersion: apps/v1
          kind: Deployment
          metadata:
            name: zarf-docker-registry
            namespace: zarf
            labels:
              app: docker-registry
              chart: docker-registry-2.1.0-zarf
              release: zarf-docker-registry
              heritage: Helm
          spec:
            selector:
              matchLabels:
                app: docker-registry
                release: zarf-docker-registry
            replicas: 1
            minReadySeconds: 5
            template:
              metadata:
                labels:
                  app: docker-registry
                  release: zarf-docker-registry
                annotations:
                  checksum/secret: xxxxx
              spec:
                imagePullSecrets:
                  - name: private-registry
                securityContext:
                  fsGroup: 1000
                  runAsUser: 1000
                containers:
                  - name: docker-registry
                    image: "127.0.0.1:31917/library/registry:2.7.1"
                    imagePullPolicy: IfNotPresent
                    command:
                    - /bin/registry
                    - serve
                    - /etc/docker/registry/config.yml
                    ports:
                      - containerPort: 5000
                    livenessProbe:
                      httpGet:
                        path: /
                        port: 5000
                    readinessProbe:
                      httpGet:
                        path: /
                        port: 5000
                    resources:
                      limits:
                        cpu: "3"
                        memory: 2Gi
                      requests:
                        cpu: 500m
                        memory: 256Mi
                    env:
                      - name: REGISTRY_AUTH
                        value: "htpasswd"
                      - name: REGISTRY_AUTH_HTPASSWD_REALM
                        value: "Registry Realm"
                      - name: REGISTRY_AUTH_HTPASSWD_PATH
                        value: "/etc/docker/registry/htpasswd"
                      - name: REGISTRY_STORAGE_FILESYSTEM_ROOTDIRECTORY
                        value: "/var/lib/registry"
                    volumeMounts:
                      - name: data
                        mountPath: /var/lib/registry/
                      - name: config
                        mountPath: "/etc/docker/registry"
                volumes:
                  - name: config
                    secret:
                      secretName: zarf-docker-registry-secret
                      items:
                      - key: configData
                        path: config.yml
                      - key: htpasswd
                        path: htpasswd
                  - name: data
                    emptyDir: {} 0xc002547f80}]
└ (/home/runner/work/zarf/zarf/src/internal/helm/post-render.go:73)
  DEBUG   k8s.getClientSet()                                                                                                                                                                                                       
└ (/home/runner/work/zarf/zarf/src/internal/k8s/common.go:158)
  DEBUG   k8s.getRestConfig()                                                                                                                                                                                                      
└ (/home/runner/work/zarf/zarf/src/internal/k8s/common.go:143)
  DEBUG   k8s.GenerateRegistryPullCreds(zarf, private-registry)                                                                                                                                                                    
└ (/home/runner/work/zarf/zarf/src/internal/k8s/secrets.go:55)

  DEBUG   k8s.GenerateSecret(zarf, private-registry)                                                                                                                                                                               
└ (/home/runner/work/zarf/zarf/src/internal/k8s/secrets.go:35)

  DEBUG   config.GetSecret(registry-pull)                                                                                                                                                                                          
└ (/home/runner/work/zarf/zarf/src/config/secret.go:38)

  DEBUG   k8s.getSecret(zarf, private-registry)                                                                                                                                                                                    
└ (/home/runner/work/zarf/zarf/src/internal/k8s/secrets.go:29)

  DEBUG   k8s.getClientSet()                                                                                                                                                                                                       
└ (/home/runner/work/zarf/zarf/src/internal/k8s/common.go:158)
  DEBUG   k8s.getRestConfig()                                                                                                                                                                                                      
└ (/home/runner/work/zarf/zarf/src/internal/k8s/common.go:143)
  DEBUG   k8s.ReplaceSecret(&Secret{ObjectMeta:{private-registry  zarf    0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[app.kubernetes.io/managed-by:zarf] map[] [] []  []},Data:map[string][]byte{.dockerconfigjson: [123 34 97 117 116 104 115 34 58 123 34 49 50 55 46 48 46 48 46 49 58 51 49 57 57 57 34 58 123 34 97 117 116 104 34 58 34 101 109 70 121 90 105 49 119 100 87 120 115 79 109 89 121 78 122 104 107 90 109 77 48 77 50 70 109 90 109 74 108 90 84 89 122 78 122 85 122 90 84 77 50 78 71 81 52 78 109 77 51 77 71 89 52 89 87 70 105 79 68 70 108 90 84 85 120 90 106 81 49 89 122 65 121 89 103 61 61 34 125 125 125],},Type:kubernetes.io/dockerconfigjson,StringData:map[string]string{},Immutable:nil,})
└ (/home/runner/work/zarf/zarf/src/internal/k8s/secrets.go:115)

  DEBUG   k8s.CreateNamespace(zarf)                                                                                                                                                                                                
└ (/home/runner/work/zarf/zarf/src/internal/k8s/namespace.go:31)

  DEBUG   k8s.getClientSet()                                                                                                                                                                                                       
└ (/home/runner/work/zarf/zarf/src/internal/k8s/common.go:158)
  DEBUG   k8s.getRestConfig()                                                                                                                                                                                                      
└ (/home/runner/work/zarf/zarf/src/internal/k8s/common.go:143)
  DEBUG   &Namespace{ObjectMeta:{zarf    2ef9c292-56d0-4c81-bdce-1b88116eccbb 169820 0 2022-07-07 08:06:58 -0500 CDT <nil> <nil> map[app.kubernetes.io/managed-by:zarf kubernetes.io/metadata.name:zarf] map[] [] []  [{zarf Update v1 2022-07-07 08:06:58 -0500 CDT FieldsV1 {"f:metadata":{"f:labels":{".":{},"f:app.kubernetes.io/managed-by":{},"f:kubernetes.io/metadata.name":{}}}} }]},Spec:NamespaceSpec{Finalizers:[kubernetes],},Status:NamespaceStatus{Phase:Active,Conditions:[]NamespaceCondition{},},}
└ (/home/runner/work/zarf/zarf/src/internal/k8s/namespace.go:57)
  DEBUG   k8s.DeleteSecret(&Secret{ObjectMeta:{private-registry  zarf    0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[app.kubernetes.io/managed-by:zarf] map[] [] []  []},Data:map[string][]byte{.dockerconfigjson: [123 34 97 117 116 104 115 34 58 123 34 49 50 55 46 48 46 48 46 49 58 51 49 57 57 57 34 58 123 34 97 117 116 104 34 58 34 101 109 70 121 90 105 49 119 100 87 120 115 79 109 89 121 78 122 104 107 90 109 77 48 77 50 70 109 90 109 74 108 90 84 89 122 78 122 85 122 90 84 77 50 78 71 81 52 78 109 77 51 77 71 89 52 89 87 70 105 79 68 70 108 90 84 85 120 90 106 81 49 89 122 65 121 89 103 61 61 34 125 125 125],},Type:kubernetes.io/dockerconfigjson,StringData:map[string]string{},Immutable:nil,})
└ (/home/runner/work/zarf/zarf/src/internal/k8s/secrets.go:129)

  DEBUG   k8s.getClientSet()                                                                                                                                                                                                       
└ (/home/runner/work/zarf/zarf/src/internal/k8s/common.go:158)
  DEBUG   k8s.getRestConfig()                                                                                                                                                                                                      
└ (/home/runner/work/zarf/zarf/src/internal/k8s/common.go:143)
  DEBUG   k8s.CreateSecret(&Secret{ObjectMeta:{private-registry  zarf    0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[app.kubernetes.io/managed-by:zarf] map[] [] []  []},Data:map[string][]byte{.dockerconfigjson: [123 34 97 117 116 104 115 34 58 123 34 49 50 55 46 48 46 48 46 49 58 51 49 57 57 57 34 58 123 34 97 117 116 104 34 58 34 101 109 70 121 90 105 49 119 100 87 120 115 79 109 89 121 78 122 104 107 90 109 77 48 77 50 70 109 90 109 74 108 90 84 89 122 78 122 85 122 90 84 77 50 78 71 81 52 78 109 77 51 77 71 89 52 89 87 70 105 79 68 70 108 90 84 85 120 90 106 81 49 89 122 65 121 89 103 61 61 34 125 125 125],},Type:kubernetes.io/dockerconfigjson,StringData:map[string]string{},Immutable:nil,})
└ (/home/runner/work/zarf/zarf/src/internal/k8s/secrets.go:143)

  DEBUG   k8s.getClientSet()                                                                                                                                                                                                       
└ (/home/runner/work/zarf/zarf/src/internal/k8s/common.go:158)
  DEBUG   k8s.getRestConfig()                                                                                                                                                                                                      
└ (/home/runner/work/zarf/zarf/src/internal/k8s/common.go:143)
  DEBUG   k8s.GenerateSecret(zarf, private-git-server)                                                                                                                                                                             
└ (/home/runner/work/zarf/zarf/src/internal/k8s/secrets.go:35)

  DEBUG   config.GetSecret(git-pull)                                                                                                                                                                                               
└ (/home/runner/work/zarf/zarf/src/config/secret.go:38)

  DEBUG   k8s.ReplaceSecret(&Secret{ObjectMeta:{private-git-server  zarf    0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[app.kubernetes.io/managed-by:zarf] map[] [] []  []},Data:map[string][]byte{},Type:Opaque,StringData:map[string]string{password: xxxxx,username: zarf-git-read-user,},Immutable:nil,})
└ (/home/runner/work/zarf/zarf/src/internal/k8s/secrets.go:115)

  DEBUG   k8s.CreateNamespace(zarf)                                                                                                                                                                                                
└ (/home/runner/work/zarf/zarf/src/internal/k8s/namespace.go:31)

  DEBUG   k8s.getClientSet()                                                                                                                                                                                                       
└ (/home/runner/work/zarf/zarf/src/internal/k8s/common.go:158)
  DEBUG   k8s.getRestConfig()                                                                                                                                                                                                      
└ (/home/runner/work/zarf/zarf/src/internal/k8s/common.go:143)
  DEBUG   &Namespace{ObjectMeta:{zarf    2ef9c292-56d0-4c81-bdce-1b88116eccbb 169820 0 2022-07-07 08:06:58 -0500 CDT <nil> <nil> map[app.kubernetes.io/managed-by:zarf kubernetes.io/metadata.name:zarf] map[] [] []  [{zarf Update v1 2022-07-07 08:06:58 -0500 CDT FieldsV1 {"f:metadata":{"f:labels":{".":{},"f:app.kubernetes.io/managed-by":{},"f:kubernetes.io/metadata.name":{}}}} }]},Spec:NamespaceSpec{Finalizers:[kubernetes],},Status:NamespaceStatus{Phase:Active,Conditions:[]NamespaceCondition{},},}
└ (/home/runner/work/zarf/zarf/src/internal/k8s/namespace.go:57)
  DEBUG   k8s.DeleteSecret(&Secret{ObjectMeta:{private-git-server  zarf    0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[app.kubernetes.io/managed-by:zarf] map[] [] []  []},Data:map[string][]byte{},Type:Opaque,StringData:map[string]string{password: xxxxx,username: zarf-git-read-user,},Immutable:nil,})
└ (/home/runner/work/zarf/zarf/src/internal/k8s/secrets.go:129)

  DEBUG   k8s.getClientSet()                                                                                                                                                                                                       
└ (/home/runner/work/zarf/zarf/src/internal/k8s/common.go:158)
  DEBUG   k8s.getRestConfig()                                                                                                                                                                                                      
└ (/home/runner/work/zarf/zarf/src/internal/k8s/common.go:143)
  DEBUG   k8s.CreateSecret(&Secret{ObjectMeta:{private-git-server  zarf    0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[app.kubernetes.io/managed-by:zarf] map[] [] []  []},Data:map[string][]byte{},Type:Opaque,StringData:map[string]string{password: xxxx,username: zarf-git-read-user,},Immutable:nil,})
└ (/home/runner/work/zarf/zarf/src/internal/k8s/secrets.go:143)

  DEBUG   k8s.getClientSet()                                                                                                                                                                                                       
└ (/home/runner/work/zarf/zarf/src/internal/k8s/common.go:158)
  DEBUG   k8s.getRestConfig()                                                                                                                                                                                                      
└ (/home/runner/work/zarf/zarf/src/internal/k8s/common.go:143)
  ⠋  Deployment is not ready: zarf/zarf-docker-registry. 0 out of 1 expected pods are ready (2m24s)    

erikschlegel avatar Jul 07 '22 13:07 erikschlegel

Thanks @erikschlegel this definitely looks like a CRI change on AKS we'll need to play with a bit, I'll spin up AKS again this weekend to see if we can reproduce it. Are you provisioning AKS with IaC or via the Azure web interface?

jeff-mccoy avatar Jul 08 '22 23:07 jeff-mccoy

I'm provisioning the cluster directly through the Azure Portal. I confirmed that I was able to successfully initialize zarf using Kubernetes version 1.21. I suspect this is a containerd issue, as it's configured slightly differently on AKS 1.22+. This PR may be worth checking out: https://github.com/Azure/AgentBaker/pull/1369

erikschlegel avatar Jul 09 '22 01:07 erikschlegel
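
As a side note for anyone comparing containerd configurations across AKS versions: one way to inspect a node's runtime config without SSH is kubectl's node debug mode. A sketch (the node name is illustrative, and this assumes kubectl v1.20+ with the debug subcommand):

# Start a debug pod with the node filesystem mounted at /host
kubectl debug node/aks-agentpool-40722291-vmss000001 -it --image=busybox

# Inside the debug shell, inspect the host's containerd config and version
chroot /host
grep -n -A3 'registry' /etc/containerd/config.toml
containerd --version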

@jeff-mccoy Any update on this?

JasonvanBrackel avatar Sep 15 '22 15:09 JasonvanBrackel

Hi @jeff-mccoy - AKS Kubernetes version 1.21 is no longer available for deployment in the portal, and now none of the AKS-supported versions appear to work with zarf. Do you happen to have an update?

erikschlegel avatar Oct 06 '22 17:10 erikschlegel

Hello, I am experiencing the same issue. Here is my output from kubectl describe pods:

Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  48s                default-scheduler  Successfully assigned zarf/zarf-docker-registry-796648965f-2fjw5 to aks-demo-97972695-vmss000000
  Normal   BackOff    19s (x2 over 48s)  kubelet            Back-off pulling image "127.0.0.1:32380/library/registry:2.7.1"
  Warning  Failed     19s (x2 over 48s)  kubelet            Error: ImagePullBackOff
  Normal   Pulling    8s (x3 over 48s)   kubelet            Pulling image "127.0.0.1:32380/library/registry:2.7.1"
  Warning  Failed     8s (x3 over 48s)   kubelet            Failed to pull image "127.0.0.1:32380/library/registry:2.7.1": rpc error: code = Unknown desc = failed to pull and unpack image "127.0.0.1:32380/library/registry:2.7.1": failed to resolve reference "127.0.0.1:32380/library/registry:2.7.1": failed to do request: Head "https://127.0.0.1:32380/v2/library/registry/manifests/2.7.1": http: server gave HTTP response to HTTPS client
  Warning  Failed     8s (x3 over 48s)   kubelet            Error: ErrImagePull

version 21.2 in an Azure US Government AKS cluster

Uninstall4735 avatar Oct 31 '22 20:10 Uninstall4735

I can confirm this is still an issue.

Hello, I am experiencing the same issue. Here is my output from kubectl describe pods:

Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  48s                default-scheduler  Successfully assigned zarf/zarf-docker-registry-796648965f-2fjw5 to aks-demo-97972695-vmss000000
  Normal   BackOff    19s (x2 over 48s)  kubelet            Back-off pulling image "127.0.0.1:32380/library/registry:2.7.1"
  Warning  Failed     19s (x2 over 48s)  kubelet            Error: ImagePullBackOff
  Normal   Pulling    8s (x3 over 48s)   kubelet            Pulling image "127.0.0.1:32380/library/registry:2.7.1"
  Warning  Failed     8s (x3 over 48s)   kubelet            Failed to pull image "127.0.0.1:32380/library/registry:2.7.1": rpc error: code = Unknown desc = failed to pull and unpack image "127.0.0.1:32380/library/registry:2.7.1": failed to resolve reference "127.0.0.1:32380/library/registry:2.7.1": failed to do request: Head "https://127.0.0.1:32380/v2/library/registry/manifests/2.7.1": http: server gave HTTP response to HTTPS client
  Warning  Failed     8s (x3 over 48s)   kubelet            Error: ErrImagePull

version 21.2 in an Azure US Government AKS cluster

dsmithbauer avatar Nov 02 '22 19:11 dsmithbauer

Yeah, they must be doing something special; containerd upstream still serves localhost over HTTP and even tests for it. Digging into this more this week: https://github.com/containerd/containerd/blob/main/pkg/cri/server/image_pull_test.go

jeff-mccoy avatar Nov 02 '22 21:11 jeff-mccoy

While we work on this, as a note on a potential workaround: you can also use an external registry as described here: https://docs.zarf.dev/docs/user-guide/the-zarf-cli/cli-commands/zarf_init

Under the Integrations tab in the Azure console you can tie an Azure Container Registry to your cluster and then init it with something like this: zarf init --registry-push-password={PASSWORD} --registry-push-username={USERNAME} --registry-url={REGISTRY}.azurecr.io

(you can also specify a separate --registry-pull-password and --registry-pull-username and can load the username(s)/password(s) from a zarf-config.toml as described here: https://docs.zarf.dev/docs/user-guide/the-zarf-cli/cli-commands/zarf_prepare_generate-config#synopsis)

(also note you will need to be on v0.22.1 or higher)

Racer159 avatar Nov 02 '22 22:11 Racer159
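
Putting that workaround together end to end, a sketch using the Azure CLI; the resource group and registry names here are illustrative, not from the thread:

# Attach an ACR to the cluster so nodes can pull from it (equivalent to the Integrations tab)
az aks update --resource-group zarf-test --name zarf-aks --attach-acr myzarfregistry

# Enable the admin user and fetch credentials for the push account
az acr update --name myzarfregistry --admin-enabled true
az acr credential show --name myzarfregistry

# Initialize Zarf against the external registry (requires zarf v0.22.1+)
zarf init \
  --registry-push-username=myzarfregistry \
  --registry-push-password={PASSWORD} \
  --registry-url=myzarfregistry.azurecr.io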

Put some notes in a new issue after digging around a bit. Will try to test an older version of AKS later on tonight.

jeff-mccoy avatar Nov 03 '22 00:11 jeff-mccoy

Also tested on AKS 1.22.11 and seeing the same results.

jeff-mccoy avatar Nov 03 '22 02:11 jeff-mccoy

Added some new notes at https://github.com/Azure/AKS/issues/3303#issuecomment-1306328925.

Looks like a bug in containerd that was patched 2 weeks ago. In the interim, Acorn ran into this issue too and did what we've been trying to avoid (modifying the containerd config). The root cause is that the change @erikschlegel identified, allowing containerd registry config overrides in AKS, actually exposed the underlying containerd issue.

jeff-mccoy avatar Nov 07 '22 22:11 jeff-mccoy

@cheruvu1 if you have any other data you'd like to drop on this issue, please leave it here. Thanks!

jeff-mccoy avatar Nov 07 '22 22:11 jeff-mccoy

Hi folks, as a workaround (one that includes the patch by iceberg) you can update containerd in your cluster: just apt update and upgrade the node. I do it through a DaemonSet (example):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: update-cluster
  labels:
    app: update-cluster
spec:
  selector:
    matchLabels:
      app: update-cluster
  template:
    metadata:
      labels:
        app: update-cluster
    spec:
      containers:
      - name: update-cluster
        image: alpine
        imagePullPolicy: IfNotPresent
        command:
          - nsenter
          - --target
          - "1"
          - --mount
          - --uts
          - --ipc
          - --net
          - --pid
          - --
          - sh
          - -c
          - |
            # apt update and upgrade 
            export DEBIAN_FRONTEND=noninteractive
            apt update && apt upgrade -y
            sleep infinity
        securityContext:
          privileged: true
      dnsPolicy: ClusterFirst
      hostPID: true


I was able to initialize zarf.

I did also deploy Big Bang into AKS, but hit a bump:

  1. gatekeeper has the label control-plane: controller-manager on its namespace, preventing the hook from changing the image. After removing the label, everything went smoothly. (In RKE2 I didn't hit this problem.)

jsburckhardt avatar Nov 30 '22 05:11 jsburckhardt
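
To try the DaemonSet workaround above and confirm the runtime actually upgraded, something like the following should work, assuming the manifest is saved as update-cluster.yaml:

kubectl apply -f update-cluster.yaml

# Watch the upgrade logs, then check the per-node CONTAINER-RUNTIME column
kubectl logs -l app=update-cluster --tail=20
kubectl get nodes -o wide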

Incurred this problem while attempting to initialize zarf on a Nutanix Kubernetes cluster running Kubernetes v1.22.9 and containerd 1.6.6.

brandtkeller avatar Dec 13 '22 17:12 brandtkeller

Encountered the same on EKS v1.24 using the v0.24.0-rc3 binary and init package image.

ntwkninja avatar Jan 31 '23 18:01 ntwkninja

EKS v1.23 works without issue because it is still using Docker vs. containerd.

ntwkninja avatar Jan 31 '23 23:01 ntwkninja

Tracking EKS AMI containerd update: https://github.com/awslabs/amazon-eks-ami/issues/1162

jeff-mccoy avatar Feb 01 '23 03:02 jeff-mccoy

Quick update: containerd was updated and I could deploy Big Bang 1.48.0.

jsburckhardt avatar Feb 24 '23 03:02 jsburckhardt

Tracking EKS AMI containerd update: awslabs/amazon-eks-ami#1162

The upstream issue has been closed and @brianrexrode has successfully tested zarf v0.25.2 with EKS v1.26.

ntwkninja avatar Apr 14 '23 23:04 ntwkninja

I've run into this with containerd > 1.6.25. It seems the default behavior changed here: https://github.com/containerd/containerd/pull/9300

I've been commenting out the following lines in the containerd config.toml to work around it:

# [plugins."io.containerd.grpc.v1.cri".registry]
# config_path = "/etc/containerd/certs.d:/etc/docker/certs.d"

AbrohamLincoln avatar Feb 07 '24 18:02 AbrohamLincoln
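
For anyone scripting that edit, a minimal sketch that comments out the stanza in place and restarts the runtime; this assumes the stock /etc/containerd/config.toml path and shell access to the node, and the sed patterns are illustrative:

# Comment out the CRI registry config_path stanza, then restart containerd
sudo sed -i \
  -e 's|^\(\s*\[plugins."io.containerd.grpc.v1.cri".registry\]\)|# \1|' \
  -e 's|^\(\s*config_path = .*certs.d.*\)|# \1|' \
  /etc/containerd/config.toml
sudo systemctl restart containerd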

@AbrohamLincoln thanks for the note! We're looking at exploring other options too, and for others reading: this will also affect newer versions of containerd 1.7 (>=1.7.7), and 2.0 if anyone is on the betas.

https://github.com/containerd/containerd/pull/9299

Racer159 avatar Feb 07 '24 22:02 Racer159

Having this issue also with AKS Kubernetes v1.28.3. Any updates regarding this?

Jdavid77 avatar Mar 01 '24 16:03 Jdavid77

Seeing similar behavior on EKS 1.26 - 1.29, all of which are using containerd 1.7.11. Zarf init was previously (late January) working on EKS 1.26 with containerd 1.7.2. Confirmed that going back to EKS 1.23 with Docker runtime 20.10.25 does work, but that version is out of support.

philiversen avatar Mar 04 '24 21:03 philiversen

Hi guys. Any plans here? Still having a problem with k8s 1.27+ versions with containerd 1.7.11.

Any recommendations from the community on how we can "tune" a containerd config to avoid this issue?

jnt2007 avatar Mar 15 '24 18:03 jnt2007

Hi guys. Any plans here? Still having a problem with k8s 1.27+ versions with containerd 1.7.11.

Any recommendations from the community on how we can "tune" a containerd config to avoid this issue?

Commenting out the containerd config lines mentioned in this post got things working again for me.

philiversen avatar Mar 18 '24 13:03 philiversen

Also ran into this issue on newer RKE2 versions. It seems linked back to this commit, which introduced the config_path line. The containerd update to 1.7.7+ did not actually cause the issue immediately (RKE2 1.29.0 works and is on containerd 1.7.11); it was specifically the introduction of that containerd config line 👀.

Just for reference, affected versions of k3s/RKE2 appear to be 1.29.1+, 1.28.6+, and 1.27.10+. Definitely curious whether there is anything to address this on the zarf side, or whether this should make its way into the docs as a recommended pre-req/setup step for the cluster.

mjnagel avatar Mar 20 '24 22:03 mjnagel
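
On RKE2 specifically, the generated containerd config can be overridden via a template file rather than edited in place; a rough sketch, assuming RKE2's documented config.toml.tmpl mechanism and default install paths (use rke2-server instead of rke2-agent on server nodes):

# Copy the generated config to the template location, drop the config_path line,
# and restart the agent so containerd is regenerated from the template
sudo cp /var/lib/rancher/rke2/agent/etc/containerd/config.toml \
        /var/lib/rancher/rke2/agent/etc/containerd/config.toml.tmpl
sudo sed -i '/config_path/d' /var/lib/rancher/rke2/agent/etc/containerd/config.toml.tmpl
sudo systemctl restart rke2-agent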

All things coming around, we may need to look for a way to avoid the localhost/HTTP behavior, since containerd has introduced bugs for it multiple times in the past year or so: https://github.com/containerd/containerd/pull/9188

jeff-mccoy avatar Mar 21 '24 21:03 jeff-mccoy

A new issue has been opened against containerd to address this: https://github.com/containerd/containerd/issues/10014

lucasrod16 avatar Mar 28 '24 17:03 lucasrod16