k8s-image-swapper

Issue with tags removed from ECR

Open tcyran opened this issue 1 year ago • 0 comments

What we did: We added an ECR lifecycle policy so that very old cached images would not be kept. We configured it to keep the 3 latest tags, which removed a lot of tags: old ones, but also ones still in use. After the cleanup, k8s-image-swapper still treated the image as present in ECR and mutated pods to start with the ECR-cached image, which ended in ImagePullBackOff.

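What is the issue: There seems to be a cache that still reports the image as present after it has been removed from ECR. The "found in cache" log lines below come from pkg/registry/ecr.go, and a direct skopeo inspect from the same pod fails, so the stale entry appears to be in k8s-image-swapper's own cache rather than in skopeo. Deleting and recreating the image-swapper pod brings things back to normal.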
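To make the suspected failure mode concrete, here is a minimal Go sketch of an existence cache that stores positive results and never revalidates them. It is a hypothetical model, not k8s-image-swapper's actual code: `presenceCache`, `registry` and `fakeRegistry` are illustrative names. Once a reference has been seen as present, later lookups are answered from memory, so a tag deleted by the lifecycle policy keeps being reported as present until the process restarts.

```go
package main

// registry is a hypothetical stand-in for the target registry (ECR).
type registry interface {
	ImageExists(ref string) bool
}

// fakeRegistry models ECR as the set of image references that still exist.
type fakeRegistry struct {
	images map[string]bool
}

func (r *fakeRegistry) ImageExists(ref string) bool { return r.images[ref] }

// presenceCache remembers every reference it has seen as present and never
// asks the registry again for it. This models the suspected behaviour; it is
// not a copy of the project's implementation.
type presenceCache struct {
	reg  registry
	seen map[string]bool
}

func newPresenceCache(reg registry) *presenceCache {
	return &presenceCache{reg: reg, seen: map[string]bool{}}
}

// Exists answers from memory on a hit and only queries the registry on a miss.
func (c *presenceCache) Exists(ref string) bool {
	if c.seen[ref] {
		return true // stale positive once the tag has been deleted upstream
	}
	if c.reg.ImageExists(ref) {
		c.seen[ref] = true
		return true
	}
	return false
}
```

Walking the reproduction steps through this model: the first lookup caches the tag, deleting it from `fakeRegistry.images` mimics the lifecycle policy, and the next `Exists` call still returns true, which is what would lead the webhook to rewrite the pod to an image that can no longer be pulled.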

Steps to reproduce:

  1. Start a deployment with nginx:1.14.2
  2. Wait until k8s-image-swapper has cached the image in ECR
  3. Restart the nginx deployment; it starts with the cached image
  4. Remove the image tag from ECR
  5. Restart the nginx deployment again; it goes into ImagePullBackOff

Logs:

2023-12-14T12:07:18+01:00 11:07AM DBG github.com/estahn/k8s-image-swapper@…/pkg/webhook/image_swapper.go:285 > jmespath search results filter="obj.metadata.namespace == 'kube-system'" results=false
2023-12-14T12:07:18+01:00 11:07AM TRC github.com/estahn/k8s-image-swapper@…/pkg/registry/ecr.go:239 > found in cache kind="/v1, Kind=Pod" name= namespace=tcn-personal-1 ref=000000000000.dkr.ecr.eu-west-1.amazonaws.com/docker.io/library/nginx:1.14.2 uid=fe091146-17ba-42af-863f-f937b757365d
2023-12-14T12:07:18+01:00 11:07AM DBG github.com/estahn/k8s-image-swapper@…/pkg/webhook/image_swapper.go:251 > set new container image image=000000000000.dkr.ecr.eu-west-1.amazonaws.com/docker.io/library/nginx:1.14.2 kind="/v1, Kind=Pod" name= namespace=tcn-personal-1 uid=fe091146-17ba-42af-863f-f937b757365d
2023-12-14T12:07:18+01:00 11:07AM TRC github.com/estahn/k8s-image-swapper@…/pkg/registry/ecr.go:239 > found in cache kind="/v1, Kind=Pod" name= namespace=tcn-personal-1 ref=000000000000.dkr.ecr.eu-west-1.amazonaws.com/docker.io/library/nginx:1.14.2 source-image=docker.io/library/nginx:1.14.2 target-image=000000000000.dkr.ecr.eu-west-1.amazonaws.com/docker.io/library/nginx:1.14.2 uid=fe091146-17ba-42af-863f-f937b757365d
2023-12-14T12:07:18+01:00 11:07AM TRC github.com/estahn/k8s-image-swapper@…/pkg/webhook/image_copier.go:71 > image copy aborted: image already present in target registry kind="/v1, Kind=Pod" name= namespace=tcn-personal-1 source-image=docker.io/library/nginx:1.14.2 target-image=000000000000.dkr.ecr.eu-west-1.amazonaws.com/docker.io/library/nginx:1.14.2 uid=fe091146-17ba-42af-863f-f937b757365d

Additional info: proof that the image tag is missing:

aws ecr list-images --repository-name docker.io/library/nginx --filter '{ "tagStatus": "TAGGED" }'
{
    "imageIds": [
        {
            "imageDigest": "sha256:644a70516a26004c97d0d85c7fe1d0c3a67ea8ab7ddf4aff193d9f301670cf36",
            "imageTag": "1.21.3"
        },
        {
            "imageDigest": "sha256:08bc36ad52474e528cc1ea3426b5e3f4bad8a130318e3140d6cfe29c8892c7ef",
            "imageTag": "latest"
        }
    ]
}
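The same check can be scripted. Below is a minimal Go sketch, assuming the JSON shape shown above (an `imageIds` array of objects with `imageDigest` and `imageTag`); `hasTag` is a hypothetical helper, not part of the AWS CLI or k8s-image-swapper.

```go
package main

import "encoding/json"

// listImagesOutput mirrors the fields of the list-images output used here.
type listImagesOutput struct {
	ImageIDs []struct {
		ImageDigest string `json:"imageDigest"`
		ImageTag    string `json:"imageTag"`
	} `json:"imageIds"`
}

// hasTag reports whether tag appears in the list-images JSON output.
func hasTag(output []byte, tag string) (bool, error) {
	var out listImagesOutput
	if err := json.Unmarshal(output, &out); err != nil {
		return false, err
	}
	for _, id := range out.ImageIDs {
		if id.ImageTag == tag {
			return true, nil
		}
	}
	return false, nil
}
```

Fed the output above, `hasTag(output, "1.14.2")` returns false, while "1.21.3" and "latest" return true.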

Also, running skopeo inspect from the image-swapper pod confirms that the tag is gone:

skopeo inspect --retry-times 3 docker://000000000000.dkr.ecr.eu-west-1.amazonaws.com/docker.io/library/nginx:1.14.2 --creds $TOKEN
FATA[0000] Error parsing image name "docker://000000000000.dkr.ecr.eu-west-1.amazonaws.com/docker.io/library/nginx:1.14.2": reading manifest 1.14.2 in 000000000000.dkr.ecr.eu-west-1.amazonaws.com/docker.io/library/nginx: manifest unknown: Requested image not found 

Workaround: restart the image-swapper Deployment.

tcyran, Dec 14 '23 11:12