k8s-image-swapper
Issue with tags removed from ECR
What we did: We added an ECR lifecycle policy so that very old images would not stay cached indefinitely. The policy keeps only the 3 most recent tags, which removed a lot of tags, including old ones and ones still in use. After the cleanup, k8s-image-swapper still treated the removed images as existing in ECR and mutated pods to start with the ECR-cached image, which ended in ImagePullBackOff.
What is the issue: There seems to be a cache that still reports the image as present even though it no longer exists. The logs show "found in cache" (pkg/registry/ecr.go), while running skopeo directly against ECR reports the tag as missing. After deleting/recreating the image-swapper pod, the situation goes back to normal.
Steps to reproduce:
- Start a deployment with nginx:1.14.2
- Wait until k8s-image-swapper has cached the image
- Restart the nginx deployment; it starts with the cached image
- Remove the image tag from ECR
- Restart the nginx deployment again; it goes into ImagePullBackOff
Logs:
2023-12-14T12:07:18+01:00 11:07AM DBG github.com/estahn/[email protected]/pkg/webhook/image_swapper.go:285 > jmespath search results filter="obj.metadata.namespace == 'kube-system'" results=false
2023-12-14T12:07:18+01:00 11:07AM TRC github.com/estahn/[email protected]/pkg/registry/ecr.go:239 > found in cache kind="/v1, Kind=Pod" name= namespace=tcn-personal-1 ref=000000000000.dkr.ecr.eu-west-1.amazonaws.com/docker.io/library/nginx:1.14.2 uid=fe091146-17ba-42af-863f-f937b757365d
2023-12-14T12:07:18+01:00 11:07AM DBG github.com/estahn/[email protected]/pkg/webhook/image_swapper.go:251 > set new container image image=000000000000.dkr.ecr.eu-west-1.amazonaws.com/docker.io/library/nginx:1.14.2 kind="/v1, Kind=Pod" name= namespace=tcn-personal-1 uid=fe091146-17ba-42af-863f-f937b757365d
2023-12-14T12:07:18+01:00 11:07AM TRC github.com/estahn/[email protected]/pkg/registry/ecr.go:239 > found in cache kind="/v1, Kind=Pod" name= namespace=tcn-personal-1 ref=000000000000.dkr.ecr.eu-west-1.amazonaws.com/docker.io/library/nginx:1.14.2 source-image=docker.io/library/nginx:1.14.2 target-image=000000000000.dkr.ecr.eu-west-1.amazonaws.com/docker.io/library/nginx:1.14.2 uid=fe091146-17ba-42af-863f-f937b757365d
2023-12-14T12:07:18+01:00 11:07AM TRC github.com/estahn/[email protected]/pkg/webhook/image_copier.go:71 > image copy aborted: image already present in target registry kind="/v1, Kind=Pod" name= namespace=tcn-personal-1 source-image=docker.io/library/nginx:1.14.2 target-image=000000000000.dkr.ecr.eu-west-1.amazonaws.com/docker.io/library/nginx:1.14.2 uid=fe091146-17ba-42af-863f-f937b757365d
Additional info: proof that the image tag is missing from ECR:
aws ecr list-images --repository-name docker.io/library/nginx --filter '{ "tagStatus": "TAGGED" }'
{
    "imageIds": [
        {
            "imageDigest": "sha256:644a70516a26004c97d0d85c7fe1d0c3a67ea8ab7ddf4aff193d9f301670cf36",
            "imageTag": "1.21.3"
        },
        {
            "imageDigest": "sha256:08bc36ad52474e528cc1ea3426b5e3f4bad8a130318e3140d6cfe29c8892c7ef",
            "imageTag": "latest"
        }
    ]
}
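The same check can be scripted against the saved aws ecr list-images output. A minimal Go sketch, assuming the JSON shown above; listImagesOutput and hasTag are hypothetical helpers that read only the imageIds/imageTag fields present in that output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// listImagesOutput mirrors only the fields of `aws ecr list-images` output
// used here (hypothetical helper type for this sketch).
type listImagesOutput struct {
	ImageIDs []struct {
		ImageDigest string `json:"imageDigest"`
		ImageTag    string `json:"imageTag"`
	} `json:"imageIds"`
}

// hasTag reports whether tag appears in the list-images JSON.
func hasTag(raw []byte, tag string) (bool, error) {
	var out listImagesOutput
	if err := json.Unmarshal(raw, &out); err != nil {
		return false, err
	}
	for _, id := range out.ImageIDs {
		if id.ImageTag == tag {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// Output copied from the issue.
	raw := []byte(`{
    "imageIds": [
        {"imageDigest": "sha256:644a70516a26004c97d0d85c7fe1d0c3a67ea8ab7ddf4aff193d9f301670cf36", "imageTag": "1.21.3"},
        {"imageDigest": "sha256:08bc36ad52474e528cc1ea3426b5e3f4bad8a130318e3140d6cfe29c8892c7ef", "imageTag": "latest"}
    ]
}`)
	for _, tag := range []string{"1.14.2", "latest"} {
		ok, err := hasTag(raw, tag)
		if err != nil {
			fmt.Println("parse error:", err)
			return
		}
		fmt.Printf("%s present: %v\n", tag, ok)
	}
}
```

This prints "1.14.2 present: false" and "latest present: true", matching the conclusion that the tag the webhook still rewrites to is gone.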
Running skopeo inspect from the image-swapper pod also shows that the tag is missing:
skopeo inspect --retry-times 3 docker://000000000000.dkr.ecr.eu-west-1.amazonaws.com/docker.io/library/nginx:1.14.2 --creds $TOKEN
FATA[0000] Error parsing image name "docker://000000000000.dkr.ecr.eu-west-1.amazonaws.com/docker.io/library/nginx:1.14.2": reading manifest 1.14.2 in 000000000000.dkr.ecr.eu-west-1.amazonaws.com/docker.io/library/nginx: manifest unknown: Requested image not found
Workaround: restart the image-swapper Deployment.