container-retention-policy
Manifest Unknown After Cleanup on Skipped Tag, Amd64 Arch only
My container retention job works as expected, except that every 7th day, when it cleans up my "app" container, I am unable to pull the amd64 image. The arm64 image pulls fine, though. It seems like this cleanup job is deleting a tag which my protected tag depends on? Bizarre behavior, as the tags that are being deleted are totally unrelated, and deleting one tag shouldn't affect another. Any thoughts or insight here?
Thanks!
I get this error in my kube cluster every 7th day:
Failed to pull image "ghcr.io/../app:development": rpc error: code = Unknown desc = manifest unknown
I can't replicate the error from my M1 machine (arm arch) -- the pull is successful. From an amd64 machine, I am able to replicate the "manifest unknown" error.
docker pull ghcr.io/../app:development
development: Pulling from ../app
manifest unknown
My retention policy is set to every 7 days, and the "development" tag should be skipped. The tag that was cleaned up in the logs was a truncated hash.
name: Delete old unused GHCR container images

on:
  schedule:
    - cron: '0 0 * * *' # every day at midnight
  workflow_dispatch:

jobs:
  clean-ghcr:
    name: Delete old unused GHCR container images
    runs-on: ubuntu-latest
    steps:
      - name: Delete containers older than a week, ignore tags
        uses: snok/container-retention-policy@v1
        with:
          image-names: app
          cut-off: A week ago UTC
          account-type: org
          org-name: my-org
          keep-at-least: 3
          untagged-only: false
          skip-tags: latest, v*, dev*, gamma, beta, 1*, 2*, 3*, 4*, 5*, 6*
          token: ${{ secrets.TOKEN }}
That's less than ideal. The logs don't indicate that the development image itself is deleted, right? Can't say that I've encountered anything like this myself, unfortunately.
@sondrelg thanks for your response.
No, the logs do not show that anything but the image with a sha tag has been deleted. Any ideas for debugging this?
The action really just makes a few calls to the GitHub API, so if you can, I think the best thing would be to authenticate locally and replicate the calls manually (a rough sketch follows after the links below).
See:
- Fetching packages
- Listing package versions
- And the main deletion logic function
Finally, here are the GitHub API docs: https://docs.github.com/en/rest/packages#get-a-package-version-for-an-organization
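For reference, here is a minimal Python sketch (not taken from the action itself) of what replicating the list call locally might look like, assuming a PAT with read:packages in a GHCR_TOKEN environment variable and placeholder org/package names:

import os
import requests

ORG = "my-org"       # placeholder, matches the workflow above
PACKAGE = "app"      # placeholder
TOKEN = os.environ["GHCR_TOKEN"]  # assumed PAT with read:packages scope

headers = {
    "Authorization": f"Bearer {TOKEN}",
    "Accept": "application/vnd.github+json",
}

# List the versions of a container package owned by an organization
url = f"https://api.github.com/orgs/{ORG}/packages/container/{PACKAGE}/versions"
response = requests.get(url, headers=headers, params={"per_page": 100})
response.raise_for_status()

for version in response.json():
    tags = version["metadata"]["container"]["tags"]
    # version["name"] holds the sha256 digest of the package version
    print(version["id"], version["name"], tags)

Comparing this output against what the action logs as deleted should show whether any of the deleted digests belong to an image you expected to be protected.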
If you find any issues, a PR would be more than welcome!
@sondrelg Thanks for the info. What prevents the retention policy from deleting images that are depended on by multi-platform tagged versions?
If I understand correctly, the GitHub API will return a list of versions of a particular package, and these versions will include untagged images that are potentially named in the manifest list for a multi-platform image. The retention policy may skip over a named tag, but it may include (for deletion) an image that's named in that tag's manifest list. For example:
dev:sha:abc123 {       <-- manifest list; dev tag and sha:abc123 image skipped for deletion
  archA: sha:foo,      <-- eligible for deletion?
  archB: sha:bar
}
If the example above represented a multi-platform manifest list, it would be preserved because it's tagged with "dev", but what about the sha:foo and sha:bar images?
I've never really used multi-platform images, so it's very possible we need to add special handling for this case. If I understand you correctly, it sounds like taking manifest lists into consideration should be the default behavior. Currently, no such behavior exists.
Do you have a real data example of what this looks like?
@sondrelg Start with this Dockerfile
FROM alpine
RUN mkdir foobar
Execute a multi-platform build using Docker's Buildx builder:
docker buildx build --push --platform linux/arm64,linux/amd64 -t <YOUR_REPO_URL>/multi-arch-build .
Inspect the manifest
docker manifest inspect <YOUR_REPO_URL>/multi-arch-build
This will produce a result that looks like this
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
  "manifests": [
    {
      "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
      "size": 735,
      "digest": "sha256:6619a5ea49cd7174ded29cf5f1c98c559be59edd862349fc3c6238eb6274d3f0",
      "platform": {
        "architecture": "arm64",
        "os": "linux"
      }
    },
    {
      "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
      "size": 735,
      "digest": "sha256:24c08606be10f8db18e7f463e80fd2dc55a411f10d7a0d0beceab4591e3a6441",
      "platform": {
        "architecture": "amd64",
        "os": "linux"
      }
    }
  ]
}
Notice the manifests array includes 2 objects, one per arch. Each arch has its own container image, referenced by the digest.
When we run this cleanup job, we clean up those "child" images/digests because they are untagged. AFAIK, there is a simple solution to this. See this post (consider upvoting, please): https://github.com/docker/buildx/discussions/1301
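To illustrate, here is a rough Python sketch of how those child digests could be collected from docker manifest inspect so the cleanup can be told to keep them. The image reference is a placeholder, and it assumes the docker CLI is installed and authenticated against ghcr.io:

import json
import subprocess

# Placeholder image reference; use any tag you want to protect
IMAGE = "ghcr.io/my-org/app:development"

# docker manifest inspect prints the manifest list as JSON
raw = subprocess.check_output(["docker", "manifest", "inspect", IMAGE])
manifest = json.loads(raw)

# For a multi-platform image, .manifests has one entry per platform; each
# digest points at an untagged "child" package version that must be kept
child_digests = {entry["digest"] for entry in manifest.get("manifests", [])}
print(child_digests)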
Upvoted :+1: I won't be able to look at this in depth for a few days, but I'll do a deep dive as soon as I can, if still needed. Certainly seems like I have all the information I need. In the meantime, as mentioned, contributions are always welcome :slightly_smiling_face:
thanks @sondrelg I'm going to track this down with GHCR, and will contribute if possible.
@sondrelg Seems like GitHub doesn't discriminate between a parent container and a child container when using the Packages LIST API. What LIST fails to reveal is the graph of dependencies that exists behind the scenes in the container registry. Basically, to do a proper delete, the GitHub API should be avoided and the registry API should be used. See these API docs for what GitHub is using behind the scenes to manage GHCR: https://github.com/distribution/distribution/blob/main/docs/spec/api.md#deleting-an-image
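For completeness, here is a hedged sketch of what a delete against that registry endpoint could look like. The DELETE /v2/&lt;name&gt;/manifests/&lt;digest&gt; route comes from the linked distribution spec; whether ghcr.io actually permits deletes through it, and how it expects the token to be presented, is an assumption here rather than something confirmed in this thread:

import os
import requests

NAME = "my-org/app"  # placeholder repository name
# Digest of the amd64 child image from the manifest example above
DIGEST = "sha256:24c08606be10f8db18e7f463e80fd2dc55a411f10d7a0d0beceab4591e3a6441"
TOKEN = os.environ["REGISTRY_TOKEN"]  # assumed registry bearer token

# DELETE /v2/<name>/manifests/<digest> per the distribution spec linked above
response = requests.delete(
    f"https://ghcr.io/v2/{NAME}/manifests/{DIGEST}",
    headers={"Authorization": f"Bearer {TOKEN}"},
)
# The spec documents 202 Accepted on a successful delete
print(response.status_code)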
Sorry, I think I missed your last message. I saw the response in the buildx issue, and agree a switch to this API seems like the right choice :+1:
I'll be taking my holidays in a few days, so will have very limited capacity in the next 3 weeks. Are you free to work on this? If not, I guess we could create a new issue for this and get back to it when either one of us (or someone else) does have time :slightly_smiling_face:
@sondrelg I won't have the personal time to do this for a while. But would be good to keep this issue in the backlog!
Any news on this one? I just hit the same issue. We've disabled the second arch for the moment, but would like to use both in the future.
Haven't looked at this since October, mostly since it doesn't affect me personally yet. It will as soon as GitHub Actions lets me build arm images on arm runners.
Would you be interested in implementing a fix @Eddman?
A little question regarding the container registry API: it seems there is no endpoint for listing all untagged manifests, right? So it still requires the GitHub Packages API to list all the packages.
This can be fixed by explicitly excluding untagged images referred to in the manifests of tagged images, similar to https://github.com/Chizkiyahu/delete-untagged-ghcr-action/blob/278ac5c5ae16914324ba447591af23312af6c075/clean_ghcr.py#L137-L138.
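To make that idea concrete, here is a small Python sketch of the exclusion logic. The helper name (get_child_digests) is hypothetical, and the data shapes are loosely modelled on the package versions returned by the GitHub API:

def protected_digests(tagged_versions, get_child_digests):
    # Digests that tagged multi-platform images depend on and that the
    # cleanup must therefore skip
    protected = set()
    for version in tagged_versions:
        # get_child_digests is a hypothetical helper that would run
        # `docker manifest inspect` (or query the registry) for the tag and
        # return the per-platform digests from its manifest list
        protected |= set(get_child_digests(version))
    return protected


def deletable(versions, protected):
    # Untagged versions whose digest is not referenced by any tagged image
    return [
        v
        for v in versions
        if not v["metadata"]["container"]["tags"]  # untagged
        and v["name"] not in protected             # not a child of a tagged image
    ]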
I see @corinz, the description in https://github.com/snok/container-retention-policy/issues/43#issuecomment-1233436362 is really helpful. After looking at this for a little bit, I think this should work as a solution:
- name: Fetch SHAs for all associated multi-platform package versions
  id: multi-arch-digests
  run: |
    foo=$(docker manifest inspect ghcr.io/foo | jq -r '.manifests[] | .digest' | paste -s -d ', ' -)
    bar=$(docker manifest inspect ghcr.io/bar | jq -r '.manifests[] | .digest' | paste -s -d ', ' -)
    echo "multi-arch-digests=$foo,$bar" >> $GITHUB_OUTPUT
- uses: snok/container-retention-policy
  with:
    ...
    skip-shas: ${{ steps.multi-arch-digests.outputs.multi-arch-digests }}
This would mean implementing a new input for SHAs to avoid deleting, but that seems OK.
I want to release a v2 of the action soon where running a (much) smaller docker container is one of the main things I want to accomplish. Bundling the docker CLI in a container would be a bit of a nuisance, so I think this solution would solve things nicely, while keeping complexity low. Does anyone see any problems with it?
The latest release adds a skip-shas input argument, which can be used to protect against deleting multi-platform images. Please see the new section in the readme for details, and let me know if anything is unclear.
The migration guide for v3 is included in the release post.
If you run into any issues, please share them in the issue opened for tracking the v3 release ☺️