hub-feedback
Images suddenly not able to be pulled with "error pulling image configuration: unknown blob"
Problem description
We have seen random images hosted on Docker Hub suddenly become unpullable. The error received from the Docker client/daemon when trying to pull these images is similar to:
docker pull rancher/rancher-agent:v2.5.0
v2.5.0: Pulling from rancher/rancher-agent
171857c49d0f: Already exists
419640447d26: Already exists
61e52f862619: Already exists
8337e8208979: Pulling fs layer
167454f0e957: Pulling fs layer
151cee340873: Pulling fs layer
b3d5a16b3068: Waiting
2fb5152e9c94: Waiting
b4ba41728bf5: Waiting
0f7d6bdd0d2a: Waiting
error pulling image configuration: unknown blob
We are (as far as we are aware) not making any changes to the manifest in question, and have seen the manifest simply go into this state on its own; prior successful pulls for the architecture in question prove this. We have put a band-aid fix in place (pushed over the v2.5.0 agent image with an rc9 image, which now pulls successfully), but would like to root-cause why the agent image went into this state in the first place.
https://github.com/rancher/rancher/issues/29424
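(Editorial note: a quick way to confirm that it is the config blob the registry has lost is to request it directly. This is a diagnostic sketch only, not part of the original report; skopeo is also used later in this thread.)
# print the raw manifest (or manifest list) with its config and layer digests
skopeo inspect --raw docker://docker.io/rancher/rancher-agent:v2.5.0
# fetch the image configuration blob itself; on an affected tag this should
# fail with the same missing-blob error that docker pull reports
skopeo inspect --config docker://docker.io/rancher/rancher-agent:v2.5.0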
It seems that random image layers are returning a 404 status:
[root@cannon01 flannelshirt]# skopeo copy --all docker://docker.io/rancher/coreos-flannel:v0.13.0-rancher1 dir:./flannel
Getting image list signatures
Copying 2 of 2 images in list
Copying image sha256:eccf2e521bc631c77805ef1ad27c24de5524f3c3f5756c52ca1c2dd5baf5ec09 (1/2)
Getting image source signatures
Copying blob df20fa9351a1 done
Copying blob 954276d325df done
Copying blob 179d3681d6f8 done
Copying blob 3091ac99776e done
Copying blob e0d1d1f1d25e done
Copying blob 4ede9682957c done
Copying blob a02d32e47883 done
Copying config 0bfefe9f64 done
Writing manifest to image destination
Storing signatures
Copying image sha256:71c9d4dc9a4411af0bb3bbdb634f66cb0049c1f9188700d07ec527cf8773a583 (2/2)
Getting image source signatures
Copying blob b538f80385f9 done
FATA[0007] Error reading blob sha256:fbae16e68ba0804b31b2cde80ba9158737f05e0f0779e529991b0a5031fa142d: invalid status code from registry 404 (Not Found)
https://hub.docker.com/layers/rancher/coreos-flannel/v0.13.0-rancher1/images/sha256-eccf2e521bc631c77805ef1ad27c24de5524f3c3f5756c52ca1c2dd5baf5ec09?context=explore has suddenly failed as well
@justincormack Not sure if you are the correct person to ping for this, but this is starting to be widespread and seemingly random across our repos.
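(Editorial note: the 404 can also be reproduced outside of skopeo by asking the registry for the blob directly. A sketch, assuming curl and jq are installed; the digest is the one skopeo reported above.)
# request an anonymous pull token for the repository
TOKEN=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:rancher/coreos-flannel:pull" | jq -r .token)
# HEAD the blob skopeo could not read; a healthy blob returns 200, the affected one 404
curl -sI -H "Authorization: Bearer $TOKEN" \
  "https://registry-1.docker.io/v2/rancher/coreos-flannel/blobs/sha256:fbae16e68ba0804b31b2cde80ba9158737f05e0f0779e529991b0a5031fa142d" | head -n 1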
Hey @Oats87, the Hub team is investigating. It's believed this is related to Hub's garbage collection system (unrelated to the new image retention policy rolling out next month). The system was disabled this morning while the team gets to the bottom of this.
As of now it seems a very small number of blobs were "soft" deleted, where they were hidden from users but not actually deleted. For affected images, re-pushing them should resolve the problem.
Since the GC system was disabled, you shouldn't see issues with any further images. Once the team gets further into the investigation, we should have more info on impact and remediation.
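(Editorial note: the re-push can be done from any host that still has an intact local copy of the image, or by copying it back from a mirror, without rebuilding. A sketch only; the mirror hostname is a placeholder and push access to the repository is assumed.)
# from a host whose local Docker cache still has the image
docker push rancher/rancher-agent:v2.5.0
# or copy it back from another registry/mirror without a local daemon
skopeo copy --all docker://mirror.example.com/rancher/rancher-agent:v2.5.0 \
  docker://docker.io/rancher/rancher-agent:v2.5.0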
We also saw this issue starting last week, where at least 3 of our recently pushed images became corrupted. Upon our team member's investigation, it looks like the v1 manifest was corrupted/deleted, while all layers and the v2 manifest were fine.
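(Editorial note: the two manifest schema versions can be checked independently by varying the Accept header. A sketch; the repository and tag are placeholders for the affected image, and jq/curl are assumed to be installed.)
REPO=yourorg/yourimage   # placeholder: the affected repository
TAG=yourtag              # placeholder: the affected tag
TOKEN=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:${REPO}:pull" | jq -r .token)
# schema 2 manifest (reported intact)
curl -sI -H "Authorization: Bearer ${TOKEN}" \
  -H "Accept: application/vnd.docker.distribution.manifest.v2+json" \
  "https://registry-1.docker.io/v2/${REPO}/manifests/${TAG}" | head -n 1
# schema 1 manifest (reported corrupted/deleted)
curl -sI -H "Authorization: Bearer ${TOKEN}" \
  -H "Accept: application/vnd.docker.distribution.manifest.v1+prettyjws" \
  "https://registry-1.docker.io/v2/${REPO}/manifests/${TAG}" | head -n 1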
Since the GC system was disabled, you shouldn't see issues with any further images. Once the team gets further into the investigation, we should have more info on impact and remediation.
@binman-docker Could you please tell me whether that info will be published in this issue or somewhere else?
Hi @ValeriiVozniuk, sorry about that. The team should be posting a root cause analysis once everything is squared away. As of now, I believe they've identified at least one code path that could have led to this. For the time being, you can re-push the images and they should be OK.
cc @mikeparker @ddebroy @arkodg
Hi, maybe I missed it, but was any post-mortem for this issue published?
My container also has this trouble (today):
docker pull nelsonsoftware/nelson-sio-cli:latest
5a81b172d58f: Waiting
error pulling image configuration: errors:
unauthorized: authentication required
unauthorized: authentication required
Hi Nelson, that's a different error, and we published today's issue on our status page - it was an issue with certificate rotation, which meant login was down.
We have not yet published a post-mortem for the original issue here because, whilst we have found and fixed the core issue, we are doing more due diligence around ensuring any registry inconsistencies can be immediately detected and flagged in the future. Building this at scale is a big project, so please bear with us, and thanks for your patience.
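(Editorial note: from the client side, the kind of consistency check being described could look roughly like the sweep below: list every digest a manifest references, then confirm the registry can still serve each blob. An illustrative sketch only, not Docker's implementation; it assumes skopeo, curl and jq, and a tag that resolves to a single-architecture schema 2 manifest.)
IMAGE=yourorg/yourimage:yourtag   # placeholder: image to audit
REPO=${IMAGE%%:*}
TOKEN=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:${REPO}:pull" | jq -r .token)
# HEAD every blob the manifest references (config + layers) and report its status code
for digest in $(skopeo inspect --raw "docker://docker.io/${IMAGE}" \
                  | jq -r '.config.digest, .layers[].digest'); do
  code=$(curl -s -o /dev/null -I -w '%{http_code}' \
           -H "Authorization: Bearer ${TOKEN}" \
           "https://registry-1.docker.io/v2/${REPO}/blobs/${digest}")
  echo "${code} ${digest}"
done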
We are clearing up our old issues and your ticket has been open for 6 months with no activity. Remove stale label or comment or this will be closed in 15 days.
Not stale, still don't have a public postmortem on the outage. @mikeparker can you follow up?
We are clearing up our old issues and your ticket has been open for 6 months with no activity. Remove stale label or comment or this will be closed in 15 days.
Postmortem?
Not stale
Began observing this behaviour for our Docker images; we can reproduce it reliably and have been unable to find a solution.
seeing this on GCP
sassy. the internet is for discussion, i thought. however everywhere i go i get slapped in the face by someone who means well. take care.
why are you posting this here?
if you care to hear: i would say that because this is the SEO post that comes up directly for the error code we're seeing, it would be good to have somewhere with eyes (real ones, you might not get it) to see a bigger issue at play.
this is stale
if your github bot isn't smart enough to not open something just because someone made a comment then perhaps one might wonder why Microsoft bought up all the AI company shares in the world if they aren't going to build useful things with it.
@wayjake Is there any pattern you notice with this?
- Does it go away after a bit?
- Is it at a consistent time? (ie, top of every hour)
- Any specific images?
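(Editorial note: a repeated timed pull like the following can help answer the first two questions by showing whether the failure is intermittent or clustered at particular times. A sketch; the image name is a placeholder.)
# pull the affected image once a minute and log the outcome with a UTC timestamp
while true; do
  if docker pull yourorg/yourimage:yourtag > /dev/null 2>&1; then
    echo "$(date -u +%FT%TZ) OK"
  else
    echo "$(date -u +%FT%TZ) FAILED"
  fi
  sleep 60
done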