multi-snapshotter: containerd 2.0.x error unpacking image -- failed to get reader from content store
Description
We are using multiple snapshotters (overlayfs and devmapper) with containerd 2.0.2 in a Kubernetes 1.30+ environment. containerd is configured with discard_unpacked_layers = false, and we always set an io.containerd.cri.runtime-handler: your_handler annotation on our Pods. Our workload is semi-heavy CI/CD: we run many clusters, where an individual cluster might contain 30 nodes at any one time and run ~30K pods per day. The workload uses Kata Containers configured to use devmapper.
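For context, a minimal sketch of the relevant pieces of the configuration (values are illustrative, your_handler stands in for our real handler name, and the exact plugin section names may differ slightly between containerd 2.0.x releases):

[plugins.'io.containerd.cri.v1.images']
  # keep the compressed layer blobs in the content store after unpacking
  discard_unpacked_layers = false

[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.kata]
  runtime_type = "io.containerd.kata.v2"
  # pods using this handler should be unpacked into devmapper, not overlayfs
  snapshotter = 'devmapper'

And the corresponding Pod sketch (names illustrative; the annotation mirrors the RuntimeClass handler so the CRI image service pulls/unpacks with that handler's snapshotter):

apiVersion: v1
kind: Pod
metadata:
  name: kata-example
  annotations:
    io.containerd.cri.runtime-handler: kata
spec:
  runtimeClassName: kata
  containers:
  - name: app
    image: icr.io/continuous-delivery/pipeline/some-image:1.0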
Generally everything works fine; however, every few days (maybe 1 in 10K pods) we get a Kata pod (configured to use devmapper) stuck in CreateContainerError. The error varies by image but looks something like:
Error: failed to create containerd container: error unpacking image: apply layer error for "icr.io/continuous-delivery/pipeline/some-image:1.0": failed to extract layer sha256:2573e0d8158209ed54ab25c87bcdcb00bd3d2539246960a3d592a1c599d70465: failed to get reader from content store: content digest sha256:7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059: not found
Getting onto the node to investigate with ctr, we see that the content store indeed does not contain that sha (only a label referencing it):
root@kube-worker-xyz:/# ctr -n k8s.io content ls | grep 7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059
sha256:0633dbdebf81742c80eca13d88dc5641302c1ee80e5ff695979e7ef3d2cbea2e 7.059kB 16 seconds containerd.io/distribution.source.icr.io=continuous-delivery/pipeline/ghactions-runner-base-image,containerd.io/gc.ref.content.config=sha256:a28da9f57279860b8ff3420721cfaeef04219ead61cbfc022e12889c1bc1ea9e,containerd.io/gc.ref.content.l.0=sha256:7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059,containerd.io/gc.ref.content.l.10=sha256:129bec55accc2177f6aa2598aeba15992ef38b887a2fe87a99508cee343e7db2,containerd.io/gc.ref.content.l.11=sha256:3ec04c5c3eb4fc10545323ce16f75f642e956e0b77beaab649e534fe75b86c64,containerd.io/gc.ref.content.l.12=sha256:e55dd4175ec4664363842aba128288f79ac9665df6ad60feec507875d0a55410,containerd.io/gc.ref.content.l.13=sha256:98a1a1940846154dd7666774021637886ac8efbf8ec262f11ff90e6f7464d864,containerd.io/gc.ref.content.l.14=sha256:782179e80df74b48e956042180943c0dd51f6a573a8bd3a5c6ff7b02cb9f125f,containerd.io/gc.ref.content.l.15=sha256:c6f84a29dd6a608bd1cc041723c2ceb5529cce4f465e98ed2f603ec872eab012,containerd.io/gc.ref.content.l.16=sha256:06d06fb779113e546badc73e49959c60d439f1fd00966ef552e7f9e5fb34e2c9,containerd.io/gc.ref.content.l.17=sha256:48c7061d9759773a27f6b5a033cf5b7ad974f9813c8e28c3dd4f184aa928f538,containerd.io/gc.ref.content.l.18=sha256:a0488b16758da5e76c06586e6c6e608b309f6c08841501b3616dd57f1b914072,containerd.io/gc.ref.content.l.19=sha256:be692fc3d8e40b6b4d9417da2ae6deb13c792aaf8c54b31eb7aa6d99919eb8c9,containerd.io/gc.ref.content.l.1=sha256:4d215632061b651b6e052190f09cd8395984e2ff5a7255cbd1750f41becfc75f,containerd.io/gc.ref.content.l.20=sha256:cff3627e9059eb0b4f90f78a769b61517c365590590a49fb189e44e89b9520d6,containerd.io/gc.ref.content.l.21=sha256:aa9b46a8d4ff17c27277e73ef294ce8e31ee209fd190b3ba5f6423d4747689ef,containerd.io/gc.ref.content.l.22=sha256:bcebf055b9c702996c64e4a5c3fd49f262505c7c14094505490d2168f44d369a,containerd.io/gc.ref.content.l.23=sha256:8e47c22bbfedbcb3dd9aeb86c4e036ded8b70d23f3dc7457f81a11524a934354,containerd.io/gc.ref.content.l.24=sha256:907c5def4e65bebd643a7483ff5d51663a85411965fccbef22b188b59d8815c8,containerd.io/gc.ref.content.l.25=sha256:58f0b819e1fa8bc36d8710e6d0d931339f696a16317a72b04c193ecde3bc3d4a,containerd.io/gc.ref.content.l.26=sha256:72525998702b0804a3027e953ab70a57896edcd7e240b5852667c8d3ff99d67d,containerd.io/gc.ref.content.l.27=sha256:eaa7599c21d7625fe349835849b06df32a830494f6c8cb87cb4b371574d842d2,containerd.io/gc.ref.content.l.28=sha256:33d7c0e10d0f59895937b55edfacfe1aa0f811cc911e81ecd9cd130ae6f8dac8,containerd.io/gc.ref.content.l.29=sha256:8fd2f10cabfb2da08e69bcf905b71c39e3c0b2949a2cecf22ec0124824e9ace0,containerd.io/gc.ref.content.l.2=sha256:582db33f3992e53e2081d3d18a50c635ba98ae2800d7ea490a535de3a528bac3,containerd.io/gc.ref.content.l.30=sha256:fed4f325a1c817ff75905a7b9bed332052de160e7bb355e71727d14e8437c23f,containerd.io/gc.ref.content.l.31=sha256:a39bd1c90fa55895f8e9e1f9a0b3276ef8dde00c219b0b75c972b212680f3d0f,containerd.io/gc.ref.content.l.3=sha256:275c6f54c1e9b274bfd7fe469b31185620ce914ed8da03e5d3f1e984cef405a7,containerd.io/gc.ref.content.l.4=sha256:7ba7152f52bc7a6ce0229cce157cfd686a15166b074de952d4ae467b0225a870,containerd.io/gc.ref.content.l.5=sha256:cf8bbe6a2c1c722a48b148861cb51d230d691e6864c97fcfec53160e2ae556b8,containerd.io/gc.ref.content.l.6=sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1,containerd.io/gc.ref.content.l.7=sha256:f16831111a8724eb18bb33c348c9a905a6a408e376f18b1b001229bd6ea0df2b,containerd.io/gc.ref.content.l.8=sha256:7cf7d229926b9216944f2778d2a7c5af72e350cc68f9ec69de953e4ca62b5d7c,containerd.io/gc.ref.content.l.9=sha256:54baed79e3470faf0a4d4c433bbd2c52430fb76bedb58b5fe75846e6c88fd274
i.e. we see the metadata that references the blob, but not the actual content itself, which is not present and should look something like:
sha256:7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059 30.44MB 13 seconds containerd.io/distribution.source.icr.io=continuous-delivery/pipeline/ghactions-runner-base-image,containerd.io/uncompressed=sha256:2573e0d8158209ed54ab25c87bcdcb00bd3d2539246960a3d592a1c599d70465
Looking at the snapshotters, we see:
root@kube-worker-xyz:/# ctr -n k8s.io snapshot --snapshotter=overlayfs ls | grep 2573e0d8158209ed54ab25c87bcdcb00bd3d2539246960a3d592a1c599d70465
sha256:2573e0d8158209ed54ab25c87bcdcb00bd3d2539246960a3d592a1c599d70465 Committed
sha256:a924ffd449222692a0bc9603f62eebd7aa3fddd5d90da50a9463c707a9021125 sha256:2573e0d8158209ed54ab25c87bcdcb00bd3d2539246960a3d592a1c599d70465 Committed
root@kube-worker-xyz:/# ctr -n k8s.io snapshot --snapshotter=devmapper ls | grep 2573e0d8158209ed54ab25c87bcdcb00bd3d2539246960a3d592a1c599d70465
root@kube-worker-xyz:/#
Two things are strange here:
- Why was the content removed even though we are configured with discard_unpacked_layers = false?
- Why was the layer unpacked to the default snapshotter? We have gone back and verified that in 100% of cases the pods in question set the io.containerd.cri.runtime-handler: kata annotation, where the kata handler is configured to use devmapper.
We can recover this situation on the node by doing
ctr -n k8s.io i pull --snapshotter devmapper --local icr.io/continuous-delivery/pipeline/some-image:1.0
... but configuring this type of recovery at scale is a pain.
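The per-node recovery amounts to something like the rough sketch below (the image references would have to be scraped from the CreateContainerError events of the stuck pods):

#!/bin/sh
# Rough sketch of the manual per-node recovery described above.
# Each argument is an image reference taken from a stuck pod's
# CreateContainerError event; k8s.io is the CRI namespace and
# devmapper is the snapshotter our kata handler uses.
for image in "$@"; do
  ctr -n k8s.io images pull --snapshotter devmapper --local "$image" \
    || echo "re-pull failed for $image" >&2
done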
We're also looking at whether a more generalized version of the recovery done in https://github.com/containerd/containerd/pull/10703 (PR coming if it works out) might be worth integrating into a custom build in the short term, but we would definitely like to see a more official fix in a release. The way we get into this state is odd and likely involves a race condition worth understanding better; however, this error really is recoverable, and perhaps we should just log the problem and then re-pull when encountering the not found error.
Describe the results you received and expected
Actual: Pod stuck in CreateContainerError
Expected: Pod recovers after encountering the error
What version of containerd are you using?
containerd github.com/containerd/containerd/v2 v2.0.2 c507a0257ea6462fbd6f5ba4f5c74facb04021f4
Maybe the content store is broken because of an unexpected power-off? But the unpack happened right after the pull (there was no chance for a reboot in between).
This is happening at scale over many hundreds of nodes and many tens of clusters -- not just a single isolated case. I agree that something weird is happening, but I think a power-off is not it. If I had to guess, perhaps a GC is happening for an image that shares some layers, and somehow the multi-snapshotter setup makes this more incorrect than it normally would be.
I don't want to create a PR here because people might adopt it and this is just a hack for now, but... we are currently using the following patch to mitigate:
diff --git a/client/image.go b/client/image.go
index 355bcba73..a93fa4929 100644
--- a/client/image.go
+++ b/client/image.go
@@ -335,7 +335,19 @@ func (i *image) Unpack(ctx context.Context, snapshotterName string, opts ...Unpa
 	for _, layer := range layers {
 		unpacked, err = rootfs.ApplyLayerWithOpts(ctx, layer, chain, sn, a, config.SnapshotOpts, config.ApplyOpts)
 		if err != nil {
-			return fmt.Errorf("apply layer error for %q: %w", i.Name(), err)
+			// check if error is due to missing content and if so repull and retry apply layer
+			if errdefs.IsNotFound(err) {
+				if _, err2 := i.client.Pull(ctx, i.Name()); err2 != nil {
+					// the repull failed -- for now be really aggressive and delete the manifest from the content store
+					// this does not recover the pod if the image is already in use in a different snapshotter
+					i.client.contentStore.Delete(ctx, i.Target().Digest)
+					return fmt.Errorf("removing image manifest after failed repull for %q: %w : %w", i.Name(), err2, err)
+				}
+				unpacked, err = rootfs.ApplyLayerWithOpts(ctx, layer, chain, sn, a, config.SnapshotOpts, config.ApplyOpts)
+			}
+			if err != nil {
+				return fmt.Errorf("apply layer error for %q: %w", i.Name(), err)
+			}
 		}
 		if unpacked {
If the content is not found we:
- try an unauthenticated client.Pull (as the CRI CreateContainer/Sandbox call does not have creds)
- if that fails -- most likely because of an auth or rate-limiting problem, but for whatever reason -- we delete the image manifest, in the hope that the next time around we will force a top-level CRI PullImage, which is done with creds
Ugly as heck, but (1) it works consistently for anonymous images as long as we're not rate limited, and (2) it works less well and gets into trouble if the image is already in use with a different snapshotter, but eventually it seems to help.
The best fix here would definitely be to understand what happened to the content store... GC or otherwise. With that said, I think it would still be very helpful if we could recover too. It might help if we had the creds passed in for the CRI CreateContainer/Sandbox calls, if that is possible; a way to truly reset the CRI image cache to force a re-pull would also be good.
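Presumably the closest manual approximation today is dropping the CRI image record and letting the kubelet pull it again, roughly as sketched below (image reference illustrative), though that only helps when the stale record is the problem and the image is not still in use:

# Rough sketch: remove the CRI image record so the kubelet re-pulls with creds.
crictl rmi icr.io/continuous-delivery/pipeline/some-image:1.0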
@skaegi how do you use multiple snapshotters when creating a pod? Can you provide a sample demo YAML? Thank you.
I am seeing this issue consistently with the following setup:
- kata-containers 3.20.0
- firecracker release v1.21.1
- rke2 v1.32.8+rke2r1
- ctr github.com/k3s-io/containerd v2.0.5-k3s2
I've set up devmapper as instructed in the kata-containers guide: https://github.com/kata-containers/kata-containers/blob/main/docs/how-to/how-to-use-kata-containers-with-firecracker.md#configure-devmapper
This is my config-v3.toml.tmpl
{{ template "base" . }}
[plugins.'io.containerd.snapshotter.v1.devmapper']
base_image_size = "10GB"
discard_blocks = true
pool_name = "devpool"
root_path = "/data/containerd/devmapper"
[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.kata-fc]
runtime_type = "io.containerd.kata-fc.v2"
snapshotter = 'devmapper'
[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.kata-fc.options]
ConfigPath = "/opt/kata/share/defaults/kata-containers/configuration-fc.toml"
[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.kata-qemu]
runtime_type = "io.containerd.kata.v2"
[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.kata-qemu.options]
I can run images directly with ctr and the various plugins
testimg=docker.io/rockylinux/rockylinux:latest
sudo ctr run --rm $testimg rocky-default uname -a;
sudo ctr run --runtime io.containerd.kata.v2 --rm $testimg rocky-kata uname -a;
sudo ctr run --snapshotter devmapper --runtime io.containerd.run.kata-fc.v2 -t --rm $testimg test-me uname -a
but any image I try to run in rke2 using the RuntimeClass I've established for kata-fc has the same problem described above (see the RuntimeClass sketch after the events below):
# this works ok
kubectl run quicktest --image=ubuntu:22.04 --restart=Never --command -- sleep 3600
# this fails
kubectl run quicktest --image=ubuntu:22.04 --restart=Never \
--overrides='{"spec":{"runtimeClassName":"kata-fc"}}' \
--command -- sleep 3600
# eventlog
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 34s default-scheduler Successfully assigned serving/quicktest to infra-dev-1
Normal Pulled 5s (x4 over 32s) kubelet Container image "ubuntu:22.04" already present on machine
Warning Failed 5s (x4 over 31s) kubelet Error: failed to create containerd container: error unpacking image: apply layer error for "docker.io/library/ubuntu:22.04": failed to extract layer sha256:90a2bf02e851326fc70d05470553ed33e578342d6e06bfa0cfaf331c4079b7e4: failed to get reader from content store: content digest sha256:a3be5d4ce40198dc77f17780f02720f55b1898a2368f701dd1619fc9f84aac86: not found
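For completeness, the RuntimeClass in question looks roughly like this (a sketch; the name and handler are assumed to match the kata-fc runtime section in the containerd config above):

# Sketch of the RuntimeClass used by the failing 'kubectl run' above.
# 'handler' must match the runtime name configured under
# [plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.kata-fc]
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata-fc
handler: kata-fc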
Bumping this issue in the hopes that someone with the needed expertise will see it. I'd love to contribute a patch but I'm not familiar at all with how the containerd codebase is structured.
We're running into this consistently on GKE because the default containerd configuration starts off with discard_unpacked_layers=true. Attempting to change this setting to false and then setting up an additional snapshotter will cause subsequent re-pulls of an image (that was pulled and unpacked before the configuration changes were made) to fail with the error failed to get reader from content store. I believe this comment accurately describes what's going on.
We're trying to get Kata and Firecracker running as well, but I don't think the problem has anything to do with them - rather it's just a side effect of trying to use more than 1 snapshotter. The version of containerd we're using is github.com/containerd/containerd/v2 v2.0.4.m 1a43cb6a1035441f9aca8f5666a9b3ef9e70ab20.m.
We're able to work around this by applying @skaegi's patch and rebuilding from source, however this is obviously not ideal since we need to do this on every node running multiple snapshotters.
Hi Folks, Similar issue:
- kata-containers 3.21.0
- cloud-hypervisor v48
- containerd v2.0.5
- kubernetes vanilla v1.32.6
We have to run ctr -n k8s.io i pull --snapshotter devmapper --local icr.io/continuous-delivery/pipeline/some-image:1.0 to solve it for each image.
Can you try https://github.com/containerd/containerd/pull/11996?
Our use case is that we use a snapshot of a pre-warmed bottlerocket disk with most of our frequently used OCIs to start up new EKS nodes faster. We ran into this issue when we upgraded EKS to 1.33.
We were only including amd64 images in the pre-warmed snapshot even though the clusters also had arm64 nodes.
As soon as we started pulling the arm64 variant alongside the amd64 one when warming up the cache the error disappeared.
I think this has to do with the way that ctr does the pulling and then containerd does the image validation.
Instead of seeing that the arm64 variant was not present and pulling it, containerd v2 seems to just panic on the image decompression.
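In case it helps others with pre-warmed caches, a rough sketch of what we do when building the warm disk (image name illustrative; --all-platforms pulls every variant in the manifest list, or the needed platforms can be pulled explicitly):

ctr -n k8s.io images pull --all-platforms docker.io/library/ubuntu:22.04
# or, more selectively:
ctr -n k8s.io images pull --platform linux/amd64 docker.io/library/ubuntu:22.04
ctr -n k8s.io images pull --platform linux/arm64 docker.io/library/ubuntu:22.04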