
multi-snapshotter: containerd 2.0.x error unpacking image -- failed to get reader from content store

Open skaegi opened this issue 1 year ago • 9 comments

Description

We are using multiple snapshotters -- overlayfs, devmapper -- with containerd 2.0.2 in a k8s 1.30+ environment. containerd is configured with discard_unpacked_layers = false and we always set an io.containerd.cri.runtime-handler: your_handler annotation on our Pods. Our workload is semi-heavy CI/CD: we run many clusters, where an individual cluster might contain 30 nodes at any one time and run ~30K pods per day. Our workload uses Kata Containers configured to use devmapper.

Generally everything works fine; however, every few days (maybe 1 in 10K pods) we get a kata pod (configured to use devmapper) stuck in a CreateContainerError. The error varies by image but looks something like... Error: failed to create containerd container: error unpacking image: apply layer error for "icr.io/continuous-delivery/pipeline/some-image:1.0": failed to extract layer sha256:2573e0d8158209ed54ab25c87bcdcb00bd3d2539246960a3d592a1c599d70465: failed to get reader from content store: content digest sha256:7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059: not found

Getting onto the node to investigate with ctr, we see that the content store indeed does not contain that digest.

root@kube-worker-xyz:/# ctr -n k8s.io content ls | grep 7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059
sha256:0633dbdebf81742c80eca13d88dc5641302c1ee80e5ff695979e7ef3d2cbea2e	7.059kB	16 seconds		containerd.io/distribution.source.icr.io=continuous-delivery/pipeline/ghactions-runner-base-image,containerd.io/gc.ref.content.config=sha256:a28da9f57279860b8ff3420721cfaeef04219ead61cbfc022e12889c1bc1ea9e,containerd.io/gc.ref.content.l.0=sha256:7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059,containerd.io/gc.ref.content.l.10=sha256:129bec55accc2177f6aa2598aeba15992ef38b887a2fe87a99508cee343e7db2,containerd.io/gc.ref.content.l.11=sha256:3ec04c5c3eb4fc10545323ce16f75f642e956e0b77beaab649e534fe75b86c64,containerd.io/gc.ref.content.l.12=sha256:e55dd4175ec4664363842aba128288f79ac9665df6ad60feec507875d0a55410,containerd.io/gc.ref.content.l.13=sha256:98a1a1940846154dd7666774021637886ac8efbf8ec262f11ff90e6f7464d864,containerd.io/gc.ref.content.l.14=sha256:782179e80df74b48e956042180943c0dd51f6a573a8bd3a5c6ff7b02cb9f125f,containerd.io/gc.ref.content.l.15=sha256:c6f84a29dd6a608bd1cc041723c2ceb5529cce4f465e98ed2f603ec872eab012,containerd.io/gc.ref.content.l.16=sha256:06d06fb779113e546badc73e49959c60d439f1fd00966ef552e7f9e5fb34e2c9,containerd.io/gc.ref.content.l.17=sha256:48c7061d9759773a27f6b5a033cf5b7ad974f9813c8e28c3dd4f184aa928f538,containerd.io/gc.ref.content.l.18=sha256:a0488b16758da5e76c06586e6c6e608b309f6c08841501b3616dd57f1b914072,containerd.io/gc.ref.content.l.19=sha256:be692fc3d8e40b6b4d9417da2ae6deb13c792aaf8c54b31eb7aa6d99919eb8c9,containerd.io/gc.ref.content.l.1=sha256:4d215632061b651b6e052190f09cd8395984e2ff5a7255cbd1750f41becfc75f,containerd.io/gc.ref.content.l.20=sha256:cff3627e9059eb0b4f90f78a769b61517c365590590a49fb189e44e89b9520d6,containerd.io/gc.ref.content.l.21=sha256:aa9b46a8d4ff17c27277e73ef294ce8e31ee209fd190b3ba5f6423d4747689ef,containerd.io/gc.ref.content.l.22=sha256:bcebf055b9c702996c64e4a5c3fd49f262505c7c14094505490d2168f44d369a,containerd.io/gc.ref.content.l.23=sha256:8e47c22bbfedbcb3dd9aeb86c4e036ded8b70d23f3dc7457f81a11524a934354,containerd.io/gc.ref.content.l.24=sha256:907c5def4e65bebd643a7483ff5d51663a85411965fccbef22b188b59d8815c8,containerd.io/gc.ref.content.l.25=sha256:58f0b819e1fa8bc36d8710e6d0d931339f696a16317a72b04c193ecde3bc3d4a,containerd.io/gc.ref.content.l.26=sha256:72525998702b0804a3027e953ab70a57896edcd7e240b5852667c8d3ff99d67d,containerd.io/gc.ref.content.l.27=sha256:eaa7599c21d7625fe349835849b06df32a830494f6c8cb87cb4b371574d842d2,containerd.io/gc.ref.content.l.28=sha256:33d7c0e10d0f59895937b55edfacfe1aa0f811cc911e81ecd9cd130ae6f8dac8,containerd.io/gc.ref.content.l.29=sha256:8fd2f10cabfb2da08e69bcf905b71c39e3c0b2949a2cecf22ec0124824e9ace0,containerd.io/gc.ref.content.l.2=sha256:582db33f3992e53e2081d3d18a50c635ba98ae2800d7ea490a535de3a528bac3,containerd.io/gc.ref.content.l.30=sha256:fed4f325a1c817ff75905a7b9bed332052de160e7bb355e71727d14e8437c23f,containerd.io/gc.ref.content.l.31=sha256:a39bd1c90fa55895f8e9e1f9a0b3276ef8dde00c219b0b75c972b212680f3d0f,containerd.io/gc.ref.content.l.3=sha256:275c6f54c1e9b274bfd7fe469b31185620ce914ed8da03e5d3f1e984cef405a7,containerd.io/gc.ref.content.l.4=sha256:7ba7152f52bc7a6ce0229cce157cfd686a15166b074de952d4ae467b0225a870,containerd.io/gc.ref.content.l.5=sha256:cf8bbe6a2c1c722a48b148861cb51d230d691e6864c97fcfec53160e2ae556b8,containerd.io/gc.ref.content.l.6=sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1,containerd.io/gc.ref.content.l.7=sha256:f16831111a8724eb18bb33c348c9a905a6a408e376f18b1b001229bd6ea0df2b,containerd.io/gc.ref.content.l.8=sha256:7cf7d229926b9216944f2778d2a7c5af72e350cc68f9ec69de953e4ca62b5d7c,containerd.io/gc.ref.content.l.9=sha256:54baed79e3470faf0a4d4c433bbd2c52430fb76bedb58b5fe75846e6c88fd274

i.e. the metadata is there, but not the actual content, which is not present and should be:

sha256:7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059	30.44MB	13 seconds		containerd.io/distribution.source.icr.io=continuous-delivery/pipeline/ghactions-runner-base-image,containerd.io/uncompressed=sha256:2573e0d8158209ed54ab25c87bcdcb00bd3d2539246960a3d592a1c599d70465
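As an aside for readers parsing the listing above: the containerd.io/gc.ref.content.l.<N> labels are the references the metadata garbage collector follows from the manifest blob to its layer blobs, so the manifest here still pins the layer that has gone missing. Note the labels sort lexically in the ctr output (l.1 shows up after l.19). A small standalone Go sketch -- not containerd code, just an illustration using a hypothetical layerRefs helper -- of reading them back in layer order:

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// layerRefs returns the digests referenced by a manifest's
// containerd.io/gc.ref.content.l.<N> labels, sorted by numeric index
// (a plain lexical sort would put l.10 before l.2).
func layerRefs(labels map[string]string) []string {
	const prefix = "containerd.io/gc.ref.content.l."
	type ref struct {
		idx    int
		digest string
	}
	var refs []ref
	for key, digest := range labels {
		if !strings.HasPrefix(key, prefix) {
			continue // skip non-layer labels such as gc.ref.content.config
		}
		n, err := strconv.Atoi(strings.TrimPrefix(key, prefix))
		if err != nil {
			continue
		}
		refs = append(refs, ref{n, digest})
	}
	sort.Slice(refs, func(i, j int) bool { return refs[i].idx < refs[j].idx })
	digests := make([]string, len(refs))
	for i, r := range refs {
		digests[i] = r.digest
	}
	return digests
}

func main() {
	labels := map[string]string{
		"containerd.io/gc.ref.content.config": "sha256:cfg",
		"containerd.io/gc.ref.content.l.0":    "sha256:layer0",
		"containerd.io/gc.ref.content.l.10":   "sha256:layer10",
		"containerd.io/gc.ref.content.l.1":    "sha256:layer1",
	}
	fmt.Println(layerRefs(labels)) // prints [sha256:layer0 sha256:layer1 sha256:layer10]
}
```

The label map here is what `ctr content ls` prints in its LABELS column; walking it is how the GC decides the layer blob is still referenced, which makes its absence from the store all the stranger.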

Looking at the snapshotters we see...

root@kube-worker-xyz:/# ctr -n k8s.io snapshot --snapshotter=overlayfs ls | grep 2573e0d8158209ed54ab25c87bcdcb00bd3d2539246960a3d592a1c599d70465
sha256:2573e0d8158209ed54ab25c87bcdcb00bd3d2539246960a3d592a1c599d70465                                                                         Committed 
sha256:a924ffd449222692a0bc9603f62eebd7aa3fddd5d90da50a9463c707a9021125 sha256:2573e0d8158209ed54ab25c87bcdcb00bd3d2539246960a3d592a1c599d70465 Committed 

root@kube-worker-xyz:/# ctr -n k8s.io snapshot --snapshotter=devmapper ls | grep 2573e0d8158209ed54ab25c87bcdcb00bd3d2539246960a3d592a1c599d70465
root@kube-worker-xyz:/#

Two things are strange here.

  1. Why was the content removed even though we are configured with discard_unpacked_layers = false?
  2. Why was the layer unpacked to the default snapshotter? We have gone back and verified that in 100% of cases the pods in question set the io.containerd.cri.runtime-handler: kata annotation, where the kata handler is configured to use devmapper.
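For context, the relevant parts of the setup described above look roughly like the following containerd 2.0 (config version 3) fragment. This is a sketch, not our exact file: the key placement (discard_unpacked_layers under the CRI images plugin, a per-runtime snapshotter under the CRI runtime plugin) should be double-checked against your containerd build, and the kata runtime/handler names are illustrative.

```toml
version = 3

[plugins.'io.containerd.cri.v1.images']
  # keep layer blobs in the content store after unpack
  discard_unpacked_layers = false

[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.kata]
  runtime_type = "io.containerd.kata.v2"
  # pods annotated io.containerd.cri.runtime-handler: kata should unpack here
  snapshotter = "devmapper"
```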

We can recover this situation on the node by running

ctr -n k8s.io i pull --snapshotter devmapper --local icr.io/continuous-delivery/pipeline/some-image:1.0

... but configuring this type of recovery at scale is a pain.

We're also investigating whether a more generalized version of the recovery done in https://github.com/containerd/containerd/pull/10703 (PR coming if it works) might be worth integrating into a custom build in the short term, but we would definitely like to see a more official fix in a release. How we get into this state is strange and likely involves a race condition worth understanding better; however, this error really is recoverable, and perhaps we should just log the problem but then re-pull when encountering the not found error.

Describe the results you received and expected

Actual: Pod stuck in CreateContainerError
Expected: Pod recovers after encountering the error

What version of containerd are you using?

containerd github.com/containerd/containerd/v2 v2.0.2 c507a0257ea6462fbd6f5ba4f5c74facb04021f4

skaegi avatar Feb 14 '25 18:02 skaegi

Maybe the content store is broken because of an unexpected power-off. But the unpack happened right after the pull (there was no chance for a reboot).

ningmingxiao avatar Feb 24 '25 04:02 ningmingxiao

This is happening at scale over many 100s of nodes and many 10s of clusters -- not just a single isolated case. I agree that something weird is happening, but I think a power-off is not it. If I were to guess, perhaps a GC is happening for an image that shares some layers, and somehow the multi-snapshotter setup is making this more incorrect than it normally might be.

I don't want to create a PR here because people might adopt it and what we have is just a hack for now, but... we are currently using the following patch to mitigate:

diff --git a/client/image.go b/client/image.go
index 355bcba73..a93fa4929 100644
--- a/client/image.go
+++ b/client/image.go
@@ -335,7 +335,19 @@ func (i *image) Unpack(ctx context.Context, snapshotterName string, opts ...Unpa
 	for _, layer := range layers {
 		unpacked, err = rootfs.ApplyLayerWithOpts(ctx, layer, chain, sn, a, config.SnapshotOpts, config.ApplyOpts)
 		if err != nil {
-			return fmt.Errorf("apply layer error for %q: %w", i.Name(), err)
+			// check if error is due to missing content and if so repull and retry apply layer
+			if errdefs.IsNotFound(err) {
+				if _, err2 := i.client.Pull(ctx, i.Name()); err2 != nil {
+					// the repull failed -- for now be really aggressive and delete the manifest from the content store
+					// this does not recover the pod if the image is already in use in a different snapshotter
+					i.client.contentStore.Delete(ctx, i.Target().Digest)
+					return fmt.Errorf("removing image manifest after failed repull for %q: %w : %w", i.Name(), err2, err)
+				}
+				unpacked, err = rootfs.ApplyLayerWithOpts(ctx, layer, chain, sn, a, config.SnapshotOpts, config.ApplyOpts)
+			}
+			if err != nil {
+				return fmt.Errorf("apply layer error for %q: %w", i.Name(), err)
+			}
 		}
 
 		if unpacked {

If the content is not found we:

  1. try an unauthenticated client.Pull (as the CRI CreateContainer/Sandbox call does not have creds)
  2. if that fails -- likely because of an auth or rate-limiting problem, but for whatever reason -- we delete the image manifest in the hope that next time around we will force a top-level CRI PullImage, which is done with creds

Ugly as heck, but... (1) works consistently for anonymous images as long as we're not rate limited; (2) works less well and gets into trouble if the image is already in use with a different snapshotter, but eventually seems to help.


The best fix here would definitely be to understand what happened to the content store ... GC or otherwise. With that said, I think it would still be very helpful if we could recover too. It might help if the creds were passed in for the CRI CreateContainer/Sandbox calls, if that is possible; a way to truly reset the CRI image cache to force a re-pull would also be good.

skaegi avatar Feb 24 '25 19:02 skaegi

Hi, @skaegi. I'm Dosu, and I'm helping the containerd team manage their backlog. I'm marking this issue as stale.

Issue Summary:

  • Sporadic error in Kubernetes with containerd 2.0.2, affecting kata pod creation.
  • Error linked to missing content digest in the content store.
  • Suggested causes include power offs and potential garbage collection issues.
  • Temporary patch shared by you to repull images or delete manifests.
  • Need for better understanding and improvements in content store behavior.

Next Steps:

  • Please confirm if the issue persists in the latest version of containerd; comment to keep the discussion open.
  • If no updates are provided, the issue will be automatically closed in 7 days.

Thank you for your understanding and contribution!

dosubot[bot] avatar Jun 25 '25 16:06 dosubot[bot]

@skaegi how do you use multiple snapshotters when creating a pod? Can you provide a sample demo YAML? Thank you.

lengrongfu avatar Jul 17 '25 09:07 lengrongfu

I am seeing this issue consistently with the following setup:

  • kata-containers 3.20.0
  • firecracker release v1.21.1
  • rke2 v1.32.8+rke2r1
  • ctr github.com/k3s-io/containerd v2.0.5-k3s2

I've set up devmapper as instructed in the kata-containers guide: https://github.com/kata-containers/kata-containers/blob/main/docs/how-to/how-to-use-kata-containers-with-firecracker.md#configure-devmapper

This is my config-v3.toml.tmpl:

{{ template "base" . }}

[plugins.'io.containerd.snapshotter.v1.devmapper']
  base_image_size = "10GB"
  discard_blocks = true
  pool_name = "devpool"
  root_path = "/data/containerd/devmapper"

[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.kata-fc]
  runtime_type = "io.containerd.kata-fc.v2"
  snapshotter = 'devmapper'
  [plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.kata-fc.options]
    ConfigPath = "/opt/kata/share/defaults/kata-containers/configuration-fc.toml"

[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.kata-qemu]
  runtime_type = "io.containerd.kata.v2"
  [plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.kata-qemu.options]

I can run images directly with ctr and the various plugins:

testimg=docker.io/rockylinux/rockylinux:latest

sudo ctr run --rm $testimg rocky-default uname -a;
sudo ctr run --runtime io.containerd.kata.v2 --rm $testimg rocky-kata uname -a;
sudo ctr run --snapshotter devmapper --runtime io.containerd.run.kata-fc.v2 -t --rm $testimg test-me uname -a

but any image I try to run in rke2 using the RuntimeClass I've established for kata-fc hits the same problem described above:

# this works ok
kubectl run quicktest --image=ubuntu:22.04 --restart=Never --command -- sleep 3600

# this fails
kubectl run quicktest --image=ubuntu:22.04 --restart=Never \
  --overrides='{"spec":{"runtimeClassName":"kata-fc"}}' \
  --command -- sleep 3600

# eventlog
Events:
  Type     Reason     Age               From               Message
  ----     ------     ----              ----               -------
  Normal   Scheduled  34s               default-scheduler  Successfully assigned serving/quicktest to infra-dev-1
  Normal   Pulled     5s (x4 over 32s)  kubelet            Container image "ubuntu:22.04" already present on machine
  Warning  Failed     5s (x4 over 31s)  kubelet            Error: failed to create containerd container: error unpacking image: apply layer error for "docker.io/library/ubuntu:22.04": failed to extract layer sha256:90a2bf02e851326fc70d05470553ed33e578342d6e06bfa0cfaf331c4079b7e4: failed to get reader from content store: content digest sha256:a3be5d4ce40198dc77f17780f02720f55b1898a2368f701dd1619fc9f84aac86: not found

trjh avatar Aug 29 '25 22:08 trjh

Bumping this issue in the hopes that someone with the needed expertise will see it. I'd love to contribute a patch but I'm not familiar at all with how the containerd codebase is structured.

We're running into this consistently on GKE because the default containerd configuration starts off with discard_unpacked_layers=true. Attempting to change this setting to false and then setting up an additional snapshotter will cause subsequent re-pulls of an image (that was pulled and unpacked before the configuration changes were made) to fail with the error failed to get reader from content store. I believe this comment accurately describes what's going on.

We're trying to get Kata and Firecracker running as well, but I don't think the problem has anything to do with them -- rather, it's just a side effect of trying to use more than one snapshotter. The version of containerd we're using is github.com/containerd/containerd/v2 v2.0.4.m 1a43cb6a1035441f9aca8f5666a9b3ef9e70ab20.m.

We're able to work around this by applying @skaegi's patch and rebuilding from source, however this is obviously not ideal since we need to do this on every node running multiple snapshotters.

spectrogram avatar Oct 14 '25 05:10 spectrogram

Hi folks, similar issue:

  • kata-containers 3.21.0
  • cloud-hypervisor v48
  • containerd v2.0.5
  • kubernetes vanilla v1.32.6

We have to run ctr -n k8s.io i pull --snapshotter devmapper --local icr.io/continuous-delivery/pipeline/some-image:1.0 for each image to resolve it.

parsa97 avatar Oct 25 '25 18:10 parsa97

Can you try https://github.com/containerd/containerd/pull/11996?

ningmingxiao avatar Oct 26 '25 03:10 ningmingxiao

Our use case is that we use a snapshot of a pre-warmed Bottlerocket disk containing most of our frequently used OCI images to start up new EKS nodes faster. We ran into this issue when we upgraded EKS to 1.33.

We were only including amd64 images in the pre-warmed snapshot even though the clusters also had arm64 nodes. As soon as we started pulling the arm64 variant alongside the amd64 one when warming up the cache, the error disappeared. I think this has to do with the way ctr does the pulling and containerd then does the image validation. Instead of seeing that the arm64 variant was not present and pulling it, containerd v2 seems to just panic on the image decompression.

mamoit avatar Nov 14 '25 16:11 mamoit