containerd
Devmapper race condition when unpacking the same image in parallel
Description
When a new k8s node comes up and containerd's cache is empty, we occasionally see failures when the same image is being pulled and unpacked by multiple pods at once.
Steps to reproduce the issue
Start multiple pods whose containers pull the same image on a node where the default or per-runtime snapshotter is devmapper. Working on a good reproducer...
Describe the results you received and expected
Failed to pull image
"icr.io/continuous-delivery/pipeline/pipeline-base-image:2.38@sha256:2f11f01c9710ec711a17a9269be04f584e173c89b27b116c27ec93ec31981c07":
rpc error: code = Unknown desc = failed to pull and unpack image
"icr.io/continuous-delivery/pipeline/pipeline-base-image@sha256:2f11f01c9710ec711a17a9269be04f584e173c89b27b116c27ec93ec31981c07":
failed to prepare extraction snapshot "extract-32526796-ihFq
sha256:164cbd4e3e41bde2eef711f50aba45c8f66b45b099365a19eda9062394904e09":
failed to suspend device "devpool-snap-36698": no such device or address
This particular image is quite large, but we have now seen this with a wide variety of images, including smaller ones like docker:dind. It seems like the image is getting unpacked multiple times, but I'll spend more time investigating and try to create a good reproducer. Eventually everything settles and gets cached, and we no longer see problems, but something definitely seems wrong during the unpack.
What version of containerd are you using?
v1.7.12
Any other relevant information
No response
Show configuration if it is related to CRI plugin.
No response
Sigh... this was self-inflicted by our own patch from https://github.com/containerd/containerd/pull/8878. We were not locking on the chainID before doing our content fetches. Closing.
Re-opening -- unfortunately we are still seeing this ... even when trying to protect the fetch using
u.lockSnChainID(ctx, chainID, unpack.SnapshotterKey)
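For anyone following along, here is a minimal sketch of the idea behind locking on the chain ID: concurrent unpackers of the same layer are serialized on a per-key mutex, and each one re-checks inside the lock whether the snapshot already exists. This is not containerd's implementation and does not reproduce the devmapper failure; `keyedLocker`, `lockEntry`, and `simulateParallelUnpack` are hypothetical names made up for illustration, and the fetch and snapshot preparation are replaced by a counter.

```go
package main

import (
	"fmt"
	"sync"
)

// lockEntry is one per-key mutex plus a count of goroutines using it.
type lockEntry struct {
	mu   sync.Mutex
	refs int
}

// keyedLocker is a hypothetical helper (not a containerd type) that
// serializes work per key, e.g. per snapshotter + chain ID.
type keyedLocker struct {
	mu    sync.Mutex
	locks map[string]*lockEntry
}

func newKeyedLocker() *keyedLocker {
	return &keyedLocker{locks: make(map[string]*lockEntry)}
}

// lock blocks until the caller holds the lock for key and returns
// the function that releases it.
func (k *keyedLocker) lock(key string) (unlock func()) {
	k.mu.Lock()
	e, ok := k.locks[key]
	if !ok {
		e = &lockEntry{}
		k.locks[key] = e
	}
	e.refs++ // counted before waiting so the entry is not deleted under us
	k.mu.Unlock()

	e.mu.Lock()
	return func() {
		e.mu.Unlock()
		k.mu.Lock()
		e.refs--
		if e.refs == 0 {
			delete(k.locks, key) // nobody holds or waits on this key anymore
		}
		k.mu.Unlock()
	}
}

// simulateParallelUnpack starts `workers` goroutines that all want the
// same chain ID and returns how many times the (simulated) fetch and
// snapshot preparation actually ran.
func simulateParallelUnpack(workers int, chainID string) int {
	locker := newKeyedLocker()
	var stateMu sync.Mutex // protects the two variables below
	prepared := map[string]bool{}
	preparations := 0

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			unlock := locker.lock(chainID)
			defer unlock()

			// Re-check under the lock: another worker may have finished.
			stateMu.Lock()
			done := prepared[chainID]
			stateMu.Unlock()
			if done {
				return
			}

			// The content fetch and snapshot preparation would go here.
			stateMu.Lock()
			prepared[chainID] = true
			preparations++
			stateMu.Unlock()
		}()
	}
	wg.Wait()
	return preparations
}

func main() {
	n := simulateParallelUnpack(8, "example-chain-id")
	fmt.Println("snapshot preparations:", n) // prints "snapshot preparations: 1"
}
```

The point of the sketch is that the lock has to be taken before the fetch starts, and the existence check has to happen after the lock is acquired; taking the lock only around the final commit would still let several workers prepare the same snapshot.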
Hi @skaegi, would you mind sharing the steps to reproduce? I was thinking that this is not the same as https://github.com/containerd/containerd/issues/6793.
This issue is stale because it has been open 90 days with no activity. This issue will be closed in 7 days unless new comments are made or the stale label is removed.
This issue was closed because it has been stalled for 7 days with no activity.