Docker start from checkpoint fails occasionally as content sha256 already exists
Description

I'm running a large workload where I need to repeatedly take container checkpoints and then restart the containers from them. Occasionally the start from checkpoint fails with the following error:
{"message":"failed to upload checkpoint to containerd: commit failed: content sha256:c859faeebbac82e7f165ed4d0998043d974c3a893ac242ab43a3e5b7d6df3d9a: already exists"}
I use the API call directly to start the container as follows:
http://localhost/v1.40/containers/cont/start?checkpoint=cp
When making the above API call, the start occasionally fails with the "sha256 … already exists" error shown above. I'm wondering what could be the reason for this.
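For reference, a minimal sketch of the same call through the Docker Go SDK (the container name `cont` and checkpoint name `cp` match the request above; this snippet is for illustration only, not part of the original report):

```go
package main

import (
	"context"
	"log"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/client"
)

func main() {
	// Pin the client to API version 1.40 to match the request above.
	cli, err := client.NewClientWithOpts(client.WithVersion("1.40"))
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// Equivalent of POST /v1.40/containers/cont/start?checkpoint=cp
	err = cli.ContainerStart(context.Background(), "cont", types.ContainerStartOptions{
		CheckpointID: "cp",
	})
	if err != nil {
		// Intermittently fails with: "failed to upload checkpoint to
		// containerd: commit failed: content sha256:…: already exists".
		log.Fatal(err)
	}
}
```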
Output of docker version:
Client: Docker Engine - Community
Version: 19.03.12
API version: 1.40
Go version: go1.13.10
Git commit: 48a66213fe
Built: Mon Jun 22 15:42:53 2020
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 19.03.12
API version: 1.40 (minimum version 1.12)
Go version: go1.13.10
Git commit: 48a66213fe
Built: Mon Jun 22 15:49:35 2020
OS/Arch: linux/amd64
Experimental: true
containerd:
Version: v1.2.13
GitCommit: 7ad184331fa3e55e52b890ea95e65ba581ae3429
runc:
Version: 1.0.0-rc10
GitCommit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
docker-init:
Version: 0.18.0
GitCommit: fec3683
Output of docker info:
Client:
Debug Mode: false
Server:
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 21
Server Version: 19.03.12
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 7ad184331fa3e55e52b890ea95e65ba581ae3429
runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
init version: fec3683
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.15.0-147-generic
Operating System: Ubuntu 18.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 20
Total Memory: 62.79GiB
Name: node-25.WWWWWWW.XXXXXXXX.YYYY.ZZZZZZZZ.us
ID: YZ6O:NKOV:B2IH:TRVQ:FIUR:WLVV:ZHEP:P5CN:SKWB:YI7T:JP7J:2XC7
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: true
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine
WARNING: No swap limit support
Error is emitted here; https://github.com/moby/moby/blob/2773f81aa5e9e34733675a7aa7e4219391caccb0/libcontainerd/remote/client.go#L188-L190
And coming from this code https://github.com/moby/moby/blob/2773f81aa5e9e34733675a7aa7e4219391caccb0/libcontainerd/remote/client.go#L906-L908
Perhaps it's safe to ignore that error; I see we have some handling for that in other parts of the code that stores content in containerd, e.g.; https://github.com/moby/moby/blob/2773f81aa5e9e34733675a7aa7e4219391caccb0/daemon/images/store.go#L149-L155
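For illustration, a minimal sketch of what that tolerant handling could look like at the commit site, mirroring the `errdefs.IsAlreadyExists` pattern from `daemon/images/store.go` (the function and variable names here are hypothetical, not the actual moby code):

```go
package main

import (
	"context"

	"github.com/containerd/containerd/content"
	"github.com/containerd/containerd/errdefs"
	"github.com/opencontainers/go-digest"
	"github.com/pkg/errors"
)

// commitCheckpoint is a hypothetical stand-in for the commit in
// libcontainerd/remote/client.go that tolerates "already exists":
// if the blob is already in containerd's content store, the start
// can proceed instead of failing.
func commitCheckpoint(ctx context.Context, w content.Writer, size int64, expected digest.Digest) error {
	if err := w.Commit(ctx, size, expected); err != nil && !errdefs.IsAlreadyExists(err) {
		return errors.Wrap(err, "commit failed")
	}
	return nil
}
```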
> I'm wondering what could be the reason for this.
Hmm.. looking a bit more at the code; IIUC, the content is stored in containerd's metadata using the checkpoint-dir as reference; https://github.com/moby/moby/blob/2773f81aa5e9e34733675a7aa7e4219391caccb0/libcontainerd/remote/client.go#L172
If I'm correct, this error may occur if either multiple containers are restored from the same checkpoint-directory, or if there's a race condition where the content wasn't removed yet after the container exited.
I see the same function mentioned above also cleans up the checkpoint from containerd's content store in a defer(): https://github.com/moby/moby/blob/2773f81aa5e9e34733675a7aa7e4219391caccb0/libcontainerd/remote/client.go#L175-L176
Wondering if that's a problem if multiple containers try to start from the same checkpoint 🤔 (or if there's still some lease / reference counting that would prevent that from happening)
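For context, a hedged sketch of how a containerd lease could pin the checkpoint content while it's in use, so cleanup can't race with another container starting from the same checkpoint (`uploadCheckpoint` is a hypothetical helper standing in for the real write-and-commit logic):

```go
package main

import (
	"context"

	"github.com/containerd/containerd"
)

// uploadCheckpoint is a hypothetical stand-in for the write-and-commit
// logic in libcontainerd/remote/client.go.
func uploadCheckpoint(ctx context.Context, client *containerd.Client) error {
	return nil
}

// withCheckpointLease holds a containerd lease for the duration of the
// upload; content referenced by an active lease is protected from
// garbage collection until the lease is released.
func withCheckpointLease(ctx context.Context, client *containerd.Client) error {
	ctx, done, err := client.WithLease(ctx)
	if err != nil {
		return err
	}
	// Releasing the lease makes the content collectable again once no
	// other lease or resource references it.
	defer done(ctx)

	return uploadCheckpoint(ctx, client)
}
```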
@cpuguy83 any thoughts?
@thaJeztah thanks for the quick response!
> If I'm correct, this error may occur if either multiple containers are restored from the same checkpoint-directory, or if there's a race condition where the content wasn't removed yet after the container exited.
I use different checkpoint directories for different containers, but the checkpoint name is the same for all containers (do you think that could be an issue?)
Edit: In the API call, I do not specify the checkpoint directory. I just specify the checkpoint name (which is the same for all containers).
I only gave it a cursory look, so I'm not sure (it could be that it only passes the base dir). That said, looking at the error message again, it mentions a digest that already exists:
content sha256:c859faeebbac82e7f165ed4d0998043d974c3a893ac242ab43a3e5b7d6df3d9a: already exists
So I'm wondering now if the "(directory) name" is just a red herring, and it's the content of the checkpoint that's the same (not sure what the chances are of that; perhaps "empty" state?). Of course, the checksum could still be a checksum of the name 😅.
I guess to summarise; this needs someone to take a dive into what's happening 😂.
We also run into this problem in our CI tests: https://github.com/checkpoint-restore/criu/pull/1567
> If I'm correct, this error may occur if either multiple containers are restored from the same checkpoint-directory, or if there's a race condition where the content wasn't removed yet after the container exited.
It looks like a race condition in containerd.
> Wondering if that's a problem if multiple containers try to start from the same checkpoint 🤔 (or if there's still some lease / reference counting that would prevent that from happening)
The problem seems to occur when creating a container from a checkpoint immediately after the checkpoint has been created.
From the following comment
https://github.com/moby/moby/blob/1192b468e9f20d3eb304bdfc69b83fb851020fe7/vendor/github.com/containerd/containerd/content/local/store.go#L143-L145
it looks like we might need to add a global lock in
https://github.com/moby/moby/blob/2773f81aa5e9e34733675a7aa7e4219391caccb0/libcontainerd/remote/client.go#L175-L176
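A rough sketch of what such a global lock might look like around the commit (the mutex and function names are hypothetical; this is not the fix that eventually landed):

```go
package main

import (
	"context"
	"sync"

	"github.com/containerd/containerd/content"
	"github.com/opencontainers/go-digest"
)

// checkpointCommitMu is a hypothetical process-wide mutex serializing
// checkpoint commits, so two concurrent writers can't race on the same
// digest (one observing "not found", the other "already exists").
var checkpointCommitMu sync.Mutex

func commitSerialized(ctx context.Context, w content.Writer, size int64, expected digest.Digest) error {
	checkpointCommitMu.Lock()
	defer checkpointCommitMu.Unlock()
	return w.Commit(ctx, size, expected)
}
```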
I'm running into this exact same issue when trying to follow https://criu.org/Docker.
$ docker run -d --name looper busybox /bin/sh -c 'i=0; while true; do echo $i; i=$(expr $i + 1); sleep 1; done'
fa322f8bad73d3fc4ad4558aa73c4d9e1e744daa1eec392eef2465e659996b83
$ docker checkpoint create looper checkpoint1
checkpoint1
$ docker start --checkpoint checkpoint1 looper
Error response from daemon: failed to create task for container: content digest 2921a6b88e538747da49680beffa44afc8a1e487fe14bdea776430d91af86725: not found: unknown
$ docker start --checkpoint checkpoint1 looper
Error response from daemon: failed to upload checkpoint to containerd: commit failed: content sha256:2921a6b88e538747da49680beffa44afc8a1e487fe14bdea776430d91af86725: already exists
Output of docker version
$ docker version
Client: Docker Engine - Community
Version: 25.0.2
API version: 1.44
Go version: go1.21.6
Git commit: 29cf629
Built: Thu Feb 1 00:22:57 2024
OS/Arch: linux/amd64
Context: default
Server: Docker Engine - Community
Engine:
Version: 25.0.2
API version: 1.44 (minimum version 1.24)
Go version: go1.21.6
Git commit: fce6e0c
Built: Thu Feb 1 00:22:57 2024
OS/Arch: linux/amd64
Experimental: true
containerd:
Version: 1.6.28
GitCommit: ae07eda36dd25f8a1b98dfbf587313b99c0190bb
runc:
Version: 1.1.12
GitCommit: v1.1.12-0-g51d5e94
docker-init:
Version: 0.19.0
GitCommit: de40ad0
Output of docker info
$ docker info
Client: Docker Engine - Community
Version: 25.0.2
Context: default
Debug Mode: false
Plugins:
buildx: Docker Buildx (Docker Inc.)
Version: v0.12.1
Path: /usr/libexec/docker/cli-plugins/docker-buildx
compose: Docker Compose (Docker Inc.)
Version: v2.24.5
Path: /usr/libexec/docker/cli-plugins/docker-compose
Server:
Containers: 2
Running: 0
Paused: 0
Stopped: 2
Images: 3
Server Version: 25.0.2
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Using metacopy: false
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: systemd
Cgroup Version: 2
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 runc
Default Runtime: runc
Init Binary: docker-init
containerd version: ae07eda36dd25f8a1b98dfbf587313b99c0190bb
runc version: v1.1.12-0-g51d5e94
init version: de40ad0
Security Options:
apparmor
seccomp
Profile: builtin
cgroupns
Kernel Version: 5.15.0-92-generic
Operating System: Ubuntu 22.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.61GiB
Name: c10-03.sysnet.ucsd.edu
ID: 30256552-1c1a-4307-9b2f-e4c8fff589cc
Docker Root Dir: /var/lib/docker
Debug Mode: false
Experimental: true
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
> The problem seems to occur when creating a container from a checkpoint immediately after the checkpoint has been created.
I tried waiting for a while, but that didn't seem to particularly help in my case. I'd be happy to share more information if it helps debug.
I'm looking to explore support for live container migration, and this has been a blocker for me. I would appreciate it if someone could show a workaround, or guide me on the required fix, which I could try taking up.
@mayank-02: This is fixed with: https://github.com/moby/moby/pull/47456
The error still exists after the fix, unless I'm missing something: https://github.com/moby/moby/pull/47456#issuecomment-2063268363
Edit: I was missing that the backport only applies to v25.0.4 onwards.
Thanks for checking! Let me close the issue then.