
Docker start from checkpoint fails occasionally as content sha256 already exists


Description

I'm running a large workload where I need to repeatedly take container checkpoints and then restart the containers from them. Occasionally, the start from a checkpoint fails with the following error:

{"message":"failed to upload checkpoint to containerd: commit failed: content sha256:c859faeebbac82e7f165ed4d0998043d974c3a893ac242ab43a3e5b7d6df3d9a: already exists"}

I use the API directly to start the container, as follows:

http://localhost/v1.40/containers/cont/start?checkpoint=cp

When making the above API call, the start occasionally fails with a message saying the sha256 already exists. I'm wondering what could be the reason for this.
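
For reference, the same request from Go over the default daemon socket looks roughly like this (a minimal sketch; it assumes the daemon listens on /var/run/docker.sock, and reuses the container name "cont" and checkpoint name "cp" from the URL above):

package main

import (
	"context"
	"fmt"
	"net"
	"net/http"
)

func main() {
	// The engine API is normally served on a unix socket, so dial
	// /var/run/docker.sock regardless of the host part of the URL.
	tr := &http.Transport{
		DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
			return (&net.Dialer{}).DialContext(ctx, "unix", "/var/run/docker.sock")
		},
	}
	cli := &http.Client{Transport: tr}

	// Start container "cont" from checkpoint "cp", as in the call above.
	resp, err := cli.Post(
		"http://localhost/v1.40/containers/cont/start?checkpoint=cp",
		"application/json", nil)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// 204 No Content on success; on failure the body carries the JSON
	// error message quoted above.
	fmt.Println(resp.Status)
}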

Output of docker version:

Client: Docker Engine - Community
 Version:           19.03.12
 API version:       1.40
 Go version:        go1.13.10
 Git commit:        48a66213fe
 Built:             Mon Jun 22 15:42:53 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.12
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.10
  Git commit:       48a66213fe
  Built:            Mon Jun 22 15:49:35 2020
  OS/Arch:          linux/amd64
  Experimental:     true
 containerd:
  Version:          v1.2.13
  GitCommit:        7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

Output of docker info:

Client:
 Debug Mode: false

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 21
 Server Version: 19.03.12
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 init version: fec3683
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 4.15.0-147-generic
 Operating System: Ubuntu 18.04.1 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 20
 Total Memory: 62.79GiB
 Name: node-25.WWWWWWW.XXXXXXXX.YYYY.ZZZZZZZZ.us
 ID: YZ6O:NKOV:B2IH:TRVQ:FIUR:WLVV:ZHEP:P5CN:SKWB:YI7T:JP7J:2XC7
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: true
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Product License: Community Engine

WARNING: No swap limit support

DivyanshuSaxena avatar Oct 01 '21 02:10 DivyanshuSaxena

The error is emitted here: https://github.com/moby/moby/blob/2773f81aa5e9e34733675a7aa7e4219391caccb0/libcontainerd/remote/client.go#L188-L190

And it originates from this code: https://github.com/moby/moby/blob/2773f81aa5e9e34733675a7aa7e4219391caccb0/libcontainerd/remote/client.go#L906-L908

Perhaps it's safe to ignore that error; I see we have some handling for that in other parts of the code that store content in containerd, e.g. https://github.com/moby/moby/blob/2773f81aa5e9e34733675a7aa7e4219391caccb0/daemon/images/store.go#L149-L155
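
In rough form, that tolerant handling looks like this (a minimal sketch, not the daemon's actual code; commitIgnoringDuplicate and the commit callback are illustrative names):

package checkpoint

import "github.com/containerd/containerd/errdefs"

// commitIgnoringDuplicate illustrates the pattern linked above: a failed
// commit is only treated as fatal when the error is not "already exists",
// since identical bytes always map to the same digest in a content-addressed
// store.
func commitIgnoringDuplicate(commit func() error) error {
	err := commit()
	if err != nil && !errdefs.IsAlreadyExists(err) {
		return err // genuine failure
	}
	return nil // committed, or the blob was already in the content store
}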

thaJeztah avatar Oct 01 '21 09:10 thaJeztah

I'm wondering what could be the reason for this.

Hmm... looking a bit more at the code; IIUC, the content is stored in containerd's metadata using the checkpoint-dir as the reference: https://github.com/moby/moby/blob/2773f81aa5e9e34733675a7aa7e4219391caccb0/libcontainerd/remote/client.go#L172

If I'm correct, this error may occur if either multiple containers are restored from the same checkpoint-directory, or if there's a race condition where the content wasn't removed yet after the container exited.

I see the same function mentioned above also cleans up the checkpoint from containerd's content store in a defer(): https://github.com/moby/moby/blob/2773f81aa5e9e34733675a7aa7e4219391caccb0/libcontainerd/remote/client.go#L175-L176

Wondering if that's a problem if multiple containers try to start from the same checkpoint 🤔 (or if there's still some lease / reference counting that would prevent that from happening).
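
For illustration, the flow described above boils down to something like this (a simplified sketch, not the actual client.go code; restoreFromCheckpoint, commit, and createTask are made-up names):

package checkpoint

import (
	"context"

	"github.com/containerd/containerd/content"
	"github.com/opencontainers/go-digest"
)

// restoreFromCheckpoint sketches the sequence: commit the checkpoint blob to
// the content store, create the task from its digest, then delete the blob in
// a defer. If a second restore of identical checkpoint data runs concurrently,
// its commit collides with the blob the first restore has not yet deleted,
// which surfaces as "commit failed: content sha256:…: already exists".
func restoreFromCheckpoint(ctx context.Context, cs content.Store,
	commit func() (digest.Digest, error), createTask func(digest.Digest) error) error {
	dgst, err := commit()
	if err != nil {
		return err
	}
	defer cs.Delete(ctx, dgst) // cleanup once the task has been created

	return createTask(dgst)
}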

@cpuguy83 any thoughts?

thaJeztah avatar Oct 01 '21 09:10 thaJeztah

@thaJeztah thanks for the quick response!

If I'm correct, this error may occur if either multiple containers are restored from the same checkpoint-directory, or if there's a race condition where the content wasn't removed yet after the container exited.

I use different checkpoint directories for different containers, but the checkpoint name is the same for all containers (do you think that could be an issue?)

Edit: In the API call, I do not specify the checkpoint directory. I just specify the checkpoint name (which is the same for all containers).

DivyanshuSaxena avatar Oct 01 '21 15:10 DivyanshuSaxena

Edit: In the API call, I do not specify the checkpoint directory. I just specify the checkpoint name (which is the same for all containers).

I only gave it a cursory look, so I'm not sure (it could be that it only passes the basedir). That said, looking at the error message again, it mentions a digest that already exists:

content sha256:c859faeebbac82e7f165ed4d0998043d974c3a893ac242ab43a3e5b7d6df3d9a: already exists

So I'm now wondering if the "(directory) name" is just a red herring, and it's the content of the checkpoint that's the same (not sure what the chances of that are; perhaps an "empty" state?). Of course, the checksum could still be a checksum of the name 😅.

I guess, to summarise: this needs someone to take a dive into what's happening 😂.

thaJeztah avatar Oct 01 '21 17:10 thaJeztah

We also run into this problem in our CI tests: https://github.com/checkpoint-restore/criu/pull/1567

If I'm correct, this error may occur if either multiple containers are restored from the same checkpoint-directory, or if there's a race condition where the content wasn't removed yet after the container exited.

It looks like a race condition in containerd.

rst0git avatar Aug 05 '22 21:08 rst0git

Wondering if that's a problem if multiple containers try to start from the same checkpoint 🤔 (or if there's still some lease / reference counting that would prevent that from happening)

The problem seems to occur when creating a container from a checkpoint immediately after the checkpoint has been created.

From the following comment

https://github.com/moby/moby/blob/1192b468e9f20d3eb304bdfc69b83fb851020fe7/vendor/github.com/containerd/containerd/content/local/store.go#L143-L145

it looks like we might need to add a global lock in

https://github.com/moby/moby/blob/2773f81aa5e9e34733675a7aa7e4219391caccb0/libcontainerd/remote/client.go#L175-L176
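
A minimal sketch of that suggestion (illustrative only; checkpointUploadMu and withCheckpointUploadLock are hypothetical names, not existing moby code):

package checkpoint

import "sync"

// checkpointUploadMu would serialize the "commit checkpoint blob, create task,
// delete blob" sequence, so that a concurrent restore of identical content
// cannot hit the commit while a previous copy of the blob still exists.
var checkpointUploadMu sync.Mutex

func withCheckpointUploadLock(fn func() error) error {
	checkpointUploadMu.Lock()
	defer checkpointUploadMu.Unlock()
	return fn()
}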

rst0git avatar Aug 05 '22 22:08 rst0git

I'm running into this exact same issue when trying to follow https://criu.org/Docker.

$ docker run -d --name looper busybox /bin/sh -c 'i=0; while true; do echo $i; i=$(expr $i + 1); sleep 1; done'
fa322f8bad73d3fc4ad4558aa73c4d9e1e744daa1eec392eef2465e659996b83

$ docker checkpoint create looper checkpoint1
checkpoint1

$ docker start --checkpoint checkpoint1 looper
Error response from daemon: failed to create task for container: content digest 2921a6b88e538747da49680beffa44afc8a1e487fe14bdea776430d91af86725: not found: unknown

$ docker start --checkpoint checkpoint1 looper
Error response from daemon: failed to upload checkpoint to containerd: commit failed: content sha256:2921a6b88e538747da49680beffa44afc8a1e487fe14bdea776430d91af86725: already exists

Output of docker version
$ docker version
Client: Docker Engine - Community
 Version:           25.0.2
 API version:       1.44
 Go version:        go1.21.6
 Git commit:        29cf629
 Built:             Thu Feb  1 00:22:57 2024
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          25.0.2
  API version:      1.44 (minimum version 1.24)
  Go version:       go1.21.6
  Git commit:       fce6e0c
  Built:            Thu Feb  1 00:22:57 2024
  OS/Arch:          linux/amd64
  Experimental:     true
 containerd:
  Version:          1.6.28
  GitCommit:        ae07eda36dd25f8a1b98dfbf587313b99c0190bb
 runc:
  Version:          1.1.12
  GitCommit:        v1.1.12-0-g51d5e94
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Output of docker info
$ docker info
Client: Docker Engine - Community
 Version:    25.0.2
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.12.1
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.24.5
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 2
  Running: 0
  Paused: 0
  Stopped: 2
 Images: 3
 Server Version: 25.0.2
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: ae07eda36dd25f8a1b98dfbf587313b99c0190bb
 runc version: v1.1.12-0-g51d5e94
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 5.15.0-92-generic
 Operating System: Ubuntu 22.04.3 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 15.61GiB
 Name: c10-03.sysnet.ucsd.edu
 ID: 30256552-1c1a-4307-9b2f-e4c8fff589cc
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: true
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

The problem seems to occur when creating a container from a checkpoint immediately after the checkpoint has been created.

I tried waiting for a while, but that didn't particularly help in my case. I'd be happy to share more information if it helps with debugging.

I am looking to explore support for live container migration, and this has been a blocker for me. I would appreciate it if someone could suggest a workaround, or guide me on the required fix, which I can try taking up.

mayank-02 avatar Mar 03 '24 22:03 mayank-02

@mayank-02: This is fixed with: https://github.com/moby/moby/pull/47456

vvoland avatar Mar 04 '24 09:03 vvoland

The error still exists after the fix, unless I'm missing something: https://github.com/moby/moby/pull/47456#issuecomment-2063268363

Edit: I was missing that the backport only applies to v25.0.4 onwards

dannykopping avatar Apr 18 '24 08:04 dannykopping

Thanks for checking! Let me close the issue then.

vvoland avatar Apr 18 '24 08:04 vvoland