criu
docker checkpoint/restore is slow
Description
Checkpoint/restore inside docker is slow.
(Apologies if this is the wrong place to report this, but even a closed bug report about it would have saved me quite some time.)
Steps to reproduce the issue:
I made a program to allocate memory here, and tried to checkpoint/restore it both when running directly on Linux and when running inside a Docker container.
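For readers without access to the linked program, a stand-in along these lines should be enough to reproduce the setup. This is a minimal sketch and not the original test program: the names `allocate_heap`, `main`, and `PAGE_SIZE` are made up for illustration, and the 4096-byte page size is an assumption that holds on typical x86_64 systems.

```python
import os
import sys
import time

PAGE_SIZE = 4096  # assumed page size, typical on x86_64


def allocate_heap(size_bytes: int) -> bytearray:
    """Allocate size_bytes and write to every page so the memory is resident."""
    buf = bytearray(size_bytes)
    # Dirty one byte per page; untouched zero pages may never be backed by RAM.
    for offset in range(0, size_bytes, PAGE_SIZE):
        buf[offset] = 1
    return buf


def main() -> None:
    """Hold a heap of the requested size (in GiB) until the process is killed."""
    size_gb = float(sys.argv[1]) if len(sys.argv) > 1 else 5.0
    heap = allocate_heap(int(size_gb * 1024**3))
    print(f"pid {os.getpid()}: holding {len(heap)} bytes", flush=True)
    while True:
        time.sleep(60)  # stay alive so the process can be checkpointed


# main()  # uncomment to run; blocks until the process is killed
```

With `main()` enabled, the script can be started on the host or as the container's entrypoint, and the printed pid is the one to hand to the checkpoint tool.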
Dumping the program with a 5 GB heap running directly on Linux took about 5 seconds (1 GB/s); restoring it took about 1 second.
Dumping the program when running in a Docker container using docker checkpoint create took about 40 seconds; restoring the checkpoint took 20 seconds.
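To put the numbers above side by side, here is a small calculation of throughput and slowdown. It uses only the approximate timings reported in this issue; the helper name `throughput_gb_per_s` is made up for illustration.

```python
HEAP_GB = 5.0  # heap size reported above

# Approximate timings in seconds, as reported above.
timings = {
    "criu directly": {"dump": 5.0, "restore": 1.0},
    "docker checkpoint": {"dump": 40.0, "restore": 20.0},
}


def throughput_gb_per_s(size_gb: float, seconds: float) -> float:
    """Average throughput for moving size_gb in the given time."""
    return size_gb / seconds


for name, t in timings.items():
    for phase in ("dump", "restore"):
        rate = throughput_gb_per_s(HEAP_GB, t[phase])
        print(f"{name:18s} {phase:8s} {t[phase]:5.1f} s  {rate:6.3f} GB/s")

# Slowdown factor of the Docker path relative to plain CRIU, per phase.
slowdown = {
    phase: timings["docker checkpoint"][phase] / timings["criu directly"][phase]
    for phase in ("dump", "restore")
}
print(slowdown)
```

By these figures the dump is roughly 8 times slower under Docker and the restore roughly 20 times slower.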
CRIU logs and information:
There seem to be no logs under
/var/lib/docker/containers/$containerID/checkpoints/$cpID
version info:
$ docker version
Client: Docker Engine - Community
Version: 27.3.1
API version: 1.47
Go version: go1.22.7
Git commit: ce12230
Built: Fri Sep 20 11:41:00 2024
OS/Arch: linux/amd64
Context: default
Server: Docker Engine - Community
Engine:
Version: 27.3.1
API version: 1.47 (minimum version 1.24)
Go version: go1.22.7
Git commit: 41ca978
Built: Fri Sep 20 11:41:00 2024
OS/Arch: linux/amd64
Experimental: true
containerd:
Version: 1.7.22
GitCommit: 7f7fdf5fed64eb6a7caf99b3e12efcf9d60e311c
runc:
Version: 1.1.14
GitCommit: v1.1.14-0-g2c9f560
docker-init:
Version: 0.19.0
GitCommit: de40ad0
$ criu --version
Version: 3.18
GitID: v3.18-320-gdfb56eed6
$ uname -a
Linux hanwen-flow 6.8.0-47-generic #47~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Oct 2 16:16:55 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
$ sudo criu check --all
sudo: mon_handle_sigchld: waitpid: No child processes
Looks good.
It sounds like you already made the necessary measurements without Docker, so this does indeed look like a Docker problem, although it is not immediately obvious why it would be so much slower.
Have you tried it with Podman? Just as an additional data point.
@hanwen-flow I was able to replicate these results locally. It looks like docker checkpoint create is very slow because it uses containerd to create an OCI image. This image includes a tar archive that contains the CRIU images. Docker then extracts this data from the tar archive into the checkpoint directory before deleting the image. As a result, the checkpoint data is copied multiple times, which explains the slow performance.
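To illustrate why the extra copies matter, here is a rough model of the two data paths. It is not Docker's or containerd's code: it only mimics "write once" versus "write, pack into a tar archive, extract, delete" using Python's standard library. The function names, the `staging` and `checkpoint.tar` paths, and the single `pages-1.img` stand-in file are all assumptions made for the sketch.

```python
import os
import shutil
import tarfile
import tempfile
import time

IMAGE_NAME = "pages-1.img"  # stand-in for the CRIU image files


def write_images(directory: str, total_bytes: int, chunk: int = 1 << 20) -> str:
    """Stand-in for the dump step: write total_bytes into one image file."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, IMAGE_NAME)
    block = os.urandom(chunk)
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(chunk, total_bytes - written)
            f.write(block[:n])
            written += n
    return path


def direct_checkpoint(workdir: str, total_bytes: int) -> str:
    """Images are written once, straight into the final directory."""
    dest = os.path.join(workdir, "direct")
    write_images(dest, total_bytes)
    return dest


def archived_checkpoint(workdir: str, total_bytes: int) -> str:
    """Images are written, packed into a tar, extracted, then cleaned up."""
    staging = os.path.join(workdir, "staging")
    write_images(staging, total_bytes)
    archive = os.path.join(workdir, "checkpoint.tar")
    with tarfile.open(archive, "w") as tar:  # second copy of the data
        for name in os.listdir(staging):
            tar.add(os.path.join(staging, name), arcname=name)
    dest = os.path.join(workdir, "archived")
    os.makedirs(dest, exist_ok=True)
    with tarfile.open(archive) as tar:  # third copy of the data
        tar.extractall(dest)
    os.remove(archive)
    shutil.rmtree(staging)
    return dest


def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    size = 32 * (1 << 20)  # 32 MiB keeps the demo quick; scale up as needed
    with tempfile.TemporaryDirectory() as work:
        _, t_direct = timed(direct_checkpoint, work, size)
        _, t_archived = timed(archived_checkpoint, work, size)
    print(f"direct: {t_direct:.3f} s, via tar archive: {t_archived:.3f} s")
```

In this model the archived path writes the data three times and reads it back twice, where the direct path writes it once. The real pipeline has further overheads that this sketch does not capture, so the ratio it prints should not be read as a prediction for Docker.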
> Have you tried it with Podman?
Checkpoint/restore with Podman is significantly faster.
@rst0git thanks for the analysis. I will look into Podman. However, we are running a SaaS company, and it is not clear whether we can push this change onto our customers.
The main reason to try Podman is to see if it is a Docker or a CRIU problem. If it is just a Docker problem you can provide a patch to Docker and fix it there.
@adrianreber I believe this problem is related to the migration to the v2 shim (https://github.com/moby/moby/pull/41546), and the implementation is similar to the CheckpointContainer function introduced with https://github.com/containerd/containerd/pull/6965. I am not sure if there is an easy way to fix it.
FYI, I've been toying with Podman. While the CRIU part of it is plenty fast, the way the rootfs diff is handled seems clumsy and somewhat slow. I'll open a separate issue with Podman.
https://github.com/containers/podman/issues/24826