Checkpoint of chrome fails: pagemap-cache: Can't read 9397's pagemap file: No such file or directory
Description
Taking a checkpoint of Chrome fails with the following error:
(01.256562) pagemap-cache: 9397: filling VMA 738000c4000-83d00000000 (1094712560K) [l:73800000000 h:73800200000]
(02.325156) Error (criu/pagemap-cache.c:209): pagemap-cache: Can't read 9397's pagemap file: No such file or directory
(02.325194) Error (criu/pagemap-cache.c:225): pagemap-cache: Failed to fill cache for 9397 (738000c4000-83d00000000)
(02.325245) page-pipe: Killing page pipe
(02.416491) ----------------------------------------
(02.416520) Error (criu/mem.c:672): Can't dump page with parasite
In the line `filling VMA 738000c4000-83d00000000 (1094712560K)`, the size 1094712560K (roughly 1 TiB) sounds suspiciously large.
Steps to reproduce the issue:
The dump happens inside a Kubernetes pod and the image is proprietary. I can create a new image if the problem turns out not to be straightforward and requires a full reproduction to debug.
I used the following command to dump:
$ criu dump -t 9289 -D /checkpoint --tcp-established --tcp-close -v4 --log-file dump.log --root / --manage-cgroups=ignore --ghost-limit=500M
The process tree was like the following:
$ ps -aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 1233680 8040 ? Ssl 20:32 0:00 procman run --image-dir /checkpoint --port 27898 -- /opt/scripts/entrypoint.sh start
root 9289 0.0 0.0 4784 3440 ? Ss 20:32 0:00 /bin/bash /opt/scripts/entrypoint.sh start
root 9295 0.3 0.1 53700 34560 ? S 20:32 0:00 /usr/bin/python3 /usr/bin/websockify --web /usr/share/novnc 5800 localhost:5900
root 9297 0.5 0.2 235212 84596 ? S 20:32 0:00 Xtigervnc -geometry 1288x804x24 -SendPrimary=0 -SecurityTypes None -rfbport 5900 -alwaysshared -useblacklist=0 -Log
root 9298 41.0 1.8 1262088 612648 ? Sl 20:32 0:13 node ./src/index.js
root 9311 0.2 0.0 54196 27188 ? S 20:32 0:00 /usr/bin/python3 /usr/bin/websockify --web /usr/share/novnc 5800 localhost:5900
root 9312 8.7 0.7 34369604 261672 ? S<sl 20:32 0:02 /opt/google/chrome/chrome --disable-field-trial-config --disable-background-networking --enable-features=NetworkSer
root 9314 0.0 0.0 33575872 3448 ? Sl 20:32 0:00 /opt/google/chrome/chrome_crashpad_handler --monitor-self --monitor-self-annotation=ptype=crashpad-handler --databa
root 9316 0.0 0.0 33567660 1708 ? Sl 20:32 0:00 /opt/google/chrome/chrome_crashpad_handler --no-periodic-tasks --monitor-self-annotation=ptype=crashpad-handler --d
root 9321 0.0 0.1 33916516 55164 ? S 20:32 0:00 /opt/google/chrome/chrome --type=zygote --no-zygote-sandbox --no-sandbox --crashpad-handler-pid=9314 --enable-crash
root 9322 0.0 0.1 33916516 55944 ? S 20:32 0:00 /opt/google/chrome/chrome --type=zygote --no-sandbox --crashpad-handler-pid=9314 --enable-crash-reporter=, --user-d
root 9341 2.0 0.4 34241296 134356 ? Sl 20:32 0:00 /opt/google/chrome/chrome --type=gpu-process --no-sandbox --disable-dev-shm-usage --disable-breakpad --crashpad-han
root 9343 3.0 0.3 33938240 112528 ? Sl 20:32 0:00 /opt/google/chrome/chrome --type=utility --utility-sub-type=network.mojom.NetworkService --lang=en-US --service-san
root 9344 0.0 0.1 33966992 47864 ? Sl 20:32 0:00 /opt/google/chrome/chrome --type=utility --utility-sub-type=storage.mojom.StorageService --lang=en-US --service-san
root 9397 21.4 0.9 1186275764 327236 ? R<l 20:32 0:05 /opt/google/chrome/chrome --type=renderer --crashpad-handler-pid=9314 --enable-crash-reporter=, --user-data-dir=/tm
root 9420 8.2 0.7 1186268920 232180 ? S<l 20:32 0:01 /opt/google/chrome/chrome --type=renderer --crashpad-handler-pid=9314 --enable-crash-reporter=, --user-data-dir=/tm
root 9431 2.5 0.5 1186242008 179920 ? S<l 20:32 0:00 /opt/google/chrome/chrome --type=renderer --crashpad-handler-pid=9314 --enable-crash-reporter=, --user-data-dir=/tm
root 9447 0.1 0.2 33891548 68764 ? Sl 20:32 0:00 /opt/google/chrome/chrome --type=utility --utility-sub-type=audio.mojom.AudioService --lang=en-US --service-sandbox
root 9455 0.0 0.1 1186193528 55604 ? S<l 20:32 0:00 /opt/google/chrome/chrome --type=renderer --crashpad-handler-pid=9314 --enable-crash-reporter=, --user-data-dir=/tm
crit commands such as `crit x . fds` also fail because the dump is incomplete.
Describe the results you received:
Got error during dump:
(02.325156) Error (criu/pagemap-cache.c:209): pagemap-cache: Can't read 9397's pagemap file: No such file or directory
Describe the results you expected:
Expected the dump to complete successfully.
Additional information you deem important (e.g. issue happens only occasionally):
Happens consistently.
CRIU logs and information:
Output of `criu --version`:
Version: 3.19 (gitid 0)
Output of `criu check --all`:
Warn (criu/kerndat.c:1285): Can't keep kdat cache on non-tempfs
Error (criu/cr-check.c:1223): UFFD is not supported
Error (criu/cr-check.c:1223): UFFD is not supported
Looks good but some kernel features are missing
which, depending on your process tree, may cause
dump or restore failure.
Additional environment details:
It's running inside a Kubernetes pod where the container runtime is containerd and the node architecture is amd64.
@muvaf Any luck here? Running into the same issue
@muvaf could you show /proc/pid/maps for the target process?
I think it hits the MAX_RW_COUNT (0x7ffff000) limit. The length of the target VMA is 0x104fff3c000 bytes. CRIU reads 8 bytes of pagemap data per page, so a single read of the whole VMA's pagemap is 0x827ff9e0 bytes, which exceeds the limit.
@muvaf could you try out https://github.com/avagin/criu/commit/9405da090c93ef100e3fb0c7da2646cdb9e27fc1? It should fix the problem.
By the way, for such large dummy mappings, the pagemap file interface works slowly. Recently, the new PAGEMAP_SCAN ioctl was merged into the mainline kernel, and support for it was implemented in CRIU (https://github.com/checkpoint-restore/criu/pull/2292). With these changes, CRIU handles huge dummy mappings much faster.
@avagin That patch did make the error go away and I was able to take the checkpoint. Thank you! However, I wasn't able to validate that the checkpoint was taken correctly: the restore command fails with the following even though --tcp-close is used:
criu restore --images-dir /checkpoint --tcp-established --file-locks --evasive-devices --tcp-close --manage-cgroups=ignore -v4 --log-file restore.log --inherit-fd fd[1]:pipe:[1687037] --inherit-fd fd[2]:pipe:[1687038] --external mnt[zoneinfo]:/usr/share/zoneinfo --external mnt[null]:/dev/null --external mnt[random]:/dev/random --external mnt[urandom]:/dev/urandom --external mnt[tty]:/dev/tty --external mnt[zero]:/dev/zero --external mnt[full]:/dev/full
Error (criu/sk-inet.c:1029): inet: Can't bind inet socket (id 778): Cannot assign requested address
@lukejmann I needed to add --file-locks and make sure some of the folders Chrome creates under /tmp are available on the target as well. To build @avagin 's patch, clone the repo, run make docker-build, and copy the /criu/criu/criu binary from that image, provided the dependencies listed here are in place where you run the commands.
@muvaf Well, they are udp sockets :)
(00.289251) 9344: inet: Restore: family AF_INET type SOCK_DGRAM proto IPPROTO_UDP port 56915 state TCP_ESTABLISHED src_addr 10.140.3.135
It is a good question what we should do with them... @muvaf What behavior do you expect?
@avagin Huh, I didn't realize that. I think, for starters, having an --udp-close flag similar to the TCP one would unlock the use cases where both sides of the socket are designed to be resilient to reconnections.
Going further may not be feasible due to the same issues TCP has with changing IP addresses, so at least we'd give users an escape hatch if they really have to change the IP address.
@avagin FWIW, if you can give me a pointer, I can try to get a PR going to add the --udp-close flag.
@muvaf I am skeptical about the idea of "--udp-close." There is a significant difference between TCP and UDP. TCP is connection-oriented, and the situation where a connection is interrupted is entirely normal and must be handled in the code. UDP, on the other hand, is connectionless. Therefore, applications may be caught off guard if "send" or "recv" return errors.
You can try out the next patch to see how your workload will handle closed udp sockets after restore:
```diff
diff --git a/criu/sk-inet.c b/criu/sk-inet.c
index a6a767c73..eda08f971 100644
--- a/criu/sk-inet.c
+++ b/criu/sk-inet.c
@@ -901,6 +901,13 @@ static int open_inet_sk(struct file_desc *d, int *new_fd)
 		goto done;
 	}
 
+	if (ie->proto == IPPROTO_UDP) {
+		if (shutdown(sk, SHUT_RDWR) && errno != ENOTCONN) {
+			pr_perror("Unable to shutdown the socket id %x ino %x", ii->ie->id, ii->ie->ino);
+		}
+		goto done2;
+	}
+
 	if (ie->src_port) {
 		if (inet_bind(sk, ii))
 			goto err;
@@ -952,7 +959,7 @@ done:
 			}
 		}
 	}
-
+done2:
 	*new_fd = sk;
 	return 1;
```
For connected UDP sockets, it might be a good idea to skip binding to the local address. When CRIU calls "connect" to restore the destination address and port, the socket will be bound to the source address and a "random" port. I believe this should work in many cases. Could you please try the next patch, which implements this behavior?
```diff
diff --git a/criu/sk-inet.c b/criu/sk-inet.c
index a6a767c73..9bb1d04d4 100644
--- a/criu/sk-inet.c
+++ b/criu/sk-inet.c
@@ -900,8 +900,7 @@ static int open_inet_sk(struct file_desc *d, int *new_fd)
 		goto done;
 	}
-
-	if (ie->src_port) {
+	if (ie->proto != IPPROTO_UDP && ie->src_port) {
 		if (inet_bind(sk, ii))
 			goto err;
 	}
 
```
A friendly reminder that this issue had no activity for 30 days.