CRIU failed to find message queue mount point: Error (criu/files-reg.c:1690): Can't lookup mount=663
Description
We have 2 processes, a parent and a child.
These two processes have 120+ threads, 120+ message queues, 120+ Unix domain sockets, and more than 6 GB of shared memory.
podman container checkpoint 09f4e990b10b
ERRO[0000] container is not destroyed
ERRO[0000] criu failed: type NOTIFY errno 0
log file: /var/lib/containers/storage/overlay-containers/09f4e990b10bb0b9f56d239a31e03b05344fc1b688b52afccd7bfcaa02c6ba99/userdata/dump.log
Error: /usr/bin/runc checkpoint --image-path /var/lib/containers/storage/overlay-containers/09f4e990b10bb0b9f56d239a31e03b05344fc1b688b52afccd7bfcaa02c6ba99/userdata/checkpoint --work-path /var/lib/containers/storage/overlay-containers/09f4e990b10bb0b9f56d239a31e03b05344fc1b688b52afccd7bfcaa02c6ba99/userdata 09f4e990b10bb0b9f56d239a31e03b05344fc1b688b52afccd7bfcaa02c6ba99 failed: exit status 1
Steps to reproduce the issue:
- Create 2 processes, a parent and a child
- Use IPC message queues, Unix domain sockets, and shared memory
- Try to create a checkpoint with the command above
Describe the results you received:
podman container checkpoint 09f4e990b10b
ERRO[0000] container is not destroyed
ERRO[0000] criu failed: type NOTIFY errno 0
log file: /var/lib/containers/storage/overlay-containers/09f4e990b10bb0b9f56d239a31e03b05344fc1b688b52afccd7bfcaa02c6ba99/userdata/dump.log
Error: /usr/bin/runc checkpoint --image-path /var/lib/containers/storage/overlay-containers/09f4e990b10bb0b9f56d239a31e03b05344fc1b688b52afccd7bfcaa02c6ba99/userdata/checkpoint --work-path /var/lib/containers/storage/overlay-containers/09f4e990b10bb0b9f56d239a31e03b05344fc1b688b52afccd7bfcaa02c6ba99/userdata 09f4e990b10bb0b9f56d239a31e03b05344fc1b688b52afccd7bfcaa02c6ba99 failed: exit status 1
Describe the results you expected:
Additional information you deem important (e.g. issue happens only occasionally):
CRIU logs and information:
CRIU full dump/restore logs:
[CRIU dump file.docx](https://github.com/checkpoint-restore/criu/files/8332675/CRIU.dump.file.docx)
Output of `criu --version`:
Version: 3.15
Output of `criu check --all`:
criu check --all
Warn (criu/cr-check.c:1230): clone3() with set_tid not supported
Looks good but some kernel features are missing
which, depending on your process tree, may cause
dump or restore failure.
Additional environment details: cat mount_info 663 618 0:52 / / rw,nodev,relatime - overlay overlay rw,context="system_u:object_r:container_file_t:s0:c488,c846",lowerdir=/var/lib/containers/storage/overlay/l/IYSTN6BTEK5ASTGT4UB5PHNOUZ:/var/lib/containers/storage/overlay/l/2TUP2ORAQFXTOS3HLBTNA2FS3P:/var/lib/containers/storage/overlay/l/QUJOUR2AZRHN25EQWRZ4EBBWNV:/var/lib/containers/storage/overlay/l/DILRB2PQKHE2H5TOVOW2REH2R4:/var/lib/containers/storage/overlay/l/HIVG6CZLAFRS3B5X2RGTGEIQYP:/var/lib/containers/storage/overlay/l/RLI57QVDIHEQLTLGLVODRAAWCP:/var/lib/containers/storage/overlay/l/QLYOPKO2VJEM7634SIPDHUTDQD:/var/lib/containers/storage/overlay/l/2LQMNZ3R32OGZ3N3CJCEP7VS2W:/var/lib/containers/storage/overlay/l/XBZRMXHNWIINXGPJRSI2PLNNKL:/var/lib/containers/storage/overlay/l/T2VM3D3IC7CEJVOJHVZGDO2GKG:/var/lib/containers/storage/overlay/l/T3TRKQPLLN3CS6KLHQEQUL5I24:/var/lib/containers/storage/overlay/l/NRY4PPD2TLUSDEQNQOM4Z3SXTP:/var/lib/containers/storage/overlay/l/YPRX7WYEYHRRLFWG7XGO2MVU2J,upperdir=/var/lib/containers/storage/overlay/2c0b0586ad28ede376a1e6d2ec76288319cfbc672b90c0cc9f9600d8daff794f/diff,workdir=/var/lib/containers/storage/overlay/2c0b0586ad28ede376a1e6d2ec76288319cfbc672b90c0cc9f9600d8daff794f/work,metacopy=on 664 663 0:55 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw 665 663 0:56 / /dev rw,nosuid - tmpfs tmpfs rw,context="system_u:object_r:container_file_t:s0:c488,c846",size=65536k,mode=755 666 663 0:21 / /sys ro,nosuid,nodev,noexec,relatime - sysfs sysfs rw,seclabel 667 665 0:57 / /dev/pts rw,nosuid,noexec,relatime - devpts devpts rw,context="system_u:object_r:container_file_t:s0:c488,c846",gid=5,mode=620,ptmxmode=666 668 665 0:54 / /dev/mqueue rw,nosuid,nodev,noexec,relatime - mqueue mqueue rw,seclabel 669 663 0:24 /containers/storage/overlay-containers/09f4e990b10bb0b9f56d239a31e03b05344fc1b688b52afccd7bfcaa02c6ba99/userdata/run/secrets /run/secrets rw,nosuid,nodev - tmpfs tmpfs rw,seclabel,mode=755 670 663 0:24 
/containers/storage/overlay-containers/09f4e990b10bb0b9f56d239a31e03b05344fc1b688b52afccd7bfcaa02c6ba99/userdata/resolv.conf /etc/resolv.conf rw,nosuid,nodev - tmpfs tmpfs rw,seclabel,mode=755 671 663 0:24 /containers/storage/overlay-containers/09f4e990b10bb0b9f56d239a31e03b05344fc1b688b52afccd7bfcaa02c6ba99/userdata/hosts /etc/hosts rw,nosuid,nodev - tmpfs tmpfs rw,seclabel,mode=755 672 665 0:51 / /dev/shm rw,nosuid,nodev,noexec,relatime - tmpfs shm rw,context="system_u:object_r:container_file_t:s0:c488,c846",size=64000k 673 663 0:24 /containers/storage/overlay-containers/09f4e990b10bb0b9f56d239a31e03b05344fc1b688b52afccd7bfcaa02c6ba99/userdata/hostname /etc/hostname rw,nosuid,nodev - tmpfs tmpfs rw,seclabel,mode=755 674 663 0:24 /containers/storage/overlay-containers/09f4e990b10bb0b9f56d239a31e03b05344fc1b688b52afccd7bfcaa02c6ba99/userdata/.containerenv /run/.containerenv rw,nosuid,nodev - tmpfs tmpfs rw,seclabel,mode=755 675 666 0:58 / /sys/fs/cgroup rw,nosuid,nodev,noexec,relatime - tmpfs tmpfs rw,context="system_u:object_r:container_file_t:s0:c488,c846",mode=755 676 675 0:26 /machine.slice/libpod-09f4e990b10bb0b9f56d239a31e03b05344fc1b688b52afccd7bfcaa02c6ba99.scope /sys/fs/cgroup/systemd ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd 677 675 0:29 /machine.slice/libpod-09f4e990b10bb0b9f56d239a31e03b05344fc1b688b52afccd7bfcaa02c6ba99.scope /sys/fs/cgroup/blkio ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,blkio 678 675 0:30 /machine.slice/libpod-09f4e990b10bb0b9f56d239a31e03b05344fc1b688b52afccd7bfcaa02c6ba99.scope /sys/fs/cgroup/cpuset ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,cpuset 679 675 0:31 /machine.slice/libpod-09f4e990b10bb0b9f56d239a31e03b05344fc1b688b52afccd7bfcaa02c6ba99.scope /sys/fs/cgroup/net_cls,net_prio ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,net_cls,net_prio 680 675 0:32 
/machine.slice/libpod-09f4e990b10bb0b9f56d239a31e03b05344fc1b688b52afccd7bfcaa02c6ba99.scope /sys/fs/cgroup/cpu,cpuacct ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,cpu,cpuacct 681 675 0:33 /machine.slice/libpod-09f4e990b10bb0b9f56d239a31e03b05344fc1b688b52afccd7bfcaa02c6ba99.scope /sys/fs/cgroup/freezer ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,freezer 682 675 0:34 /machine.slice/libpod-09f4e990b10bb0b9f56d239a31e03b05344fc1b688b52afccd7bfcaa02c6ba99.scope /sys/fs/cgroup/pids ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,pids 683 675 0:35 /machine.slice/libpod-09f4e990b10bb0b9f56d239a31e03b05344fc1b688b52afccd7bfcaa02c6ba99.scope /sys/fs/cgroup/hugetlb ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,hugetlb 684 675 0:36 / /sys/fs/cgroup/rdma ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,rdma 685 675 0:37 /machine.slice/libpod-09f4e990b10bb0b9f56d239a31e03b05344fc1b688b52afccd7bfcaa02c6ba99.scope /sys/fs/cgroup/devices ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,devices 686 675 0:38 /machine.slice/libpod-09f4e990b10bb0b9f56d239a31e03b05344fc1b688b52afccd7bfcaa02c6ba99.scope /sys/fs/cgroup/perf_event ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,perf_event 687 675 0:39 /machine.slice/libpod-09f4e990b10bb0b9f56d239a31e03b05344fc1b688b52afccd7bfcaa02c6ba99.scope /sys/fs/cgroup/memory ro,nosuid,nodev,noexec,relatime - cgroup cgroup rw,seclabel,memory 620 665 0:57 /0 /dev/console rw,nosuid,noexec,relatime - devpts devpts rw,context="system_u:object_r:container_file_t:s0:c488,c846",gid=5,mode=620,ptmxmode=666 621 664 0:55 /asound /proc/asound ro,nosuid,nodev,noexec,relatime - proc proc rw 622 664 0:55 /bus /proc/bus ro,nosuid,nodev,noexec,relatime - proc proc rw 623 664 0:55 /fs /proc/fs ro,nosuid,nodev,noexec,relatime - proc proc rw 624 664 0:55 /irq /proc/irq ro,nosuid,nodev,noexec,relatime - proc proc rw 625 664 0:55 /sys /proc/sys ro,nosuid,nodev,noexec,relatime - 
proc proc rw 626 664 0:55 /sysrq-trigger /proc/sysrq-trigger ro,nosuid,nodev,noexec,relatime - proc proc rw 627 664 0:59 / /proc/acpi ro,relatime - tmpfs tmpfs ro,context="system_u:object_r:container_file_t:s0:c488,c846" 628 664 0:56 /null /proc/kcore rw,nosuid - tmpfs tmpfs rw,context="system_u:object_r:container_file_t:s0:c488,c846",size=65536k,mode=755 629 664 0:56 /null /proc/keys rw,nosuid - tmpfs tmpfs rw,context="system_u:object_r:container_file_t:s0:c488,c846",size=65536k,mode=755 630 664 0:56 /null /proc/timer_list rw,nosuid - tmpfs tmpfs rw,context="system_u:object_r:container_file_t:s0:c488,c846",size=65536k,mode=755 631 664 0:56 /null /proc/sched_debug rw,nosuid - tmpfs tmpfs rw,context="system_u:object_r:container_file_t:s0:c488,c846",size=65536k,mode=755 632 664 0:60 / /proc/scsi ro,relatime - tmpfs tmpfs ro,context="system_u:object_r:container_file_t:s0:c488,c846" 633 666 0:61 / /sys/firmware ro,relatime - tmpfs tmpfs ro,context="system_u:object_r:container_file_t:s0:c488,c846" 634 666 0:62 / /sys/fs/selinux ro,relatime - tmpfs tmpfs ro,context="system_u:object_r:container_file_t:s0:c488,c846" 635 666 0:63 / /sys/dev/block ro,relatime - tmpfs tmpfs ro,context="system_u:object_r:container_file_t:s0:c488,c846"
/proc/11487/fdinfo/5 pos: 0 flags: 02000002 mnt_id: 661
As mentioned in chat, please add the dump.log and config.json, as well as the content of /proc/PID/fdinfo/5 from the PID that failed dumping, and /proc/PID/mountinfo.
Attached all that information.
Can you also run `grep "^663\>" /proc/*/mountinfo`? It would also help to see the config.json of the container you are trying to checkpoint.
What is the process with the PID 122497 in your container?
Thanks. This time it failed with `Can't lookup mount=661 for fd=5 path=/ICCAppToDBus (deleted)`. You would need to grep for 661 and not 663.
Can you maybe also share a podman inspect of the container you are trying to dump?
Not sure what is happening here. I am confused where the mount is coming from.
@Snorch do you have any ideas how to figure this out further? It seems FD 5 from the process is pointing to a no-longer-existing mount.
@NKHULLAHALLI can you give some more details how /ICCAppToDBus is created? Do you have a small reproducer?
ICCAppToDBus
We created this message queue while bringing up the protocol. But I am surprised that the mount no longer exists. Let me try to reproduce.
It was created with ipc=private, so it may not be known to the host.
In the initial report you have this error (from the docx file):
(00.263763) 121980 fdinfo 5: pos: 0 flags: 2/0x1
(00.263834) Error (criu/files-reg.c:1690): Can't lookup mount=663 for fd=5 path=/ICCAppToDBus (deleted)
You show this mountinfo (likely from the container, as the root mount is an overlay):
Additional environment details:
cat mount_info
663 618 0:52 / / rw,nodev,relatime - overlay overlay rw,context="system_u:object_r:container_file_t:s0:c488,c846",lowerdir=/var/lib/containers/storage/overlay/l/IYSTN6BTEK5ASTGT4UB5PHNOUZ:/var/lib/containers/storage/overlay/l/2TUP2ORAQFXTOS3HLBTNA2FS3P:/var/lib/containers/storage/overlay/l/QUJOUR2AZRHN25EQWRZ4EBBWNV:/var/lib/containers/storage/overlay/l/DILRB2PQKHE2H5TOVOW2REH2R4:/var/lib/containers/storage/overlay/l/HIVG6CZLAFRS3B5X2RGTGEIQYP:/var/lib/containers/storage/overlay/l/RLI57QVDIHEQLTLGLVODRAAWCP:/var/lib/containers/storage/overlay/l/QLYOPKO2VJEM7634SIPDHUTDQD:/var/lib/containers/storage/overlay/l/2LQMNZ3R32OGZ3N3CJCEP7VS2W:/var/lib/containers/storage/overlay/l/XBZRMXHNWIINXGPJRSI2PLNNKL:/var/lib/containers/storage/overlay/l/T2VM3D3IC7CEJVOJHVZGDO2GKG:/var/lib/containers/storage/overlay/l/T3TRKQPLLN3CS6KLHQEQUL5I24:/var/lib/containers/storage/overlay/l/NRY4PPD2TLUSDEQNQOM4Z3SXTP:/var/lib/containers/storage/overlay/l/YPRX7WYEYHRRLFWG7XGO2MVU2J,upperdir=/var/lib/containers/storage/overlay/2c0b0586ad28ede376a1e6d2ec76288319cfbc672b90c0cc9f9600d8daff794f/diff,workdir=/var/lib/containers/storage/overlay/2c0b0586ad28ede376a1e6d2ec76288319cfbc672b90c0cc9f9600d8daff794f/work,metacopy=on
This means that fd 5 of process 121980 was opened from the container's root mount. And it is super strange that criu can't find it in mountinfo...
So let's see what criu sees (grep " / @ ./ "):
(00.119206) type overlay source overlay mnt_id 665 s_dev 0x34 / @ ./ flags 0x200004 options context="system_u:object_r:container_file_t:s0:c765,c857",lowerdir=/var/lib/containers/storage/overlay/l/XNVYJAUTT3L3XM4TZCLHKODFZH:/var/lib/containers/storage/overlay/l/E5IW53GLJQI2LZYCRTXGJXZUGK:/var/lib/containers/storage/overlay/l/HPR4SV5GYWCS6NHWVE54W2EKSB:/var/lib/containers/storage/overlay/l/44WUZRJ2PQA7RHECOKLLJXHCDJ:/var/lib/containers/storage/overlay/l/EQXWI7BUSY7EIMRANA3HQYIE36:/var/lib/containers/storage/overlay/l/QJKXIG3VHDK35MR2TAQNIFXKSS:/var/lib/containers/storage/overlay/l/BHKHWOV77NBRWXENU5CWAN7TJJ:/var/lib/containers/storage/overlay/l/EZAL54ZFKRB26QHJGBX4A6TCLG:/var/lib/containers/storage/overlay/l/KFDXN2KDXDN2ZIQI73KHICMQSS:/var/lib/containers/storage/overlay/l/AWHBBKLJF6RTDMYGDCIEI5ORDZ:/var/lib/containers/storage/overlay/l/O4WP3LWEBEZY6OX5IXEZXCOVMO:/var/lib/containers/storage/overlay/l/4DCULCO5HCELHRWTGBA4O67R3X:/var/lib/containers/storage/overlay/l/DJMG23JKLNYGD2EUMYL7CVGRIB,upperdir=/var/lib/containers/storage/overlay/375b5f110e8a07854f407130ba81e65d9be403bd90e22ad8f9f9154c696da5b5/diff,workdir=/var/lib/containers/storage/overlay/375b5f110e8a07854f407130ba81e65d9be403bd90e22ad8f9f9154c696da5b5/work,metacopy=on
So it looks like you show mountinfo from one mount namespace but dump another mount namespace, and that's the trick. If the mount namespace and the mount from which the file was opened belong not to one of the dumped processes but to some other process, it should be handled as external (see https://criu.org/External_bind_mounts).
So in the second case you can't find 661 in mountinfo because (probably) the namespace from which the file was originally opened has died (all of its tasks finished).
You probably don't want to open files from mount namespaces which do not belong to the dumped container and can die at any moment.
A friendly reminder that this issue had no activity for 30 days.
As far as I can tell, the file descriptors for POSIX message queues live on an internal filesystem that is created automatically, but mounted nowhere, when the IPC namespace is created. That's why the mnt_id can't be found in /proc/*/mountinfo. But, correct me if I'm wrong, even if CRIU were able to match the file to a different mount of the mqueue filesystem, it currently would not know how to handle the files in there. These files contain the settings needed for the mq_notify call during restore. There is currently no code for parsing the files, fetching the messages from the queue, or recreating the message queues. System V message queues appear to be supported, though.