
mnt: Can't remove the directory /tmp/.criu.mntns.Dxd7XG : Device or resource busy

Open mrc1119 opened this issue 3 years ago • 2 comments

I'm looking forward to getting help from the best.

Description

dump:

    ./criu dump --ghost-limit=500M -D /export/sine/dumpdata \
     -j -R -t $pid --tcp-close --skip-in-flight -v4 -o /export/sine/sinedump.log \
     --skip-mnt /dev/termination-log --external mnt[/var/log/mydb]:mydbmount \
     --external mnt[/opt/mydb/data]:mydbdata

dump log: https://github.com/mrc1119/criu-log/blob/main/dump-succ.log

restore:

        ./criu restore --manage-cgroup=ignore --ghost-limit=500M \
        -D /export/sine/dumpdata -j -d --tcp-close --root / \
        --external mnt[mydbmount]:/var/log/mydb \
        --external mnt[mydbdata]:/opt/mydb/data \
        -v4 -o /export/sine/sinerestore.log

restore log: https://github.com/mrc1119/criu-log/blob/main/restore-failed.log

Steps to reproduce the issue:

  1. Dump the container's init process on the host (outside the container)
  2. Restore the process in a container whose network mode is host

Describe the results you received:

(00.393971) Error (criu/mount.c:3609): mnt: Can't remove the directory /tmp/.criu.mntns.Dxd7XG: Device or resource busy
(00.454009) mnt: Switching to new ns to clean ghosts
(00.454031) Error (criu/mount.c:3594): mnt: Can't remount root with MS_PRIVATE: Invalid argument
(00.454038) Error (criu/mount.c:3604): mnt: Can't unmount /tmp/.criu.mntns.Dxd7XG: Invalid argument
(00.454043) Error (criu/mount.c:3609): mnt: Can't remove the directory /tmp/.criu.mntns.Dxd7XG: Device or resource busy
(00.454053) Error (criu/cr-restore.c:2536): Restoring FAILED.

Additional information you deem important (e.g. issue happens only occasionally):

Since issue https://github.com/checkpoint-restore/criu/issues/1899 was still unresolved, I changed the code to skip restoring the net namespace:

diff --git a/criu/pstree.c b/criu/pstree.c
index f4d77b3..26995dc 100644
--- a/criu/pstree.c
+++ b/criu/pstree.c
@@ -949,7 +949,7 @@ static int prepare_pstree_kobj_ids(void)
                        return -1;
                }
        }
-
+       root_ns_mask &= ~CLONE_NEWNET;
        pr_debug("NS mask to use %lx\n", root_ns_mask);
        return 0;
 }
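
To make the effect of that one-line change explicit, here is a minimal sketch of the bit operation on the namespace mask. It assumes the usual Linux values CLONE_NEWNET = 0x40000000 and CLONE_NEWNS = 0x00020000 from <linux/sched.h>; the function name clear_netns is hypothetical and only for illustration.

```shell
#!/bin/sh
# Namespace flag values as defined in <linux/sched.h> (assumed here).
CLONE_NEWNS=$((0x00020000))
CLONE_NEWNET=$((0x40000000))

# Hypothetical helper: clear the CLONE_NEWNET bit from a namespace mask,
# mirroring "root_ns_mask &= ~CLONE_NEWNET", and print the result in hex.
clear_netns() {
    printf '0x%x\n' $(( $1 & ~CLONE_NEWNET ))
}

# A mask with both the mount and net namespace bits set keeps only the
# mount namespace bit afterwards.
clear_netns $(( CLONE_NEWNS | CLONE_NEWNET ))
```

With the net bit cleared, restore no longer treats the net namespace as one it has to recreate, while the mount namespace handling that fails later in the log is unaffected.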
Output of `criu --version`:

Version: 3.17
GitID: 496bcdb
Output of `criu check --all`:

Warn  (criu/net.c:3435): Unable to get tun network namespace
Warn  (criu/sk-unix.c:224): unix: Unable to open a socket file: Bad address
Warn  (criu/net.c:3435): Unable to get socket network namespace
Warn  (criu/kerndat.c:1336): Can't get pidfd
Warn  (criu/kerndat.c:1453): CRIU was built without libnftables support
Warn  (criu/kerndat.c:1117): Can't keep kdat cache on non-tempfs
Error (criu/cr-check.c:748): Kernel doesn't support PTRACE_O_SUSPEND_SECCOMP
Error (criu/cr-check.c:793): Dumping seccomp filters not supported: Input/output error
Error (criu/cr-check.c:1022): cgroupns not supported. This is not fatal.
Warn  (criu/cr-check.c:1242): Do not have API to map vDSO - will use mremap() to restore vDSO
Warn  (criu/cr-check.c:1231): clone3() with set_tid not supported
Error (criu/cr-check.c:1273): Time namespaces are not supported
Error (criu/cr-check.c:1283): IFLA_NEW_IFINDEX isn't supported
Warn  (criu/cr-check.c:1300): Pidfd store requires pidfd_open syscall which is not supported
Warn  (criu/cr-check.c:1334): Nftables based locking requires libnftables and set concatenations support
Warn  (criu/cr-check.c:804): ptrace(PTRACE_GET_RSEQ_CONFIGURATION) isn't supported. C/R of processes which are using rseq() won't work.
Warn  (criu/cr-check.c:1160): compat_cr is not supported. Requires kernel >= v4.12
Looks good but some kernel features are missing
which, depending on your process tree, may cause
dump or restore failure.

mrc1119 avatar May 24 '22 11:05 mrc1119

It's really strange to see these errors:

(00.060855)      1: Error (criu/mount.c:2372): mnt: Unable to remove /tmp/cr-tmpfs.uWmVpe: Device or resource busy
(00.393971) Error (criu/mount.c:3609): mnt: Can't remove the directory /tmp/.criu.mntns.Dxd7XG: Device or resource busy

In both cases we try to remove service directories that were created by this criu run and have tmpfs mounted over them. The lazy umount succeeds but the rmdir fails, even though there should be nothing left in the directory: no other mounts and no files.

I don't see how this can happen, so can we please debug it a bit to understand what is going on?

If you can reproduce the problem, please do:

  1. Add a sleep just after the failure, like this:
diff --git a/criu/mount.c b/criu/mount.c
index 115e3d067..ca7946589 100644
--- a/criu/mount.c
+++ b/criu/mount.c
@@ -3607,6 +3607,7 @@ static int __depopulate_roots_yard(void)
 
        if (rmdir(mnt_roots)) {
                pr_perror("Can't remove the directory %s", mnt_roots);
+               sleep(100);
                ret = -1;
        }
  2. Catch the process with gdb while it sleeps, via https://github.com/Snorch/linux-helpers/blob/master/catch_sleeping_with_gdb.sh
bash catch_sleeping_with_gdb.sh criu path/to/criu/directory/criu/criu/criu
  3. Find the pid of the attached criu by looking for "Attaching with gdb" in the console.
  4. In gdb, go "up" several times to reach the __depopulate_roots_yard stack frame and print mnt_roots.
  5. Explore what is there in mnt_roots:
ls /proc/$pid/root/$mnt_roots
cat /proc/$pid/mountinfo
  6. Post the results here.
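
To narrow down the mountinfo output from the `ls` and `cat` commands above, something like the following could help. It is only a sketch: the function name mounts_under is hypothetical, the pid and path in the usage comment are placeholders, and it relies on the fifth field of each /proc/<pid>/mountinfo line being the mount point.

```shell
#!/bin/sh
# Hypothetical helper: print the mountinfo lines whose mount point
# (field 5) is the given directory or lies below it.
# Usage: mounts_under <mountinfo-file> <directory>
mounts_under() {
    awk -v dir="$2" '
        $5 == dir || index($5, dir "/") == 1 { print }
    ' "$1"
}

# Example usage (placeholders; take the real values from gdb):
#   pid=12345
#   mnt_roots=/tmp/.criu.mntns.XXXXXX
#   ls -la "/proc/$pid/root/$mnt_roots"
#   mounts_under "/proc/$pid/mountinfo" "$mnt_roots"
```

Any line it prints is a mount still sitting on or under mnt_roots in that process's mount namespace, which would be one possible explanation for rmdir returning "Device or resource busy".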

Snorch avatar May 25 '22 15:05 Snorch

A friendly reminder that this issue had no activity for 30 days.

github-actions[bot] avatar Jun 26 '22 00:06 github-actions[bot]