mydocker: Confusion about the source using unix.PivotRoot(".", ".") to switch rootfs

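The runc project switches rootfs by calling unix.PivotRoot(".", "."). However, the man page for this system call says, as I read it, that the two directories must not be the same. How should this line of code be understood?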

Sep 25 '18 09:09 HUANGChaoLi

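The book uses syscall.Mount(root, root, "bind", syscall.MS_BIND|syscall.MS_REC, "") for the mount operation, but the source and target are the same root directory. When I mount a directory onto itself on my own machine I get an error. How should this line from the book be understood?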

Sep 25 '18 09:09 HUANGChaoLi

I could not find this in the system call's man page, but the comment above the call explains it:

	// While the documentation may claim otherwise, pivot_root(".", ".") is
	// actually valid. What this results in is / being the new root but
	// /proc/self/cwd being the old root. Since we can play around with the cwd
	// with pivot_root this allows us to pivot without creating directories in
	// the rootfs. Shout-outs to the LXC developers for giving us this idea.

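In fact, the documentation seems to require that put_old be a subdirectory of new_root, and that they be on a different filesystem from the current root. The filesystem type of /proc/self/cwd is proc, which is not the same as the filesystem of ".". What puzzles me is why /proc/self/cwd would count as a subdirectory of ".".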

       The following restrictions apply to new_root and put_old:

       -  They must be directories.

       -  new_root and put_old must not be on the same filesystem as the
          current root.

       -  put_old must be underneath new_root, that is, adding a nonzero
          number of /.. to the string pointed to by put_old must yield the
          same directory as new_root.

       -  No other filesystem may be mounted on put_old.

Attached is the link to the author's commit.

Sep 26 '18 02:09 HUANGChaoLi

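The new directory passed to pivot_root must not be on the same filesystem as the original root directory. This line is equivalent to running mount -o bind root root on the command line: although it is the same directory, after the mount it is no longer under the same filesystem. Without this line, pivot_root cannot be executed.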

Oct 08 '19 14:10 dadahua555

On the command line, pivot_root also fails unless unshare -m is run first. So which part of the code here does the same job? Can anyone explain?

Oct 08 '19 14:10 dadahua555

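Quoting dadahua555: On the command line, pivot_root also fails unless unshare -m is run first. So which part of the code here does the same job? Can anyone explain?

Did you ever get this solved? From what I found, everyone seems to say that unshare -m is required.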

Dec 30 '19 02:12 ImJerryChan

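Quoting dadahua555: The new directory passed to pivot_root must not be on the same filesystem as the original root directory. This line is equivalent to running mount -o bind root root on the command line: although it is the same directory, after the mount it is no longer under the same filesystem. Without this line, pivot_root cannot be executed.

What is the principle behind this? Why does mounting a directory onto itself switch all the files under it to a different filesystem?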

Dec 14 '21 14:12 sunaaaaaaa
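
On the principle: the files themselves are not moved anywhere. Bind mounting a directory onto itself creates a new mount entry for that directory, and pivot_root(2) requires new_root to be a mount point that is distinct from the mount of the current root. A minimal sketch of how this could be observed follows; bindMountSelf and isMountPoint are hypothetical helper names, and the parser is simplified on purpose.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
	"syscall"
)

// bindMountSelf turns root into a mount point by bind mounting it onto
// itself, mirroring the syscall.Mount call quoted from the book.
// It needs CAP_SYS_ADMIN, so main below does not call it.
func bindMountSelf(root string) error {
	return syscall.Mount(root, root, "bind", syscall.MS_BIND|syscall.MS_REC, "")
}

// isMountPoint reports whether path appears as a mount point in the given
// /proc/<pid>/mountinfo text. The fifth whitespace-separated field of each
// line is the mount point. Escaped characters (such as \040 for a space)
// are not decoded in this sketch.
func isMountPoint(mountinfo, path string) bool {
	want := filepath.Clean(path)
	for _, line := range strings.Split(mountinfo, "\n") {
		fields := strings.Fields(line)
		if len(fields) >= 5 && fields[4] == want {
			return true
		}
	}
	return false
}

func main() {
	data, err := os.ReadFile("/proc/self/mountinfo")
	if err != nil {
		fmt.Fprintln(os.Stderr, "read mountinfo:", err)
		os.Exit(1)
	}
	// An ordinary directory only shows up here after something like
	// bindMountSelf has been applied to it.
	fmt.Println("/ is a mount point:", isMountPoint(string(data), "/"))
}
```

Running isMountPoint on the rootfs directory before and after bindMountSelf (as root, inside a separate mount namespace) should show the directory gaining its own entry, even though its contents are unchanged.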

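I came across this old issue with great interest. Looking at the question in early 2024, there is plenty of material available, so here is some help for anyone who lands here.

pivot_root does come with a list of restrictions, for example: new_root and put_old must not be on the same mount as the current root.
However (perhaps added later), the man page also has a short section explaining the neat use of pivot_root(".", "."):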

pivot_root(".", ".")
   new_root and put_old may be the same directory.  In particular,
   the following sequence allows a pivot-root operation without
   needing to create and remove a temporary directory:

       chdir(new_root);
       pivot_root(".", ".");
       umount2(".", MNT_DETACH);

   This sequence succeeds because the pivot_root() call stacks the
   old root mount point on top of the new root mount point at /.  At
   that point, the calling process's root directory and current
   working directory refer to the new root mount point (new_root).
   During the subsequent umount() call, resolution of "."  starts
   with new_root and then moves up the list of mounts stacked at /,
   with the result that old root mount point is unmounted.

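In plain terms: pivot_root(".", ".") does, as intended, mount new_root at "/", and the old root (put_old) also ends up mounted at the "/" of the switched rootfs. Because the two steps happen in order, the mounts form a stack, with the old root mount stacked on top of new_root.
When umount is executed afterwards, it unmounts the old root at the top of the stack, which completes a "no-pivot" rootfs switch, that is, one without a pivot directory.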
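
The three-step sequence from the man page can be sketched in Go as follows. This is not runc's actual implementation: the rootOps indirection, recordingOps, and the path /hypothetical/rootfs are assumptions made so the sequence can be dry-run without privileges, and the final chdir("/") is a conservative extra step that the man-page snippet does not include.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// rootOps groups the syscalls used by the sequence so that it can be
// dry-run without privileges. This indirection is illustrative only.
type rootOps struct {
	chdir     func(path string) error
	pivotRoot func(newRoot, putOld string) error
	unmount   func(target string, flags int) error
}

// realOps wires in the actual Linux syscalls (needs CAP_SYS_ADMIN and a
// new mount namespace, with newRoot already a mount point).
var realOps = rootOps{
	chdir:     syscall.Chdir,
	pivotRoot: syscall.PivotRoot,
	unmount:   syscall.Unmount,
}

// pivotWithoutTempDir follows the man-page sequence:
// chdir(new_root); pivot_root(".", "."); umount2(".", MNT_DETACH).
func pivotWithoutTempDir(ops rootOps, newRoot string) error {
	if err := ops.chdir(newRoot); err != nil {
		return fmt.Errorf("chdir %s: %w", newRoot, err)
	}
	// The old root is stacked on top of the new root at "/".
	if err := ops.pivotRoot(".", "."); err != nil {
		return fmt.Errorf("pivot_root: %w", err)
	}
	// "." now resolves through the stack to the old root, which is detached.
	if err := ops.unmount(".", syscall.MNT_DETACH); err != nil {
		return fmt.Errorf("umount2: %w", err)
	}
	return ops.chdir("/")
}

// recordingOps returns fake operations that only log each call.
func recordingOps(calls *[]string) rootOps {
	return rootOps{
		chdir: func(path string) error {
			*calls = append(*calls, fmt.Sprintf("chdir(%q)", path))
			return nil
		},
		pivotRoot: func(newRoot, putOld string) error {
			*calls = append(*calls, fmt.Sprintf("pivot_root(%q, %q)", newRoot, putOld))
			return nil
		},
		unmount: func(target string, flags int) error {
			*calls = append(*calls, fmt.Sprintf("umount2(%q, %#x)", target, flags))
			return nil
		},
	}
}

func main() {
	var calls []string
	if err := pivotWithoutTempDir(recordingOps(&calls), "/hypothetical/rootfs"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, c := range calls {
		fmt.Println(c)
	}
}
```

Passing realOps instead of recordingOps(&calls) would perform the real switch, which only makes sense inside a container setup process.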

Notes

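The name pivot_root says it all: runc used to create a writable pivot_dir to act as the pivot when switching rootfs. Thanks to this trick, runc can work with a completely read-only rootfs. There is good reason to believe it was an undocumented trick at the time, since Aleksa Sarai credits the LXC developers with the idea; for the specific commit, see reference 2.
When pivot_root(".", ".") is executed, the working directory, i.e. /proc/self/cwd, is put_old.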

Reference

pivot_root(2) — Linux manual page
Related PR and discussion in runc rootfs: make pivot_root not use a temporary directory

Feb 04 '24 07:02 An-DJ
