bubblewrap
bubblewrap copied to clipboard
Bind mounting over symlinks attempts to create their targets and bind mount over them
What I observe:
$ touch a_file
$ ln -s a_file rel_symlink
$ bwrap --bind / / --bind a_file $PWD/rel_symlink /usr/bin/env ls -l $PWD/rel_symlink
lrwxrwxrwx 1 robryk users 6 Sep 14 23:55 /home/robryk/zero-k/rel_symlink -> a_file
$ ln -s $PWD/a_file abs_symlink
$ bwrap --bind / / --bind a_file $PWD/abs_symlink /usr/bin/env ls -l $PWD/abs_symlink
bwrap: Can't create file at /home/robryk/zero-k/abs_symlink: No such file or directory
$ ls -l nonexistent
ls: cannot access 'nonexistent': No such file or directory
$ ln -s nonexistent broken_symlink
$ bwrap --bind / / --bind a_file $PWD/broken_symlink /usr/bin/env ls -l $PWD/broken_symlink
lrwxrwxrwx 1 robryk users 11 Sep 14 23:55 /home/robryk/zero-k/broken_symlink -> nonexistent
$ ls -l nonexistent
-rw-rw-rw- 1 robryk users 0 Sep 15 00:04 nonexistent
$ ln -s ../../../nonexistent another_broken_symlink
$ bwrap --bind / / --bind a_file $PWD/another_broken_symlink /usr/bin/env ls -l $PWD/another_broken_symlink
bwrap: Can't create file at /home/robryk/zero-k/another_broken_symlink: Permission denied
It seems that:
- Mounting over an unbroken symlink caused
bwrapto mount over the symlink's target. - Mounting over an unbroken, absolute symlink caused
bwrapto fail. - Mounting over a broken symlink caused
bwrapto create the symlink target and mount over it. - Mounting over a broken symlink that points into a directory that is not writeable caused bwrap to fail.
What I expect:
Mounting over any kind of symlink to cause the resulting mount namespace to have the symlink replaced with the source of the mount. This did not happen in any of the four cases.
If doing that is impossible (quick skim of man mount(2) didn't yield an equivalent of O_NOFOLLOW, even though umount2 does have the UMOUNT_NOFOLLOW flag), I expect bwrap to exit with an error. I also expect it not to mount anything over the symlink's target. This failed to happen in cases 1 and 3.
It would be nice if the "File not found" error was more descriptive. It's not obvious which file is "missing" (my first intuition was that the problem was with one of the directories in the target path).
Bind mounting over symlinks attempts to create their targets and bind mount over them
Sorry, this is how mount(2) works. We cannot mount anything on a symlink: mount(2) follows symlinks, whether we want it to or not.
It's still confusing to actually do that then, instead of failing. I would wager that this is ~never the desired behaviour. I would prefer if (see the last paragraphs of the original report):
- bwrap errored out in that case ~~(I understand that it's impossible to do that reliably, but doing it in a fashion that admits races is better than nothing)~~(since Linux 5.10 you can do this reliably),
- bwrap didn't create targets of broken symlinks,
- any "File not found" errors were accompanied by the path to the file that was unexpectedly not present.
Bind mounting over symlinks attempts to create their targets and bind mount over them
Sorry, this is how
mount(2)works. We cannot mount anything on a symlink:mount(2)follows symlinks, whether we want it to or not.
My mount (2) manpage says:
MS_NOSYMFOLLOW (since Linux 5.10)
Do not follow symbolic links when resolving paths. Symbolic links can still be created, and readlink(1), readlink(2), realpath(1), and realpath(3) all still work properly.
Is this the feature that could make it work?
Is this the feature that could make it work?
It depends exactly how it works. Perhaps you could try it and see what happens?
Hm. I can't see how the behavior changes, adding this option or leaving it out. I guess it won't help here.
Note that MS_NOSYMFOLLOW seems to be an option that affects the created mount (i.e. symlinks that are in that filesystem won't be followed) and not how mount() resolves its target path.
Can the new mounting API be leveraged for this?
https://brauner.io/2023/02/28/mounting-into-mount-namespaces.html
It looks like the new syscall open_tree accepts the AT_SYMLINK_NOFOLLOW:
https://github.com/torvalds/linux/blob/968f35f4ab1c0966ceb39af3c89f2e24afedf878/tools/testing/selftests/mount_setattr/mount_setattr_test.c#L1138
The move_mount also has the symlink following flag:
https://github.com/torvalds/linux/blob/968f35f4ab1c0966ceb39af3c89f2e24afedf878/include/uapi/linux/mount.h#L73
Can the new mounting API be leveraged for this?
I don't know, can it? Or would the syscall just fail with ELOOP if the target is a symlink? (Those are the two reasonable interpretations of AT_SYMLINK_NOFOLLOW.)
If the answer is that a new mounting API can mount over the top of an existing symlink without dereferencing the symlink, I'd be happy to review a pull request. It would still need to be able to fall back to the old mounting API to support older kernels.
I'd be happy to review a pull request. It would still need to be able to fall back to the old mounting API to support older kernels.
I will try to experiment with the new API. One issue is that it is barely documented. I guess I will read to the kernel source code to figure it out.
Actually the mount from util-linux can already use new mount API and has an option to switch between them.
https://github.com/util-linux/util-linux/commit/fd6b4d94ff013dc7ed680d4e864610da5b9751f1
So lets put it to test:
1). Create a non-existent symlink:
>>> ln -s /nonexistent ./test
2). Unshare the mount namespace:
>>> unshare --user --map-root-user --mount
3). Check that it is still symlink.
>>> /usr/bin/ls -l ./
total 0
lrwxrwxrwx 1 igo95862 igo95862 12 Dec 5 23:10 test -> /nonexistent
4). Do a mount over a symlink forcing new api. Mount some existing file.
env LIBMOUNT_FORCE_MOUNT2=never mount --bind /etc/resolv.conf ./test
5). Check what happens.
>>> /usr/bin/ls -l ./
total 4
-rw-r--r-- 1 nobody nobody 73 Dec 3 17:31 test
>>> cat ./test
nameserver 127.0.0.1
nameserver ::1
So it does become a regular file that we mounted on top.
Lets try to use the old mout API:
>>> env LIBMOUNT_FORCE_MOUNT2=always mount --bind /etc/resolv.conf ./test
mount: ./test: mount point is a symbolic link to nowhere.
dmesg(1) may have more information after failed mount system call.
So the new API definitely fixes this issue.
Some strace logs.
Using new API:
open_tree(AT_FDCWD, "/etc/resolv.conf", OPEN_TREE_CLONE|OPEN_TREE_CLOEXEC) = 3
move_mount(3, "", AT_FDCWD, "./test", MOVE_MOUNT_F_EMPTY_PATH) = 0
Using old API:
mount("/etc/resolv.conf", "./test", 0x55e0e3605670, MS_BIND, NULL) = -1 ENOENT (No such file or directory)
One issue is that it is barely documented.
You can grab the manpage drafts from the kernel mailing list to get a better starting point.
You can grab the manpage drafts from the kernel mailing list to get a better starting point.
Could you please give a link. I am not good with mailing lists.
https://lore.kernel.org/all/[email protected]/
https://duckduckgo.com/?q=site%3Akernel.org+%22Add+manpage+for%22+move_mount&ia=web
https://lore.kernel.org/all/[email protected]/
https://duckduckgo.com/?q=site%3Akernel.org+%22Add+manpage+for%22+move_mount&ia=web
Thank you.
Weird that those man pages patches were around since 2018 but never merged.
I've rendered the man pages to human readable format:
Looking at man pages it seems the new API has a few more advantages. For example, it has option of not triggering auto-mount.
If you look at this part of man page looks like new mount API can mount over mount point that does not exist.
FWIW: https://github.com/sunfishcode/linux-mount-api-documentation (@igo95862)
Actually you can do this with the old mount api as well using /proc/self/fd.
diff --git a/bind-mount.c b/bind-mount.c
index 57b4236..d034c51 100644
--- a/bind-mount.c
+++ b/bind-mount.c
@@ -395,7 +395,9 @@ bind_mount (int proc_fd,
if (src)
{
- if (mount (src, dest, NULL, MS_SILENT | MS_BIND | (recursive ? MS_REC : 0), NULL) != 0)
+ dest_fd = openat(AT_FDCWD, dest, O_CLOEXEC | O_PATH | O_NOFOLLOW);
+ dest_proc = get_oldroot_path (xasprintf ("/proc/self/fd/%d", dest_fd));
+ if (mount (src, dest_proc, NULL, MS_SILENT | MS_BIND | (recursive ? MS_REC : 0), NULL) != 0)
return BIND_MOUNT_ERROR_MOUNT;
}
$ ln -s ENOENT hello1.txt
$ echo hello > hello2.txt
$ stat -c '%F' hello1.txt
symbolic link
$ cat hello1.txt
cat: hello1.txt: No such file or directory
$ ./_builddir/bwrap --dev-bind / / --bind $PWD/hello2.txt $PWD/hello1.txt stat -c '%F' hello1.txt
regular file
$ ./_builddir/bwrap --dev-bind / / --bind $PWD/hello2.txt $PWD/hello1.txt cat hello1.txt
hello
[!CAUTION] The PoC diff from above does not
- handle errors (return code of openat, ...).
- cares about cleanup (free, close).
- cares about preserving other flags done below via mountinfo.
- stop bwrap from creating the target file.