userfaultfd-rs
Falling back to syscall on EACCES
Hi,
I wonder why UffdBuilder::open_file_descriptor() falls back to the syscall only if opening the device file failed with ENOENT. On both of the main systems I use for work, /dev/userfaultfd is not accessible by users other than root (mode rw-------, owner root:root), and I can’t remember having explicitly configured it this way, so I assume this is the default. Opening /dev/userfaultfd as a non-root user thus returns EACCES, which leads to open_file_descriptor() just failing instead of trying the syscall as well. One of my systems has 1 in /proc/sys/vm/unprivileged_userfaultfd, so the syscall would work if we were to use it, but we never do, because the device file is there, just not accessible.
Is there a reason why open_file_descriptor() only falls back to the syscall on ENOENT, and not when encountering other errors?
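To make the question concrete, here is a minimal sketch of the kind of fallback I have in mind. It is not the crate's actual code: should_fall_back_to_syscall(), choose_source() and UffdSource are hypothetical names made up for illustration, the errno values are hard-coded for Linux to avoid pulling in a dependency, and the syscall itself is left out so that only the decision is shown.

```rust
use std::fs::{File, OpenOptions};
use std::io;

// Linux errno values, hard-coded so this sketch needs no external crate.
const ENOENT: i32 = 2;
const EACCES: i32 = 13;

/// Hypothetical helper: decide whether a failed open of the device file
/// should lead to an attempt via the userfaultfd syscall.
/// The raw errno is checked because io::ErrorKind::PermissionDenied
/// does not distinguish EACCES from EPERM.
fn should_fall_back_to_syscall(err: &io::Error) -> bool {
    matches!(err.raw_os_error(), Some(ENOENT) | Some(EACCES))
}

/// Hypothetical result type: either the device file was opened,
/// or the caller should try the syscall instead.
#[derive(Debug)]
enum UffdSource {
    Device(File),
    Syscall,
}

/// Try the device file first; fall back on ENOENT *or* EACCES,
/// and propagate every other error unchanged.
fn choose_source(path: &str) -> io::Result<UffdSource> {
    match OpenOptions::new().read(true).write(true).open(path) {
        Ok(file) => Ok(UffdSource::Device(file)),
        Err(err) if should_fall_back_to_syscall(&err) => Ok(UffdSource::Syscall),
        Err(err) => Err(err),
    }
}
```

With this, choose_source("/dev/userfaultfd") would select the syscall both when the device file is hidden by a sandbox and when it exists but is only accessible to root. Whether the syscall then succeeds is still up to the kernel, for example via /proc/sys/vm/unprivileged_userfaultfd.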
Background: We’re planning to add postcopy migration support to virtiofsd, which is a VM device emulation for filesystem passthrough between host and guest. To do that, we want to use the support that the rust-vmm vhost crates provide, which in turn rely on userfaultfd-rs.
virtiofsd can sandbox itself, which, as a side effect, will hide /dev/userfaultfd. Consequently, on the system where unprivileged userfaultfd is allowed (but the device file is not accessible by non-root users), a sandboxed virtiofsd can successfully create a userfaultfd with this crate (because opening the device file returns ENOENT, so the syscall is used), but a non-sandboxed virtiofsd cannot (because opening it returns EACCES, failing immediately).
I’d be happy to send a PR, but the code is very explicit about only falling back on ENOENT (including the comment above), which is why I’m hesitant. Clearly there’s a reason, but I can’t see it from the code or the commit log.