userfaultfd-rs

Falling back to syscall on EACCES

Open: XanClic opened this issue 7 months ago • 3 comments

Hi,

I wonder why UffdBuilder::open_file_descriptor() falls back to the syscall only if opening the device file fails with ENOENT. On both of the main systems I use for work, /dev/userfaultfd is accessible only by root (mode rw-------, owner root:root). I can’t remember configuring it this way explicitly, so I assume this is the default. As a non-root user, opening /dev/userfaultfd therefore fails with EACCES, and open_file_descriptor() simply returns that error instead of also trying the syscall. One of my systems has 1 in /proc/sys/vm/unprivileged_userfaultfd, so the syscall would work there if we used it. But we never do, because the device file exists; it just isn’t accessible.
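To check whether the syscall path is even usable on a given host, one can read the sysctl mentioned above. The following is a minimal sketch using only the Rust standard library; the names `UnprivilegedUffd`, `parse_sysctl` and `read_sysctl` are hypothetical and not part of userfaultfd-rs. Per the kernel documentation, a value of 1 lets unprivileged users call userfaultfd(2) without restrictions, while 0 restricts them (on kernels that support it, to handling user-mode faults only via UFFD_USER_MODE_ONLY).

```rust
use std::fs;
use std::io;

/// Path of the sysctl discussed in this issue.
const SYSCTL_PATH: &str = "/proc/sys/vm/unprivileged_userfaultfd";

/// Interpretation of the sysctl value (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
enum UnprivilegedUffd {
    /// 1: unprivileged users may call userfaultfd(2) without restrictions.
    Unrestricted,
    /// 0: unprivileged users are restricted (e.g. must pass UFFD_USER_MODE_ONLY
    /// on kernels that support that flag).
    Restricted,
}

/// Parses the contents of the sysctl file; returns None for unexpected values.
fn parse_sysctl(contents: &str) -> Option<UnprivilegedUffd> {
    match contents.trim() {
        "1" => Some(UnprivilegedUffd::Unrestricted),
        "0" => Some(UnprivilegedUffd::Restricted),
        _ => None,
    }
}

/// Reads the sysctl; fails if the file cannot be read (e.g. an older kernel
/// without this sysctl, or a sandbox without /proc).
fn read_sysctl() -> io::Result<Option<UnprivilegedUffd>> {
    let contents = fs::read_to_string(SYSCTL_PATH)?;
    Ok(parse_sysctl(&contents))
}

fn main() {
    match read_sysctl() {
        Ok(Some(v)) => println!("unprivileged_userfaultfd: {:?}", v),
        Ok(None) => println!("unprivileged_userfaultfd: unexpected value"),
        Err(e) => println!("could not read {}: {}", SYSCTL_PATH, e),
    }
}
```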

Is there a reason why open_file_descriptor() only falls back to the syscall on ENOENT, and not when encountering other errors?
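The difference between the current behaviour and the one asked about can be sketched as two decision functions over the error returned by opening the device file. This is a hypothetical model, not the crate’s actual code; `Fallback`, `current_policy` and `proposed_policy` are illustrative names, and the errno constants assume Linux. On Unix, Rust’s std maps both EACCES and EPERM to `ErrorKind::PermissionDenied`, and ENOENT to `ErrorKind::NotFound`.

```rust
use std::io;

// Linux errno values (assumption: Linux; these are the same on common architectures).
const EPERM: i32 = 1;
const ENOENT: i32 = 2;
const EACCES: i32 = 13;
const EMFILE: i32 = 24;

/// What to do after opening the device file failed (hypothetical type).
#[derive(Debug, PartialEq)]
enum Fallback {
    /// Try the userfaultfd(2) syscall instead.
    UseSyscall,
    /// Return the open() error to the caller.
    Fail,
}

/// The behaviour described in this issue: only ENOENT triggers the fallback.
fn current_policy(open_err: &io::Error) -> Fallback {
    match open_err.kind() {
        io::ErrorKind::NotFound => Fallback::UseSyscall,
        _ => Fallback::Fail,
    }
}

/// The variant asked about: also fall back when the device file exists
/// but is not accessible (EACCES or EPERM).
fn proposed_policy(open_err: &io::Error) -> Fallback {
    match open_err.kind() {
        io::ErrorKind::NotFound | io::ErrorKind::PermissionDenied => Fallback::UseSyscall,
        // Errors such as EMFILE (out of file descriptors) would likely hit the
        // syscall as well, so failing immediately still seems reasonable.
        _ => Fallback::Fail,
    }
}

fn main() {
    for (name, errno) in [("ENOENT", ENOENT), ("EACCES", EACCES), ("EPERM", EPERM), ("EMFILE", EMFILE)] {
        let err = io::Error::from_raw_os_error(errno);
        println!(
            "{}: current={:?}, proposed={:?}",
            name,
            current_policy(&err),
            proposed_policy(&err)
        );
    }
}
```

One design consequence of the proposed variant: if the syscall then fails too (userfaultfd(2) returns EPERM for unprivileged callers when the sysctl is 0), the caller would see the syscall’s error rather than the device file’s EACCES, so it may be worth reporting both.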

Background: We’re planning to add postcopy migration support to virtiofsd, a VM device emulation for filesystem passthrough between host and guest. For this, we want to use the support that the rust-vmm vhost crates provide, which in turn rely on userfaultfd-rs.

virtiofsd can sandbox itself, which, as a side effect, hides /dev/userfaultfd. Consequently, on the system where unprivileged userfaultfd is allowed (but the device file is not accessible by non-root users), a sandboxed virtiofsd can successfully create a userfaultfd with this crate (opening the device file fails with ENOENT, so the syscall is used), whereas a non-sandboxed virtiofsd cannot (opening fails with EACCES, so creation fails immediately).
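The two situations above differ only in how opening the device file fails. A minimal standard-library sketch that distinguishes them follows; `DeviceProbe` and `probe_device_file` are hypothetical names, and opening read-write is an assumption about the access the crate needs. Rust’s std opens files with O_CLOEXEC on Linux, and `.read(true).write(true)` corresponds to O_RDWR.

```rust
use std::fs::OpenOptions;
use std::io;
use std::path::Path;

/// Outcome of trying to open the device file (hypothetical type).
#[derive(Debug, PartialEq)]
enum DeviceProbe {
    /// The file could be opened read-write.
    Accessible,
    /// ENOENT, e.g. inside a sandbox that hides /dev/userfaultfd.
    Missing,
    /// EACCES/EPERM, e.g. the file is mode rw------- owned by root:root.
    NotPermitted,
    /// Any other error, with its raw errno if available.
    Other(Option<i32>),
}

fn probe_device_file(path: &Path) -> DeviceProbe {
    match OpenOptions::new().read(true).write(true).open(path) {
        // The file is closed again when `_file` is dropped at the end of this arm.
        Ok(_file) => DeviceProbe::Accessible,
        Err(e) => match e.kind() {
            io::ErrorKind::NotFound => DeviceProbe::Missing,
            io::ErrorKind::PermissionDenied => DeviceProbe::NotPermitted,
            _ => DeviceProbe::Other(e.raw_os_error()),
        },
    }
}

fn main() {
    let probe = probe_device_file(Path::new("/dev/userfaultfd"));
    println!("/dev/userfaultfd: {:?}", probe);
}
```

Run as a non-root user on a host configured as described, this would be expected to report `NotPermitted` outside the sandbox and `Missing` inside it.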

I’d be happy to send a PR, but the code is very explicit about falling back only on ENOENT (including the comment above that check), which is why I’m hesitant. Clearly there’s a reason, but I can’t find it in the code or the commit log.

XanClic · Jul 23 '24 14:07