polling icon indicating copy to clipboard operation
polling copied to clipboard

make `Poller` fork-safe

Open acully-vmware opened this issue 3 years ago • 9 comments
trafficstars

We have an application in which we tried to use a polling::Poller in a parent process to accept a connection, then fork and handle the connection in the child process. (We aren't using exec, right now.) I discovered that this doesn't work well: after the fork, when we tear down the Poller structure, its Drop implementation removes the timer_fd and event_fd from the epoll_fd. But because the file-descriptors (such as the epoll file-descriptor that Poller uses internally) are shared between the parent process and the child process, this means that the epoll_fd (which remains open in the parent process) gets into an invalid state. I'd like to close the files in the child, they aren't needed there, and I don't want to leak, but I'd like to avoid modifying the epoll_fd while doing so.

With this change:

--- src/epoll.rs
+++ src/epoll.rs
@@ -235,10 +235,8 @@ impl Drop for Poller {
         );
         if let Some(timer_fd) = self.timer_fd {
-            let _ = self.delete(timer_fd);
             let _ = syscall!(close(timer_fd));
         }
-        let _ = self.delete(self.event_fd);
         let _ = syscall!(close(self.event_fd));
         let _ = syscall!(close(self.epoll_fd));
     }

I believe the implementation would be fork-safe, and slightly more performant.

acully-vmware avatar Feb 02 '22 15:02 acully-vmware

If this is agreeable, I can make an MR with this change, and add a test that Poller becomes fork-safe (on platforms that use fork).

acully-vmware avatar Feb 02 '22 15:02 acully-vmware

This would need to be something that the user can choose, as the default needs to stay as it is right now.

dignifiedquire avatar Feb 10 '22 19:02 dignifiedquire

This would need to be something that the user can choose, as the default needs to stay as it is right now.

I don't believe the suggestion I'm making will change user-visible behavior, except when using fork, and the current behavior under fork breaks the parent process. Please correct me if I'm wrong, but:

  • The file-descriptors in question are private to the Poller instance;
  • The Poller structure does not implement Clone (that I can tell) -- there is no possibility of multiple Pollers aliasing the same kernel-side structures, except via fork (or, I suppose, clone without CLONE_FILES);
  • The Poller itself is no longer accessible to Rust code as the Drop function is being invoked.

The change I suggest is to stop performing unneeded clean-up on file-descriptors that are about to be destroyed: Since they are about to be destroyed, the clean-up is unnecessary in the common case. In case fork is called, though, clean-up in the child process affects the parent-process's epoll file-descriptor in a way that breaks behavior.

I know that fork is a problematic API for epoll. In principle, we should be able to handle the problems by closing the epoll file-descriptor (and the timer_fd and event_fd that are created along with it in the Poller structure) in the child, preventing interaction with the parent. As the code stands today, though, I'm forced to either modify our local version of polling, or forget the Poller instance (leaking file-descriptors in the process), or not use polling at all, and create our own implementation of the function.

acully-vmware avatar Feb 11 '22 04:02 acully-vmware

this sounds like the epoll fd should be created using EPOLL_CLOEXEC...

fogti avatar Dec 30 '22 07:12 fogti

this sounds like the epoll fd should be created using EPOLL_CLOEXEC...

epoll is already created with EPOLL_CLOEXEC.

notgull avatar Dec 30 '22 23:12 notgull

oh. ah right, I think extracting the FD and forgetting the Poller instance would be the most sensible option. (which could be "packages" into a method on Poller to handle any edge cases and such, #[cfg(unix)] probably (I think *BSD should be handled similarly))

fogti avatar Jan 01 '23 16:01 fogti

Interestingly, Boost ASIO has a notify_fork function, which allows the reactor to recreate itself if a fork has happened. I'm not sure if we'd like to have this function (nor am I implying that we should be taking any positive lessons from Boost), but I figure it's worth noting.

notgull avatar Jan 09 '23 19:01 notgull

that sounds a bit appealing (such a function would be marked unsafe, unless we can make sure that it couldn't disrupt anything if it would be called without a previous fork, and could deal with the interactions from other threads)

fogti avatar Jan 10 '23 09:01 fogti

Now that I think about it, a function like this would be far more relevant to async-io, because it already keeps track of sources there and forks hurt the reactor.

notgull avatar Aug 15 '23 05:08 notgull