
os.fork: documenting quirks and preventing it in debug mode

Open 1st1 opened this issue 8 years ago • 9 comments

Long story short, one should never try to fork in a running event loop. Things that will break:

  1. kqueue/epoll selectors will (likely) error with EBADF as soon as the forked child process does anything with them.
  2. Some resources will leak.
  3. Some resources will be closed incorrectly, or, in the worst case, queued data may be sent more than once.

Sadly, it's not possible to reliably fix the above. Hence this is a documentation issue, and maybe we can do something in debug mode.

Documentation

The only safe way to fork is to do it with no active event loop in the process. If forking is absolutely needed, the semi-safe method of forking with a running event loop is this (sketched below):

  1. Have a global variable fork_requested set to False.
  2. Before forking, set fork_requested to True and call loop.stop().
  3. The loop will stop, and execution will continue after the loop.run_forever() or loop.run_until_complete() call (however the loop was started).
  4. At that point, check whether fork_requested is True, and if it is, perform the fork. It's safe to call loop.close() in the child process and to restart the loop in the parent.
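
A minimal sketch of that recipe; request_fork() and run_worker() are illustrative placeholders for whatever the application actually does:

    import asyncio
    import os

    fork_requested = False

    def request_fork():
        # Called from a callback or task running inside the loop.
        global fork_requested
        fork_requested = True
        asyncio.get_event_loop().stop()

    def run_worker():
        # Placeholder for whatever the forked child is supposed to do.
        pass

    loop = asyncio.get_event_loop()
    while True:
        loop.run_forever()          # returns once loop.stop() has been called
        if not fork_requested:
            break                   # stopped for some other reason: shut down
        fork_requested = False
        pid = os.fork()
        if pid == 0:                # child: the loop never ran here
            loop.close()
            run_worker()
            os._exit(0)
        # parent: fall through and re-enter run_forever()
    loop.close()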

Debug Mode

Monkey-patch os.fork to error out in debug mode when it's called. This isn't the prettiest solution, but it will work reliably. Moreover, in uvloop, for instance, it's either this, or a program crash (abort() called in C by libuv).
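
A rough sketch of what such a debug-mode guard could look like (nothing like this exists in asyncio; the names here are illustrative):

    import asyncio
    import os

    _original_fork = os.fork

    def _guarded_fork():
        # Refuse to fork while a debug-mode event loop is running in this thread.
        try:
            loop = asyncio.get_event_loop()
        except RuntimeError:
            loop = None
        if loop is not None and loop.is_running() and loop.get_debug():
            raise RuntimeError(
                "os.fork() called while an asyncio event loop is running")
        return _original_fork()

    os.fork = _guarded_fork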

1st1 avatar May 16 '16 21:05 1st1

Hi,

We do use fork() in a running loop: a master process regularly spawns workers and communicates with them using the loop. The solution that we came up with is not very elegant but works as expected for our use case (we only target Linux).

Right after the fork() in the new worker process, we raise a ForkException (a custom subclass of BaseException). This exception bubbles up to loop.run_forever(), unwinding the stack up to the "main" code while ensuring that the loop will not run again. We then monkey-patch the loop's selector (still in the worker) so that loop.close() frees the resources (open files, pending tasks, executor threads, etc.) but does not unregister the file descriptors from the epoll structure, and hence does not break the loop of the parent process. We also deactivate logging on tasks which are not finished but get collected (they will never run in the worker, since we closed the loop and will create a fresh one).
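
A simplified sketch of the unwinding part, assuming the exception is raised from a plain callback (whether a BaseException propagates out of run_forever() this way can depend on the Python version):

    import asyncio
    import os

    class ForkException(BaseException):
        """Raised in the forked worker to unwind the stack out of run_forever()."""

    def maybe_spawn_worker():
        # Runs inside the master's loop, e.g. scheduled with loop.call_soon().
        pid = os.fork()
        if pid == 0:
            raise ForkException()   # child: bubble up through loop.run_forever()
        return pid                  # parent: keep running the loop as usual

    loop = asyncio.get_event_loop()
    try:
        loop.run_forever()
    except ForkException:
        # We are in the worker: the inherited loop must never run again.
        # Clean it up without disturbing the parent (the selector patch
        # described above) and create a fresh loop for the worker's own work.
        asyncio.set_event_loop(asyncio.new_event_loop())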

We plan to try uvloop soon. I haven't looked at it yet, but if forking cleanly seems too hard, we'll use the standard implementation in the master and use uvloop in the workers only (in our app, the master is almost always idle, so the performance gain should be rather small).

Martiusweb avatar May 21 '16 17:05 Martiusweb

Right after the fork() in the new worker process, we raise a ForkException (a custom subclass of BaseException). This exception bubbles up to loop.run_forever(), unwinding the stack up to the "main" code while ensuring that the loop will not run again. We then monkey-patch the loop's selector (still in the worker) so that loop.close() frees the resources (open files, pending tasks, executor threads, etc.) but does not unregister the file descriptors from the epoll structure, and hence does not break the loop of the parent process. We also deactivate logging on tasks which are not finished but get collected (they will never run in the worker, since we closed the loop and will create a fresh one).

This seems to be a very fragile approach. I'd highly recommend trying the approach I described in my first message -- loop.stop(); os.fork(); loop.run_forever().

1st1 avatar May 21 '16 20:05 1st1

I agree, it is fragile, but when I wrote this, this was an easy and efficient way to solve our use case.

I am afraid that calling stop() then run() again may introduce latency in handling incoming events, since there are no guarantees regarding the overhead of stopping/restarting the loop, nor when run_forever() returns after the stop has been requested.

Anyway, this is not the biggest issue with forking a process in which a loop runs (or ran):

  • Resource leaks are expected. I proposed adding a detach() method to transports (mimicking the semantics of socket.socket.detach()). It allows calling transport.close() and cleaning up resources without touching the underlying file descriptor. As an example, I patched BaseSubprocessTransport in Python bug 23540 (http://bugs.python.org/issue23540), but I didn't actively request feedback.
  • Regarding your 4th point, loop.close() cannot be called safely in the child, at least on Linux with epoll: it will remove the self-pipe from the fds watched by the selector, and that affects the parent. That's why we need to monkey-patch the selector so that selector.unregister() becomes a no-op, as in the snippet below.
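
Concretely, the child-side cleanup amounts to something like this (loop._selector is a private attribute of the selector event loop, so this is best-effort monkey-patching, not a public API):

    # In the forked child, before closing the inherited loop: make unregister()
    # a no-op so that loop.close() releases its resources without removing the
    # self-pipe (or any other fd) from the epoll structure shared with the parent.
    loop._selector.unregister = lambda fileobj: None
    loop.close()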

Martiusweb avatar May 23 '16 12:05 Martiusweb

http://bugs.python.org/issue16500 - Add an 'atfork' module

socketpair avatar May 29 '16 07:05 socketpair

ProcessPoolExecutor uses os.fork (through multiprocessing.popen_fork.Popen). Does that mean it is unsafe to use loop.run_in_executor with a ProcessPoolExecutor? Or is it OK as long as the target doesn't mess with the event loop?
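
For context, the pattern in question looks roughly like this (cpu_bound is just an illustrative placeholder; on Linux the pool workers are forked while the loop is running):

    import asyncio
    from concurrent.futures import ProcessPoolExecutor

    def cpu_bound(n):
        # Runs in a pool worker process; it never touches the parent's loop.
        return sum(i * i for i in range(n))

    async def main(loop, executor):
        result = await loop.run_in_executor(executor, cpu_bound, 10 ** 7)
        print(result)

    loop = asyncio.get_event_loop()
    with ProcessPoolExecutor() as executor:
        loop.run_until_complete(main(loop, executor))
    loop.close()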

vxgmichel avatar Jul 27 '16 16:07 vxgmichel

@vxgmichel As long as you do exec right after forking you're safe. I think multiprocessing does that, right?

1st1 avatar Jul 27 '16 16:07 1st1

As far as I understand, Popen._launch runs os.fork then Process._bootstrap in the child. _bootstrap does a bunch of things before running the target, but I don't see any exec.

vxgmichel avatar Jul 27 '16 17:07 vxgmichel

I added a process pool executor test to uvloop - https://github.com/MagicStack/uvloop/commit/ad5181b36a51a0ac2ab4aaec829359711afdeda9. It doesn't crash (and it would if the event loop started to execute any code after forking).

Perhaps the reason is that multiprocessing calls fork and then, after the fork, it executes the target code. When the target is done, it exits the process.

This way, after you fork, the event loop in the child process is "paused" in the callback/coroutine that called the fork. If you do something after that and then exit, the event loop won't execute any of its code before the process is killed, which means it won't crash.
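
In other words, the child follows a pattern like this (a sketch; do_work is a placeholder):

    import asyncio
    import os

    def do_work():
        # Placeholder for the child's work; it must not touch the inherited loop.
        pass

    def forking_callback():
        # Runs inside the parent's event loop.
        pid = os.fork()
        if pid == 0:
            # Child: the inherited loop is "paused" inside this callback.  As
            # long as we exit before returning, the loop never runs again here.
            do_work()
            os._exit(0)
        # Parent: returning gives control back to its (still healthy) loop.

    loop = asyncio.get_event_loop()
    loop.call_soon(forking_callback)
    loop.call_later(0.1, loop.stop)
    loop.run_forever()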

1st1 avatar Jul 27 '16 18:07 1st1

Perhaps the reason is that multiprocessing calls fork and then, after the fork, it executes the target code. When the target is done, it exits the process.

Yes, that's exactly what multiprocessing does.

I get it now: os.fork() is safe as long as the child exits the process when it's done running the target, so it never gives control back to the event loop. However, this means monkey-patching os.fork is not a suitable solution (unless the patched function inspects the stack frames and lets the fork run when the call comes from multiprocessing).
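
Such a whitelist could look roughly like this (a sketch; a real guard would presumably also check whether an event loop is actually running, as in the debug-mode idea above):

    import inspect
    import os

    _original_fork = os.fork

    def _guarded_fork():
        # Let the fork through when it originates from multiprocessing (e.g. a
        # ProcessPoolExecutor worker being spawned); otherwise refuse, since the
        # forked child would share the parent's running event loop.
        for frame_info in inspect.stack():
            module = frame_info.frame.f_globals.get('__name__', '')
            if module == 'multiprocessing' or module.startswith('multiprocessing.'):
                return _original_fork()
        raise RuntimeError("os.fork() called outside of multiprocessing")

    os.fork = _guarded_fork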

vxgmichel avatar Jul 27 '16 18:07 vxgmichel