os.fork: documenting quirks and preventing it in debug mode
Long story short, one should never try to fork in a running event loop. Things that will break:
- kqueue/epoll selectors will (likely) error with `EBADF` as soon as the forked child process does anything with them.
- Some resources will leak.
- Some resources will be closed incorrectly, or, worst case, queued data may be sent more than once.
Sadly, it's not possible to reliably fix the above. Hence this is a documentation issue, and maybe we can do something in debug mode.
Documentation
The only safe way to fork is to do it with no active event loop in the process. If forking is absolutely needed, the semi-safe method of forking with a running event loop is this:
- Have a global variable `fork_requested` set to `False`.
- Before forking, set `fork_requested` to `True` and call the `loop.stop()` method.
- The loop will stop, and the code will continue to execute after the `loop.run_forever()` or `loop.run_until_complete()` statement (however the loop was started).
- At that point, check whether `fork_requested` is `True`, and if it is, perform the fork. It's safe to call `loop.close()` in the child process, and to re-start the loop in the parent one.
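The steps above can be sketched roughly as follows. The names (`fork_requested`, `request_fork`, `run_and_maybe_fork`) are illustrative, not a prescribed API, and this assumes a POSIX platform where `os.fork()` exists:

```python
import asyncio
import os

fork_requested = False

def request_fork(loop):
    # Called from inside the loop when a fork is needed.
    global fork_requested
    fork_requested = True
    loop.stop()

def run_and_maybe_fork():
    global fork_requested
    loop = asyncio.new_event_loop()
    # Something inside the loop decides a fork is needed:
    loop.call_soon(request_fork, loop)
    loop.run_forever()  # returns once loop.stop() takes effect
    forked = False
    if fork_requested:
        fork_requested = False
        pid = os.fork()
        if pid == 0:
            # Child: the loop is not running, so closing it is safe.
            loop.close()
            os._exit(0)
        os.waitpid(pid, 0)
        forked = True
        # Parent: the stopped loop can simply be restarted.
        loop.run_until_complete(asyncio.sleep(0))
    loop.close()
    return forked
```

The key property is that `os.fork()` runs only between `run_forever()` returning and the loop being restarted, so neither process ever resumes a loop that was mid-iteration at fork time.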
Debug Mode
Monkey-patch `os.fork` to error out in debug mode when it's called. This isn't the prettiest solution, but it will work reliably. Moreover, in uvloop, for instance, it's either this or a program crash (`abort()` called in C by libuv).
Hi,
We do use `fork()` in a running loop: a master process regularly spawns workers and communicates with them using the loop. The solution that we came up with is not very elegant, but works as expected for our use case (we only target Linux).
Right after the `fork()` in the new worker process, we raise a `ForkException` (a custom subclass of `BaseException`). This exception will bubble up to `loop.run_forever()`, hence unwinds the stack up to the "main" while ensuring that the loop will not run. We then monkey-patch the loop's selector (still in the worker) so `loop.close()` will free the resources (opened files, pending tasks, executor's threads, etc.), but will not unregister the file descriptors from the epoll structure, hence will not break the loop of the parent process. We also deactivate logging on tasks which are not finished but are collected (they will never run in the worker, as we closed the loop and will create a fresh one).
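A minimal sketch of the two ingredients described here. `ForkException` is the commenter's name; the `close_without_unregister` helper and its reliance on the private `loop._selector` attribute (a CPython selector-event-loop detail) are illustrative assumptions:

```python
import asyncio

class ForkException(BaseException):
    """Raised in the worker right after fork() to unwind out of run_forever().

    Subclassing BaseException (not Exception) keeps ordinary exception
    handlers in tasks and callbacks from swallowing it on the way up.
    """

def close_without_unregister(loop):
    """Close a stopped loop without touching shared epoll registrations.

    Patching unregister to a no-op means loop.close() still frees the
    loop's resources but leaves the parent's fds watched by epoll.
    """
    loop._selector.unregister = lambda fileobj: None  # private attr: CPython detail
    loop.close()

# Intended use (not executed here):
#   in the master, inside a callback:    pid = os.fork()
#                                        if pid == 0:
#                                            raise ForkException
#   in the worker, after run_forever() re-raises ForkException:
#       close_without_unregister(loop)
```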
We plan to try uvloop soon. I haven't looked at it yet, but if forking cleanly seems too hard, we'll use the standard implementation in the master and use uvloop in the workers only (in our app, the master is almost always idle, so the performance gain should be rather small).
> Right after the `fork()` in the new worker process, we raise a `ForkException` (a custom subclass of `BaseException`). This exception will bubble up to `loop.run_forever()`, hence unwinds the stack up to the "main" while ensuring that the loop will not run. We then monkey-patch the loop's selector (still in the worker) so `loop.close()` will free the resources (opened files, pending tasks, executor's threads, etc.), but will not unregister the file descriptors from the epoll structure, hence will not break the loop of the parent process. We also deactivate logging on tasks which are not finished but are collected (they will never run in the worker, as we closed the loop and will create a fresh one).
This seems to be a very fragile approach. I'd highly recommend trying the approach I described in my first message: `loop.stop(); os.fork(); loop.run()`
I agree, it is fragile, but when I wrote this, this was an easy and efficient way to solve our use case.
I am afraid that calling `stop()` then `run()` again may introduce latency in handling incoming events, since there are no guarantees regarding the overhead of stopping/restarting the loop, nor when `run_forever()` returns after the stop has been requested.
Anyway, this is not the biggest issue with forking a process in which a loop runs (or ran):
- Resource leaks are expected. I proposed to add a `detach()` method to transports (mimicking the semantics of `socket.socket.detach()`). It allows calling `transport.close()` and cleaning up resources without touching the underlying file descriptor. As an example, I patched `BaseSubprocessTransport` in Python bug 23540 (http://bugs.python.org/issue23540), but I didn't actively request feedback.
- Regarding your 4th point, `loop.close()` cannot be called safely in the child, at least on Linux using epoll: it will remove the self-pipe from the fds watched by the selector, and that affects the parent. That's why we need to monkey-patch the selector so `selector.unregister()` becomes a no-op.
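The `socket.socket.detach()` semantics that the proposed `transport.detach()` would mirror can be shown in a few lines (the `demo_detach` wrapper is just for illustration):

```python
import socket

def demo_detach():
    a, b = socket.socketpair()
    fd = a.detach()              # `a` gives up ownership of the fd...
    a.close()                    # ...so closing `a` leaves the fd open
    wrapped = socket.socket(fileno=fd)  # the raw fd is still fully usable
    wrapped.sendall(b"ping")
    data = b.recv(4)
    wrapped.close()
    b.close()
    return data
```

A transport-level `detach()` would do the analogous thing: release the transport's bookkeeping while leaving the underlying file descriptor untouched, which is exactly what a forked child needs.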
http://bugs.python.org/issue16500 - Add an 'atfork' module
`ProcessPoolExecutor` uses `os.fork` (through `multiprocessing.popen_fork.Popen`). Does that mean it is unsafe to use `loop.run_in_executor` with a `ProcessPoolExecutor`? Or is it OK as long as the target doesn't mess with the event loop?
@vxgmichel As long as you do `exec` right after forking you're safe. I think multiprocessing does that, right?
As far as I understand, `Popen._launch` runs `os.fork` then `Process._bootstrap` in the child. `_bootstrap` does a bunch of things before running the target, but I don't see any `exec`.
I added a process pool executor test to uvloop - https://github.com/MagicStack/uvloop/commit/ad5181b36a51a0ac2ab4aaec829359711afdeda9. It doesn't crash (and it would if event loop started to execute any code after forking).
Perhaps the reason is that multiprocessing calls fork and then, after the fork, it executes the target code. When the target is done, it exits the process.
This way, after you fork, the event loop in the child process is "paused" in the callback/coroutine that called the fork. If you do something after that and exit, the event loop won't execute any of its code before the process is killed, which means it won't crash.
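This pattern can be exercised directly: run a `ProcessPoolExecutor` job from inside a running loop, explicitly forcing the fork start method (the `square`/`main` names are just for this sketch, and `mp_context` is assumed available as in `concurrent.futures` since Python 3.7):

```python
import asyncio
import multiprocessing
from concurrent.futures import ProcessPoolExecutor

def square(n):
    # Runs in a forked worker. The worker only executes targets and then
    # exits, so the inherited, "paused" event loop state never runs again.
    return n * n

async def main():
    loop = asyncio.get_running_loop()
    ctx = multiprocessing.get_context("fork")  # force the fork start method
    with ProcessPoolExecutor(max_workers=1, mp_context=ctx) as pool:
        return await loop.run_in_executor(pool, square, 7)
```

Even though the worker is forked while the parent's loop is running, nothing crashes, matching the uvloop test linked above.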
> Perhaps the reason is that multiprocessing calls fork and then, after the fork, it executes the target code. When the target is done, it exits the process.
Yes, that's exactly what `multiprocessing` does.
I get it now: `os.fork()` is safe as long as the child exits the process when it's done running the target, so it doesn't give control back to the event loop. However, this means monkey-patching `os.fork` is not a suitable solution (unless the patched method inspects the stack frame to let the fork run if the call comes from `multiprocessing`).