tornado
Not possible to get process IDs in `tornado.process.fork_processes()`
It is currently not possible to get the process IDs of the children in `tornado.process.fork_processes()` (e.g. to use them in a signal handler).
`fork_processes()` returns a child ID, which is just an integer starting at 0.
Maybe (this is a little ugly, but backwards compatible) a third argument could be added that replaces the `children` dict, which is currently initialized inside that function?
Or set `fork_processes.children`, but then the function can only be used once. Or a global variable. Do you have other, nicer ideas?
```diff
diff --git tornado/process.py tornado/process.py
index 26428feb..9b58a946 100644
--- tornado/process.py
+++ tornado/process.py
@@ -80,7 +80,8 @@ _task_id = None
 def fork_processes(
-    num_processes: Optional[int], max_restarts: Optional[int] = None
+    num_processes: Optional[int], max_restarts: Optional[int] = None,
+    children: Optional[dict] = None,
 ) -> int:
     """Starts multiple worker processes.
@@ -121,7 +122,8 @@ def fork_processes(
     if num_processes is None or num_processes <= 0:
         num_processes = cpu_count()
     gen_log.info("Starting %d processes", num_processes)
-    children = {}
+    if children is None:
+        children = {}
     def start_child(i: int) -> Optional[int]:
         pid = os.fork()
```
When `fork_processes()` returns a child ID, you're inside that child process. `fork_processes()` never actually returns in the parent process (except by raising an exception). The parent process blocks and handles terminated child processes on its own. So even with that patch, you'd populate a dict that no one could ever see.
It sounds like you're trying to manage child processes on your own, which suggests to me that you'd be better off calling `os.fork()` and all the rest by yourself; `fork_processes()` is not the right tool for you to build on.
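A minimal sketch of that do-it-yourself approach (the name `fork_workers` is mine, not a Tornado API): the parent keeps a pid map as it forks, so it can later reach every child, e.g. from a signal handler.

```python
import os
import signal
import time

def fork_workers(num: int):
    """Fork `num` children. The parent gets ({task_id: pid}, None);
    each child gets (None, its task id)."""
    pids = {}
    for i in range(num):
        pid = os.fork()
        if pid == 0:
            return None, i  # in the child
        pids[i] = pid
    return pids, None  # in the parent

pids, task_id = fork_workers(3)
if task_id is not None:
    # Child: idle until the parent signals us.
    time.sleep(30)
    os._exit(0)

# Parent: unlike fork_processes(), we know every child's pid here,
# so forwarding a signal is trivial.
for pid in pids.values():
    os.kill(pid, signal.SIGTERM)
for pid in pids.values():
    os.waitpid(pid, 0)
```

Unlike `fork_processes()`, this sketch does not restart crashed children; that is the main feature you give up by rolling your own loop.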
I am aware of this, but as I initially said: even if the function never returns, there might be signal handlers that interrupt the main thread. I would like to be able to send the received signal to all child processes.
Ah, that makes sense.
In that case, I think making the `children` dict a global would be a simple solution. It's fine to make it a global because `fork_processes` is also something that you can only do once in a process.
Or if forwarding signals is the only real use case, maybe `fork_processes` should take an argument for a list of signals to catch and forward to the child processes. (It could perhaps also use signalfd and pidfd where available.)
Speaking of pidfd, that might give us a solution to this problem from the other direction (depending on your use case for forwarding signals). The reason you often want to forward signals is that the child processes have no good way to know when their parent process exits, but pidfd gives them a solution. You could let the parent process terminate normally on a signal, and have the child processes use pidfd to shut down when the parent is gone.
Sounds good. For my use case I probably need to do a few more things than sending the signal to the child processes, e.g. logging and handling other things in the main process. With your approach it would nevertheless be easier to detect whether I am the child or the parent, by registering the signal handler for the children before forking and for the parent after forking.
`pidfd_open()` was only added in Linux 5.3 and Python 3.9, so it's a bit new to depend on, IMHO.
I created a workaround by sharing a `children` instance between all the processes and setting the pid there after the `fork()`:
```python
import os
import sys
import signal
import time
import multiprocessing

from tornado.process import fork_processes

# Shared dict so every process can see the pids of all children.
children = multiprocessing.Manager().dict()
child = None

def sig_usr1(signo, frame):
    print('Received signal', signo)
    print('Child=%r, children=%r, pid=%r' % (child, dict(children), os.getpid()))

def sig_usr2(signo, frame):
    sys.exit(0)

signal.signal(signal.SIGUSR1, sig_usr1)
signal.signal(signal.SIGUSR2, sig_usr2)

def main():
    global child  # without this, the module-level `child` stays None
    try:
        child = fork_processes(5, 0)
    except SystemExit:
        return
    if child is not None:
        children[child] = os.getpid()
    while True:
        time.sleep(2)

if __name__ == '__main__':
    try:
        main()
    finally:
        children.pop(child, None)  # FIXME: the atexit of the multiprocessing.Manager handler fails to sync
```
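For comparison, the same forwarding goal can be met without a `multiprocessing.Manager` if the parent records the pids itself (as in a hand-rolled fork loop rather than `fork_processes`); its handler then simply re-sends the signal. This is a sketch, not Tornado API, and it glosses over the race where a child receives the signal before registering its own handler (the default `SIGUSR1` disposition terminates it, so the shutdown still happens).

```python
import os
import signal
import time

children = {}  # pid -> task id, filled by the parent as it forks

def forward(signo, frame):
    # Parent-side handler: re-send the signal to every known child.
    for pid in children:
        os.kill(pid, signo)

for i in range(2):
    pid = os.fork()
    if pid == 0:
        # Child: exit when the forwarded signal arrives.
        signal.signal(signal.SIGUSR1, lambda s, f: os._exit(0))
        while True:
            time.sleep(1)
    children[pid] = i

signal.signal(signal.SIGUSR1, forward)
os.kill(os.getpid(), signal.SIGUSR1)  # simulate an externally delivered signal
for pid in children:
    os.waitpid(pid, 0)
```

The dict lives only in the parent, which is enough here: forwarding is a parent-side job, so no cross-process sharing is required.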
> Ah, that makes sense. In that case, I think making the `children` dict a global would be a simple solution. It's fine to make it a global because `fork_processes` is also something that you can only do once in a process.
It's possible to double-fork, and I am doing this in one use case (i.e. call `fork_processes(2, 0)` and inside the children call `fork_processes(0, 0)`).
> Or if forwarding signals is the only real use case, maybe `fork_processes` should take an argument for a list of signals to catch and forward to the child processes. (It could perhaps also use signalfd and pidfd where available.)
In addition to forwarding the signals to the children, I need to handle the signal in the main process as well. E.g. when the main process kills the children, it should log something, reload configurations, or stop/unlink a common process used for shared memory. Therefore a list of signals doesn't help me if tornado then defines the signal handler itself.
> Speaking of pidfd, that might give us a solution to this problem from the other direction (depending on your use case for forwarding signals). The reason you often want to forward signals is that the child processes have no good way to know when their parent process exits, but pidfd gives them a solution. You could let the parent process terminate normally on a signal, and have the child processes use pidfd to shut down when the parent is gone.
Sounds great.
> It's possible to double-fork, and I am doing this in one use case (i.e. call `fork_processes(2, 0)` and inside the children call `fork_processes(0, 0)`).
Yes, but in that case each child process would have its own `children` dict for its children (the top-level process wouldn't know about its grandchildren). What I meant was that (unless you call `fork_processes` a second time from a signal handler) you can't have more than one set of child processes at once, so it's fine for the list of children to be a process-global variable.