kuyruk
worker: add a command line argument to run task in a separate process
I will add a few tests.
Which signals should the child process handle other than with the default handlers? I guess the only one is USR1, and it is inherited from the parent.
I set the other signal handlers (SIGINT, SIGTERM, SIGHUP, SIGUSR2) inherited from the parent process back to their defaults. According to the signal(7) manual, the default action for all of these signals is termination.
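The reset described above might look like this in the child process (a minimal sketch; the function name is hypothetical, and the signal set mirrors this comment rather than kuyruk's actual code):

```python
import signal

def reset_inherited_signal_handlers():
    # In the child, restore the default handlers for the signals that
    # would otherwise carry over from the parent. Per signal(7), the
    # default action for each of these is to terminate the process.
    for sig in (signal.SIGINT, signal.SIGTERM, signal.SIGHUP, signal.SIGUSR2):
        signal.signal(sig, signal.SIG_DFL)
```

Note this only matters if the handlers actually carry over into the child, which depends on the start method (see below).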
I think it is okay now, but can't wait to have your feedback.
Have you verified that signal handlers are inherited in the child process? I couldn't find that information in the Python documentation, so I've written a small program to test it: https://gist.github.com/cenkalti/17a3d518b8456cff308ff644fcd0b8b5
If my test is correct, signal handlers are not inherited in the child process.
These tasks still remain:
- (new) handle child failure (status code other than zero)
- integration tests to verify:
  - the successful task case
  - the exception case
For the record: signal handlers are not inherited in child processes created with the `spawn` method. `spawn` is the default method on macOS (starting from Python 3.8) and on Windows. On Unix, the default method is `fork`, and it is considered unsafe. Also, the `fork` method is not available on Windows.
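To get consistent behavior across these platforms, the start method can be requested explicitly instead of relying on the per-OS default. A sketch (`start_task_process` is a hypothetical helper, not kuyruk's API):

```python
import multiprocessing

def start_task_process(target, args=()):
    # Ask for the spawn start method explicitly, so the child behaves the
    # same on Linux, macOS, and Windows. Note that spawn pickles `target`
    # and `args` to reconstruct them in the child, which is exactly where
    # the PicklingError in the traceback below comes from.
    ctx = multiprocessing.get_context("spawn")
    proc = ctx.Process(target=target, args=args)
    proc.start()
    return proc
```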
I believe we should make reliable changes that work on each of these operating systems. In that case, what we can do is spawn a new process, pass some shared resources to it, and let it create its own resources. I tried that today but couldn't figure out some errors. For example:
```
2021-12-29 13:11:48,869 [ERROR] kuyruk.worker:267 - Task raised an exception:
Traceback (most recent call last):
  File "/pyenv/lib/python3.6/site-packages/kuyruk/worker.py", line 242, in _process_task
    result = self._run_task(message.channel.connection, task, args, kwargs)
  File "/pyenv/lib/python3.6/site-packages/kuyruk/worker.py", line 298, in _run_task
    self._task_process.start()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/usr/lib/python3.6/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/usr/lib/python3.6/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/lib/python3.6/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/usr/lib/python3.6/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <function myfunction at 0x7fb6390f9c80>: it's not the same object as x.y.myfunction
```
That is because `multiprocessing` wants to pickle something which is not picklable. `multiprocessing` pickles objects because it needs to send them to the child process and reconstruct them there.
Will investigate tomorrow.
Some ideas:
For the sake of simplicity, we can only target Python 3.8 on Linux.
For the exception serialization problem: before raising the exception in the child, we can catch it and try to pickle it before re-raising. If we get a pickle error, we can wrap the exception and store the original exception as a string.
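That catch-pickle-or-wrap step could look roughly like this (a sketch with made-up names; `WrappedTaskError` and `send_result` stand in for whatever kuyruk would actually use):

```python
import pickle
import traceback

class WrappedTaskError(Exception):
    """Picklable stand-in that stores the original exception as a string."""

def run_in_child(func, args, kwargs, send_result):
    # Run the task; before shipping any exception back to the parent,
    # check that it survives pickling. If it does not, wrap it so the
    # parent still receives a picklable error carrying the original
    # traceback text.
    try:
        send_result(("ok", func(*args, **kwargs)))
    except Exception as exc:
        try:
            pickle.dumps(exc)
        except Exception:
            exc = WrappedTaskError("".join(
                traceback.format_exception(type(exc), exc, exc.__traceback__)))
        send_result(("error", exc))
```

Here `send_result` could be a `multiprocessing.Pipe` connection's `send`; anything that eventually pickles the payload works the same way.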
To prevent any misunderstanding: the exception above happens when we launch the child process. It tries to pickle a thing (probably the "task" given in the parameters), move it to the child process, and then restore it. That way we can have the task object in the child process.
Also, I agree with the simplicity approach; we can target specific operating systems for this feature only.
Sorry for the misunderstanding. Now I understand the unpicklable object is the Task instance. We can move the importing of the task into the child process, but that would break backwards compatibility, since that task object is also used in signals.
Another option would be, instead of passing the task object, to pass the module and the function as strings and reimport the task object inside the child process.
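That option might look like this minimal sketch (`load_task` is a hypothetical helper): the parent passes two plain strings, and the child does the import itself, so nothing unpicklable ever crosses the process boundary.

```python
import importlib

def load_task(module_name, function_name):
    # Runs in the child: re-import the module and look up the task
    # function by name instead of receiving a pickled function object.
    module = importlib.import_module(module_name)
    return getattr(module, function_name)
```

For example, `load_task("math", "sqrt")(9)` imports `math` in the child and calls `sqrt`.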
I think the better approach is making the Task class picklable by overriding the `__getstate__` and `__setstate__` methods.
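Roughly, the idea would be something like the following sketch (a stand-in `Task`, not kuyruk's real class): pickle the importable name of `f` instead of the function object, and re-import it on the other side.

```python
import importlib

class Task:
    # Stand-in for kuyruk's Task; only the wrapped function `f` matters
    # for this sketch.
    def __init__(self, f):
        self.f = f

    def __getstate__(self):
        # Replace the unpicklable function object with its importable name.
        state = self.__dict__.copy()
        state["f"] = (self.f.__module__, self.f.__qualname__)
        return state

    def __setstate__(self, state):
        module_name, qualname = state.pop("f")
        self.__dict__.update(state)
        # Re-import the function in the receiving process.
        obj = importlib.import_module(module_name)
        for part in qualname.split("."):
            obj = getattr(obj, part)
        self.f = obj
```

This only works for functions importable by name, but that matches the constraint the `PicklingError` above already imposes.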
I tried to make `Task.f` picklable with `dill`, but no luck yet. The main problem is pickling a function that we only have a reference to. I will try again later.