multiprocessing-logging
multiprocessing-logging copied to clipboard
Deadlock / hangup under Python 3.9
I have recently switched to Python 3.9, and code that previously worked with multiprocessing_logging
suddenly started hanging.
Given the code
with Pool(num_processes=8) as pool:
for res in pool.imap_unordered(fn, inputs):
do_something(res)
pool.close()
pool.join()
, I get the following two errors. Sometimes it hangs in join()
:
File "/home/user/miniconda3/envs/wikifier/lib/python3.9/multiprocessing/pool.py", line 666, in join
p.join()
File "/home/user/miniconda3/envs/wikifier/lib/python3.9/multiprocessing/process.py", line 149, in join
res = self._popen.wait(timeout)
File "/home/user/miniconda3/envs/wikifier/lib/python3.9/multiprocessing/popen_fork.py", line 43, in wait
return self.poll(os.WNOHANG if timeout == 0.0 else 0)
File "/home/user/miniconda3/envs/wikifier/lib/python3.9/multiprocessing/popen_fork.py", line 27, in poll
pid, sts = os.waitpid(self.pid, flag)
, sometimes in imap
:
File "/home/user/miniconda3/envs/wikifier/lib/python3.9/site-packages/tqdm/std.py", line 1185, in __iter__
for obj in iterable:
File "/home/user/miniconda3/envs/wikifier/lib/python3.9/multiprocessing/pool.py", line 858, in next
self._cond.wait(timeout)
File "/home/user/miniconda3/envs/wikifier/lib/python3.9/threading.py", line 312, in wait
waiter.acquire()
There are also times when it finishes without a problem, but not every time. As I remember, I never had this issue in Python 3.8, but it is entirely possible that it is the underlying operating system version that is to blame.
On the other hand, logging from pool processes now works without install_mp_handler
, which didn't before, so maybe now multiprocessing
itself supports this functionality?
OS: Linux 5.4.0-77-generic #86-Ubuntu SMP x86_64 GNU/Linux (in an LXC container) Python: 3.9.5 (via conda)
If it works without installing the handler, then each process is writing independently to the log and records will be corrupted if two or more write at the same time.
This problem might be related to locks being in the wrong state when forking. No idea why there would be a difference between 3.8 and 3.9.
I've not tested this module on 3.9 yet.
Same problem for me. Switched to 21.04LTS and get this: May 22 07:04:05 tychocam authbind[7536]: --- Logging error --- May 22 07:04:05 tychocam authbind[7536]: Traceback (most recent call last): May 22 07:04:05 tychocam authbind[7536]: File "/usr/lib/python3.10/logging/handlers.py", line 73, in emit May 22 07:04:05 tychocam authbind[7536]: if self.shouldRollover(record): May 22 07:04:05 tychocam authbind[7536]: File "/usr/lib/python3.10/logging/handlers.py", line 197, in shouldRollover May 22 07:04:05 tychocam authbind[7536]: self.stream.seek(0, 2) #due to non-posix-compliant Windows feature May 22 07:04:05 tychocam authbind[7536]: ValueError: I/O operation on closed file. May 22 07:04:05 tychocam authbind[7536]: Call stack: May 22 07:04:05 tychocam authbind[7536]: File "/usr/lib/python3.10/threading.py", line 966, in _bootstrap May 22 07:04:05 tychocam authbind[7536]: self._bootstrap_inner() May 22 07:04:05 tychocam authbind[7536]: File "/usr/lib/python3.10/threading.py", line 1009, in _bootstrap_inner May 22 07:04:05 tychocam authbind[7536]: self.run() May 22 07:04:05 tychocam authbind[7536]: File "/usr/lib/python3.10/threading.py", line 946, in run May 22 07:04:05 tychocam authbind[7536]: self._target(*self._args, **self._kwargs) May 22 07:04:05 tychocam authbind[7536]: File "/var/www/FlaskApp/main_program/mainApp/cloud_processing/app_cloud.py", line 127, in start_cloud_app May 22 07:04:05 tychocam authbind[7536]: with get_context("fork").Pool(3) as pl: May 22 07:04:05 tychocam authbind[7536]: File "/usr/lib/python3.10/multiprocessing/context.py", line 119, in Pool May 22 07:04:05 tychocam authbind[7536]: return Pool(processes, initializer, initargs, maxtasksperchild, May 22 07:04:05 tychocam authbind[7536]: File "/usr/lib/python3.10/multiprocessing/pool.py", line 212, in init May 22 07:04:05 tychocam authbind[7536]: self._repopulate_pool() May 22 07:04:05 tychocam authbind[7536]: File "/usr/lib/python3.10/multiprocessing/pool.py", line 303, in _repopulate_pool May 22 07:04:05 tychocam authbind[7536]: return self._repopulate_pool_static(self._ctx, self.Process, May 22 07:04:05 tychocam authbind[7536]: File "/usr/lib/python3.10/multiprocessing/pool.py", line 326, in _repopulate_pool_static May 22 07:04:05 tychocam authbind[7536]: w.start() May 22 07:04:05 tychocam authbind[7536]: File "/usr/lib/python3.10/multiprocessing/process.py", line 121, in start May 22 07:04:05 tychocam authbind[7536]: self._popen = self._Popen(self) May 22 07:04:05 tychocam authbind[7536]: File "/usr/lib/python3.10/multiprocessing/context.py", line 277, in _Popen May 22 07:04:05 tychocam authbind[7536]: return Popen(process_obj) May 22 07:04:05 tychocam authbind[7536]: File "/usr/lib/python3.10/multiprocessing/popen_fork.py", line 19, in init May 22 07:04:05 tychocam authbind[7536]: self._launch(process_obj) May 22 07:04:05 tychocam authbind[7536]: File "/usr/lib/python3.10/multiprocessing/popen_fork.py", line 71, in _launch May 22 07:04:05 tychocam authbind[7536]: code = process_obj._bootstrap(parent_sentinel=child_r) May 22 07:04:05 tychocam authbind[7536]: File "/usr/lib/python3.10/multiprocessing/process.py", line 315, in _bootstrap May 22 07:04:05 tychocam authbind[7536]: self.run() May 22 07:04:05 tychocam authbind[7536]: File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run May 22 07:04:05 tychocam authbind[7536]: self._target(*self._args, **self._kwargs) May 22 07:04:05 tychocam authbind[7536]: File "/usr/lib/python3.10/multiprocessing/pool.py", line 125, in worker May 22 07:04:05 tychocam authbind[7536]: result = (True, func(*args, **kwds)) May 22 07:04:05 tychocam authbind[7536]: File "/usr/lib/python3.10/multiprocessing/pool.py", line 48, in mapstar May 22 07:04:05 tychocam authbind[7536]: return list(map(*args)) May 22 07:04:05 tychocam authbind[7536]: File "/var/www/FlaskApp/main_program/mainApp/cloud_processing/app_cloud.py", line 396, in process_image May 22 07:04:05 tychocam authbind[7536]: logger.debug('saving plot to cloud_stack: ' + filename) May 22 07:04:05 tychocam authbind[7536]: Message: 'saving plot to cloud_stack: 1653221036'
@CAReeser this might be something else. In the trace you post, it says: May 22 07:04:05 tychocam authbind[7536]: ValueError: I/O operation on closed file.
This might be the same as #49.
I'll consider it a duplicate.