pytest
Test using multiprocessing only hangs when ran with test suite, works alone
TL;DR: If you're experiencing a hanging issue during tests with a program using multiprocessing, try adding the following code at the beginning of your problematic test function:
def a_test():
    import multiprocessing as mp
    mp.set_start_method("spawn", force=True)
    # [... rest of test ...]
Description
I've encountered an odd test hanging issue with a program that uses multiprocessing. The program completes:
- When run individually from the command line
- As part of a test function invoked alone via pytest
The program hangs indefinitely when run as part of a full suite of tests.
I wanted to at least document what I experienced, even though I can't create a simple minimal example, since I found a workaround that could help others and I couldn't find any other issue describing this; the existing issues all seem to concern single tests.
Environment
python: 3.8.10
pytest: 7.2.0
pytest-mock: 3.10.0
OS: Ubuntu 20.04.6 LTS
Details
The problem occurs with a test that runs a program to completion, which internally uses multiprocessing (via pool.starmap on a generator). When the test is run individually, it completes without problems within roughly 5 seconds. However, when the test is included in a full test suite and run with pytest, it hangs indefinitely (for over 6 minutes at least).
The issue persists even when using concurrent.futures in place of multiprocessing (without a partial, as seen below).
My workaround is using mp.set_start_method("spawn", force=True) at the start of the test.
Fix in immediate context:
func = partial(run_prediction, estimator=estimator, name=name, logdir=logdir)
items = enumerate(ea_handler.yield_split())
mp.set_start_method("spawn", force=True) # --> the fix, also works by being at the complete start of the test
with mp.Pool(nb_workers) as pool:
pool.starmap(func, items)
While the fix makes the test run slower (about 20 seconds instead of roughly 5), at least it no longer hangs indefinitely.
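For anyone trying to picture the pattern, here is a stripped-down, hypothetical sketch of it (run_prediction and yield_split below are simplified stand-ins for the real code) with the workaround in place:

import multiprocessing as mp
from functools import partial


def run_prediction(index, item, name):
    # Stand-in for the real worker, which fits/predicts with an estimator.
    return f"{name}-{index}-{item}"


def yield_split():
    # Stand-in for ea_handler.yield_split(), which yields dataset splits.
    yield from ["split0", "split1", "split2"]


def test_starmap_pattern():
    # The workaround: force "spawn" before the pool is created
    # (it also works when placed at the very start of the test).
    mp.set_start_method("spawn", force=True)
    func = partial(run_prediction, name="demo")
    items = enumerate(yield_split())
    with mp.Pool(2) as pool:
        results = pool.starmap(func, items)
    assert len(results) == 3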
I'm still uncertain as to why this issue occurs and why the fix works, so I would greatly appreciate any insights!
Complete pip Environment
pip list output:
Package Version Location
----------------------------- ---------------- -------------------------------------------------------------------
absl-py 1.4.0
aiohttp 3.8.4
aiosignal 1.3.1
alabaster 0.7.13
alembic 1.11.1
anyio 3.7.0
argon2-cffi 21.3.0
argon2-cffi-bindings 21.2.0
astroid 2.15.5
asttokens 2.2.1
async-timeout 4.0.2
attrs 23.1.0
autopage 0.5.1
Babel 2.12.1
backcall 0.2.0
beautifulsoup4 4.12.2
black 23.1.0
bleach 6.0.0
cachetools 5.3.1
certifi 2023.5.7
cffi 1.15.1
cfgv 3.3.1
charset-normalizer 3.1.0
click 8.1.3
cliff 4.3.0
cloudpickle 2.2.1
cmaes 0.9.1
cmd2 2.4.3
colorama 0.4.6
colorlog 6.7.0
comet-ml 3.21.0
comm 0.1.3
commonmark 0.9.1
configobj 5.0.8
cycler 0.11.0
debugpy 1.6.7
decorator 5.1.1
defusedxml 0.7.1
dill 0.3.6
distlib 0.3.6
doc2dash 2.4.1
docutils 0.20.1
dulwich 0.21.5
EpiLaP 0.1 $HOME/Projects/sources/epi_ml/src/python
everett 3.2.0
exceptiongroup 1.1.1
executing 1.2.0
fastjsonschema 2.17.1
filelock 3.12.0
fonttools 4.39.4
frozenlist 1.3.3
fsspec 2023.5.0
gitdb 4.0.10
GitPython 3.1.31
google-auth 2.19.0
google-auth-oauthlib 1.0.0
greenlet 2.0.2
grpcio 1.54.2
h5py 3.6.0
identify 2.5.24
idna 3.4
imagesize 1.4.1
imbalanced-learn 0.8.0
importlib-metadata 4.13.0
importlib-resources 5.12.0
iniconfig 2.0.0
ipykernel 6.23.1
ipython 8.12.2
ipython-genutils 0.2.0
ipywidgets 8.0.6
isort 5.12.0
jedi 0.18.2
Jinja2 3.1.2
joblib 1.2.0
jsonschema 4.17.3
jupyter 1.0.0
jupyter-client 8.2.0
jupyter-console 6.6.3
jupyter-core 5.3.0
jupyter-events 0.6.3
jupyter-server 2.6.0
jupyter-server-mathjax 0.2.6
jupyter-server-terminals 0.4.4
jupyterlab-pygments 0.2.2
jupyterlab-widgets 3.0.7
kaleido 0.2.1
kiwisolver 1.4.4
lazy-object-proxy 1.9.0
lightgbm 3.3.5
llvmlite 0.40.0
Mako 1.2.4
Markdown 3.4.3
MarkupSafe 2.1.2
matplotlib 3.5.3
matplotlib-inline 0.1.6
mccabe 0.7.0
mistune 2.0.5
multidict 6.0.4
mypy-extensions 1.0.0
nbclassic 0.5.3
nbclient 0.7.2
nbconvert 7.2.9
nbdime 3.1.0
nbformat 5.7.0
nest-asyncio 1.5.6
nodeenv 1.8.0
notebook 6.5.4
notebook-shim 0.2.3
numba 0.57.0
numpy 1.22.4
nvidia-ml-py3 7.352.0
oauthlib 3.2.2
optuna 3.0.6
overrides 7.3.1
packaging 23.1
pandas 1.4.4
pandocfilters 1.5.0
parso 0.8.3
pathspec 0.11.1
pbr 5.11.1
pexpect 4.8.0
pickleshare 0.7.5
Pillow 9.5.0
pip 20.0.2
pkg-resources 0.0.0
pkgutil-resolve-name 1.3.10
platformdirs 3.5.1
plotly 5.11.0
pluggy 1.0.0
pre-commit 3.0.4
prettytable 3.7.0
prometheus-client 0.17.0
prompt-toolkit 3.0.38
protobuf 3.20.1
psutil 5.9.5
ptyprocess 0.7.0
pure-eval 0.2.2
pyaml 23.5.9
pyasn1 0.5.0
pyasn1-modules 0.3.0
pycparser 2.21
pyDeprecate 0.3.2
Pygments 2.15.1
pylint 2.16.4
pyparsing 3.0.9
pyperclip 1.8.2
pyrsistent 0.19.3
pytest 7.2.0
pytest-mock 3.10.0
pytest-timeout 2.1.0
python-dateutil 2.8.2
python-json-logger 2.0.7
pytorch-lightning 1.6.5
pytz 2023.3
PyYAML 6.0
pyzmq 25.1.0
qtconsole 5.4.3
QtPy 2.3.1
requests 2.31.0
requests-oauthlib 1.3.1
requests-toolbelt 1.0.0
rfc3339-validator 0.1.4
rfc3986-validator 0.1.1
rich 11.2.0
rsa 4.9
scikit-learn 1.1.2
scikit-optimize 0.9.0+unofficial
scipy 1.8.1
semantic-version 2.10.0
Send2Trash 1.8.2
setuptools 44.0.0
shap 0.41.0
six 1.16.0
slicer 0.0.7
smmap 5.0.0
sniffio 1.3.0
snowballstemmer 2.2.0
soupsieve 2.4.1
sphinx 7.0.1
sphinx-gallery 0.11.1
sphinxcontrib-applehelp 1.0.4
sphinxcontrib-devhelp 1.0.2
sphinxcontrib-htmlhelp 2.0.1
sphinxcontrib-jsmath 1.0.1
sphinxcontrib-qthelp 1.0.3
sphinxcontrib-serializinghtml 1.1.5
SQLAlchemy 2.0.15
stack-data 0.6.2
stevedore 5.1.0
tabulate 0.8.10
tenacity 8.2.2
tensorboard 2.13.0
tensorboard-data-server 0.7.0
terminado 0.17.1
threadpoolctl 3.1.0
tinycss2 1.2.1
toml 0.10.2
tomli 2.0.1
tomlkit 0.11.8
torch 1.10.2
torchinfo 1.6.3
torchmetrics 0.7.2
tornado 6.3.2
tqdm 4.65.0
traitlets 5.9.0
typing-extensions 4.4.0
urllib3 2.0.2
virtualenv 20.23.0
wcwidth 0.2.6
webencodings 0.5.1
websocket-client 1.5.2
Werkzeug 2.3.4
wheel 0.40.0
widgetsnbextension 4.0.7
wrapt 1.15.0
wurlitzer 3.0.3
yarl 1.9.2
zipp 3.15.0
Do the hangs also happen with pytest -s?
Yup. The traceback is not very helpful, since it's just the timeout one, but I'll share it for reference.
Traceback: Stack of Threads
+++++++++++++++++++++++++++++++++++++++++++++++ Timeout +++++++++++++++++++++++++++++++++++++++++++++++
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Stack of Thread-6 (139742417913600) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
File "/usr/lib/python3.8/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/usr/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.8/multiprocessing/pool.py", line 576, in _handle_results
task = get()
File "/usr/lib/python3.8/multiprocessing/connection.py", line 250, in recv
buf = self._recv_bytes()
File "/usr/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
buf = self._recv(4)
File "/usr/lib/python3.8/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Stack of Thread-5 (139742283679488) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
File "/usr/lib/python3.8/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/usr/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.8/multiprocessing/pool.py", line 528, in _handle_tasks
for taskseq, set_length in iter(taskqueue.get, None):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Stack of Thread-4 (139739615721216) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
File "/usr/lib/python3.8/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/usr/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.8/multiprocessing/pool.py", line 519, in _handle_workers
cls._wait_for_updates(current_sentinels, change_notifier)
File "/usr/lib/python3.8/multiprocessing/pool.py", line 499, in _wait_for_updates
wait(sentinels, timeout=timeout)
File "/usr/lib/python3.8/multiprocessing/connection.py", line 931, in wait
ready = selector.select(timeout)
File "/usr/lib/python3.8/selectors.py", line 415, in select
fd_event_list = self._selector.poll(timeout)
+++++++++++++++++++++++++++++++++++++++++++++++ Timeout +++++++++++++++++++++++++++++++++++++++++++++++
============================================== FAILURES ===============================================
_______________________________________ test_binary_classifier ________________________________________
test_dir = PosixPath('/tmp/pytest-of-rabyj/pytest-98/other_estimators1')
@pytest.mark.timeout(120)
def test_binary_classifier(test_dir: Path):
"""Test if binary models are properly supported."""
# Setting up environment variables
# import multiprocessing as mp # pylint: disable=import-outside-toplevel
# mp.set_start_method("spawn", force=True)
os.environ["MIN_CLASS_SIZE"] = "1"
os.environ["LABEL_LIST"] = '["female", "male"]'
epi_ml.core.estimators.NFOLD_PREDICT = 2
# Loading paths
datasource = EpiAtlasTreatmentTestData.default_test_data(
label_category="sex"
).epiatlas_dataset.datasource
current_dir = Path(__file__).parent.resolve()
hparams_file = current_dir.parent / "fixtures" / "other_estimators_hparams.json"
sys.argv = [
"other_estimators.py",
"--predict",
"sex",
f"{datasource.hdf5_file}",
f"{datasource.chromsize_file}",
f"{datasource.metadata_file}",
str(test_dir),
"--hyperparams",
f"{hparams_file}",
]
> main_module()
mains/other_estimators_test.py:80:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../epi_ml/other_estimators.py:226: in main
estimators.run_predictions(ea_handler, estimator, model_name, cli.logdir)
../epi_ml/core/estimators.py:400: in run_predictions
pool.starmap(func, items)
/usr/lib/python3.8/multiprocessing/pool.py:372: in starmap
return self._map_async(func, iterable, starmapstar, chunksize).get()
/usr/lib/python3.8/multiprocessing/pool.py:765: in get
self.wait(timeout)
/usr/lib/python3.8/multiprocessing/pool.py:762: in wait
self._event.wait(timeout)
/usr/lib/python3.8/threading.py:558: in wait
signaled = self._cond.wait(timeout)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <Condition(<unlocked _thread.lock object at 0x7f1850acbdb0>, 0)>, timeout = None
def wait(self, timeout=None):
"""Wait until notified or until a timeout occurs.
If the calling thread has not acquired the lock when this method is
called, a RuntimeError is raised.
This method releases the underlying lock, and then blocks until it is
awakened by a notify() or notify_all() call for the same condition
variable in another thread, or until the optional timeout occurs. Once
awakened or timed out, it re-acquires the lock and returns.
When the timeout argument is present and not None, it should be a
floating point number specifying a timeout for the operation in seconds
(or fractions thereof).
When the underlying lock is an RLock, it is not released using its
release() method, since this may not actually unlock the lock when it
was acquired multiple times recursively. Instead, an internal interface
of the RLock class is used, which really unlocks it even when it has
been recursively acquired several times. Another internal interface is
then used to restore the recursion level when the lock is reacquired.
"""
if not self._is_owned():
raise RuntimeError("cannot wait on un-acquired lock")
waiter = _allocate_lock()
waiter.acquire()
self._waiters.append(waiter)
saved_state = self._release_save()
gotit = False
try: # restore state no matter what (e.g., KeyboardInterrupt)
if timeout is None:
> waiter.acquire()
E Failed: Timeout >120.0s
/usr/lib/python3.8/threading.py:302: Failed
I'm still uncertain as to why this issue occurs and why the fix works, so I would greatly appreciate any insights!
Not sure either, but perhaps using a different timeout method might yield different results?
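For example, pytest-timeout lets you pick the method per test via the marker (or globally with the timeout_method ini option). A sketch of pinning the other method, assuming your test currently uses the platform default:

import pytest


# Hypothetical sketch: select the pytest-timeout method explicitly on the hanging test.
# "signal" interrupts just the test via SIGALRM (POSIX only), while "thread" uses a
# timer thread that dumps every thread's stack and then cancels the whole run.
@pytest.mark.timeout(120, method="thread")
def test_binary_classifier():
    ...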
perhaps using a different timeout method might yield different results?
The timeout was to avoid that specific test hanging forever, so the test suite can finish. I actually installed a timeout plugin (pytest-timeout 2.1.0) because it was hanging forever and preventing my test suite from finishing.
With this in mind, I honestly don't understand how the timeout method would be related to the bug :/
Out of insights then, sorry.
Same is happening for me with multiprocessing.pool.ThreadPool, but the fix is not working in that case. I assume it's because "spawn" is not being called in this context. Any suggestions? Thanks!
@lorinkoz
Recently, I thought the problem had come back for me, but by using pytest -s I noticed something else entirely was happening. So do try to debug with that.
Otherwise, make sure the following code runs before any multiprocessing import, i.e. make sure there is no way the multiprocessing module gets imported before your individual test (e.g. indirectly through another import):
import multiprocessing as mp
mp.set_start_method("spawn", force=True)
Those are the only two things that come to mind, hope it helps.
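One way to make that ordering robust, a sketch assuming "spawn" is acceptable for the whole suite, is to force the start method from a top-level conftest.py, which pytest loads before any test module:

# conftest.py at the root of the test tree
import multiprocessing as mp


def pytest_configure(config):
    # Force "spawn" for the entire session, before any test module has a chance
    # to create worker processes with the platform default (fork on Linux).
    mp.set_start_method("spawn", force=True)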
Thank you, this helped me.
@rabyj thanks for sharing this. I had a similar problem using the pytest-console-scripts library to test some scripts. In my case it was happening on GitLab CI: the test was stuck until the job reached its timeout. Using the multiprocessing "spawn" start method allowed the test to work without issues.
EDIT: I found that if I make sure to explicitly close all files that are opened in a sub-process, I no longer need to set mp.set_start_method("spawn", force=True).
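For illustration, a minimal sketch of what that looks like in a worker function (the names here are hypothetical; the point is only that the file handle is closed before the worker returns):

import multiprocessing as mp


def count_bytes(path):
    # Hypothetical worker run in a sub-process: the context manager guarantees
    # the file is closed before the worker returns, instead of leaving the
    # handle open in the child process.
    with open(path, "rb") as fh:
        return len(fh.read())


def process_files(paths):
    with mp.Pool(2) as pool:
        return pool.map(count_bytes, paths)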
I ran into this problem as well. However, mp.set_start_method("spawn", force=True) causes pickle errors in my code ("object is not the same..."), which do not happen with the defaults. Are there alternative ways to get around the hanging sub-process issue that avoid using spawn?
Closing this for now then.
For the record: pytest just runs each test function in sequence; if a test function blocks somehow, the code being tested probably needs some adjustment. My point is that pytest does not have a "known issue" with multiprocessing at all; it all depends on the user code running inside the tests.
If this issue comes up again, it is probably best to work on an MWE so a proper solution can be discussed.