pytest-xdist icon indicating copy to clipboard operation
pytest-xdist copied to clipboard

worker 'gwX' crashed - internal error KeyError: <WorkerController gwX>

Open whme opened this issue 4 years ago • 0 comments

When running our test suite with 10 worker using pytest -n 10 --dist loadfile tests/, test execution sporadically fails with the following error message

internal error

failed on setup with "worker 'gwX' crashed while running 'tests/some_test.py::test_something'"
worker 'gwX' crashed while running 'tests/some_test.py::test_something'

and the following traceback

Traceback (most recent call last):
  File "/data/venv/lib/python3.7/site-packages/_pytest/main.py", line 269, in wrap_session
    session.exitstatus = doit(config, session) or 0
  File "/data/venv/lib/python3.7/site-packages/_pytest/main.py", line 323, in _main
    config.hook.pytest_runtestloop(session=session)
  File "/data/venv/lib/python3.7/site-packages/pluggy/_hooks.py", line 265, in __call__
    return self._hookexec(self.name, self.get_hookimpls(), kwargs, firstresult)
  File "/data/venv/lib/python3.7/site-packages/pluggy/_manager.py", line 80, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
  File "/data/venv/lib/python3.7/site-packages/pluggy/_callers.py", line 55, in _multicall
    gen.send(outcome)
  File "/data/venv/lib/python3.7/site-packages/pluggy/_result.py", line 60, in get_result
    raise ex[1].with_traceback(ex[2])
  File "/data/venv/lib/python3.7/site-packages/pluggy/_callers.py", line 39, in _multicall
    res = hook_impl.function(*args)
  File "/data/venv/lib/python3.7/site-packages/xdist/dsession.py", line 112, in pytest_runtestloop
    self.loop_once()
  File "/data/venv/lib/python3.7/site-packages/xdist/dsession.py", line 135, in loop_once
    call(**kwargs)
  File "/data/venv/lib/python3.7/site-packages/xdist/dsession.py", line 256, in worker_collectionfinish
    self.sched.schedule()
  File "/data/venv/lib/python3.7/site-packages/xdist/scheduler/loadscope.py", line 341, in schedule
    self._reschedule(node)
  File "/data/venv/lib/python3.7/site-packages/xdist/scheduler/loadscope.py", line 323, in _reschedule
    self._assign_work_unit(node)
  File "/data/venv/lib/python3.7/site-packages/xdist/scheduler/loadscope.py", line 261, in _assign_work_unit
    worker_collection = self.registered_collections[node]
KeyError: <WorkerController gwX>

with the X in gwX being the number of the worker that failed.

pytest related packages we use:

$ pip freeze | grep pytest
pytest==6.2.5
pytest-asyncio==0.15.1
pytest-cov==2.12.1
pytest-flask==1.2.0
pytest-forked==1.3.0
pytest-lazy-fixture==0.6.3
pytest-mock==3.6.1
pytest-xdist==2.3.0

python version:

$ python --version
Python 3.7.1

From my understanding it can happen that a test failure causes a worker to crash, in which case

pytest-xdist will [by default] automatically restart that worker and report the test's failure source

Could there be a bug in the restart logic which causes the KeyError to be thrown? Or is this expected in such a case?

Additionally, all the output we get from the failed test which caused the worker to crash is what I've shared above, no traceback of the actual test execution, how can I best debug this?

whme avatar Oct 11 '21 14:10 whme