Replacing crashed worker ends with failure and missing coverage
First of all, thanks for your time and for developing this project.
I have an issue where our CI system sometimes kills a thread (not a Python issue) and the affected worker is restarted.
Maybe I'm misinterpreting the automatic worker restart feature.
What I expect is that, if the worker can be recovered, the test run finishes without errors. In this case, however, a test error is raised, and the coverage plugin is also unable to recover coverage from the restarted worker:
============================= test session starts ==============================
platform linux -- Python 3.7.4, pytest-5.1.1, py-1.8.0, pluggy-0.12.0
Django settings: settings.test (from command line option)
rootdir: /root/app/src, inifile: setup.cfg, testpaths: apps
plugins: responses-0.4.0, xdist-1.29.0, celery-4.3.0, profiling-1.7.0, cov-2.7.1, forked-1.0.2, django-3.5.1
gw0 [3154] / gw1 [3154] / gw2 [3154] / gw3 [3154] / gw4 [3154] / gw5 [3154] / gw6 [3154] / gw7 [3154]
[gw7] node down: Not properly terminated
f
replacing crashed worker gw7
gw0 [3154] / gw1 [3154] / gw2 [3154] / gw3 [3154] / gw4 [3154] / gw5 [3154] / gw6 [3154] / gw8 ok
. [ 0%]
...............................
gw0 [3154] / gw1 [3154] / gw2 [3154] / gw3 [3154] / gw4 [3154] / gw5 [3154] / gw6 [3154] / gw8 [3154]
.............
=================================== FAILURES ===================================
____________ test/test_models.py ____________
[gw7] linux -- Python 3.7.4 /usr/local/bin/python
worker 'gw7' crashed while running 'tests/test_models.py::Test::test_xxx'
---- generated xml file: /root/app/src/test_result.xml -----
---------------------- coverage: failed slaves -----------------------
The following slaves failed to return coverage data, ensure that pytest-cov is installed on these slaves.
gw7
I am having the same issue as described above. I tried adding the pytest-rerunfailures plugin to rerun the test that failed because of the crashed worker, but that doesn't work either. All tests pass when run serially, without the pytest-xdist plugin.
I am having a similar issue. Any update on this?
Thanks.
Same issue here.
I would add that under our Jenkins CI when a worker crashes it fails the job even though the test will eventually succeed on another worker.
This leads to a failed job that shows no tests in error in the junit reporting.
Note to myself: The test which fails when a worker crashes is reported as failed (and gets written to the XML) but doesn't get written to the pytest cache lastfailed file. That's why rerunfailures doesn't rerun these tests.
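To check this on your own run, here is a minimal sketch that lists tests reported as failed but absent from the lastfailed cache. It assumes the default cache location (`.pytest_cache/v/cache/lastfailed`, a JSON object keyed by test node ID); the function name `missing_from_lastfailed` and the `reported_failures` argument are hypothetical, and you would have to collect the reported IDs yourself (for example from the console output).

```python
import json
from pathlib import Path


def missing_from_lastfailed(reported_failures, cache_dir=".pytest_cache"):
    """Return reported test IDs that are not recorded in pytest's lastfailed cache.

    Assumption: the cache lives in the default location; adjust cache_dir
    if the project overrides it.
    """
    lastfailed_path = Path(cache_dir) / "v" / "cache" / "lastfailed"
    try:
        # The file is a JSON object whose keys are node IDs of failed tests.
        lastfailed = json.loads(lastfailed_path.read_text())
    except FileNotFoundError:
        # No cache file means nothing was recorded as failed.
        lastfailed = {}
    return sorted(set(reported_failures) - set(lastfailed))
```

Any ID this returns is a test that the report shows as failed but that a cache-based rerun would not know about.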
@joekohlsdorf Have you found a workaround for this?
Yeah but you are not going to like it: I parse the log output.
grep "^FAILED .*" /dev/shm/pytest-output | cut -f 2 -d" " > /tmp/tests_to_rerun
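For anyone who would rather do this from Python, a rough equivalent of the pipeline above might look like the sketch below. The function name `failed_test_ids` is made up for illustration. Like `cut -f 2 -d" "`, it keeps only the second space-separated field, so a test ID that itself contains a space (some parametrized IDs) would be truncated.

```python
def failed_test_ids(log_lines):
    """Collect test IDs from lines that start with 'FAILED '.

    Mirrors: grep "^FAILED .*" | cut -f 2 -d" "
    """
    ids = []
    for line in log_lines:
        if line.startswith("FAILED "):
            # Split on single spaces and keep the second field, as cut does.
            fields = line.rstrip("\n").split(" ")
            ids.append(fields[1])
    return ids
```

Usage would be something like `failed_test_ids(open("/dev/shm/pytest-output"))`, writing the result to `/tmp/tests_to_rerun` one ID per line.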
Have you tried to run with -n0 as a workaround?
Doesn't -n0 defeat the purpose of the plugin?
The test which fails when a worker crashes is reported as failed (and gets written to the XML) but doesn't get written to the pytest cache lastfailed file. That's why rerunfailures doesn't rerun these tests.
I also ran into the issue of some tests crashing the worker at some point but succeeding when run individually. I expected rerunfailures to resolve this but found the same as you: Those tests are not detected as failed by the plugin but later reported as FAILED.
This sounds like an actual bug to me, but I'm not sure which project it belongs to, and hence where to report the "crashed test doesn't get written to lastfailed, breaking rerunfailures" issue.
@dolby360
Have you tried to run with -n0 as a workaround?
I tried that and it resulted in the same behavior: 1 worker spawned, crashed, and was reported as ERROR in the end, but without any integration with rerunfailures or similar.
Do we have any update on this? I am also facing a similar issue where pytest workers crash, causing test cases to fail with the status "FAILED". The new worker created to replace the crashed worker is not picking up the test cases, hence the test case re-try is not happening.
I am using the allure-pytest==2.8.40 plugin to capture the results and generate the reports, but unfortunately, those failures (caused by the pytest worker crash) are not collected, and the test case results are missing in the report. I believe this should go into the allure-pytest issue tracker, but I am curious to know if there is any workaround to help in this scenario.
The versions I am using are:
pytest-xdist==2.2.1
pytest-rerunfailures==9.1.1