pytest-xdist

Replacing crashed worker ends with failure and missing coverage

Open Maks3w opened this issue 6 years ago • 10 comments

First of all, thanks for your time and for developing this project.

I have an issue where our CI system sometimes kills a thread (not a Python issue), and the affected worker is restarted.

Maybe I'm misinterpreting the automatic worker restart feature.

What I expect is that, if the worker can be recovered, the test run finishes without errors. In this case, however, a test error is raised, and the coverage plugin is unable to recover coverage data from the restarted worker.

============================= test session starts ==============================
platform linux -- Python 3.7.4, pytest-5.1.1, py-1.8.0, pluggy-0.12.0
Django settings: settings.test (from command line option)
rootdir: /root/app/src, inifile: setup.cfg, testpaths: apps
plugins: responses-0.4.0, xdist-1.29.0, celery-4.3.0, profiling-1.7.0, cov-2.7.1, forked-1.0.2, django-3.5.1
gw0 [3154] / gw1 [3154] / gw2 [3154] / gw3 [3154] / gw4 [3154] / gw5 [3154] / gw6 [3154] / gw7 [3154]
[gw7] node down: Not properly terminated
f
replacing crashed worker gw7
gw0 [3154] / gw1 [3154] / gw2 [3154] / gw3 [3154] / gw4 [3154] / gw5 [3154] / gw6 [3154] / gw8 ok
. [ 0%]
...............................
gw0 [3154] / gw1 [3154] / gw2 [3154] / gw3 [3154] / gw4 [3154] / gw5 [3154] / gw6 [3154] / gw8 [3154]
.............
=================================== FAILURES ===================================
____________ test/test_models.py ____________
[gw7] linux -- Python 3.7.4 /usr/local/bin/python
worker 'gw7' crashed while running 'tests/test_models.py::Test::test_xxx'
---- generated xml file: /root/app/src/test_result.xml -----

---------------------- coverage: failed slaves -----------------------
The following slaves failed to return coverage data, ensure that pytest-cov is installed on these slaves.
gw7

Maks3w avatar Aug 28 '19 19:08 Maks3w

I am having the same issue as described above. I tried adding the pytest-rerunfailures plugin to rerun the test that failed because of the crashed worker, but that is not working either. All tests pass when run serially without the pytest-xdist plugin.

DavlinLotze avatar Nov 04 '19 16:11 DavlinLotze

I am having a similar issue. Is there any update on this?

Thanks.

cauvery avatar Dec 21 '19 08:12 cauvery

Same issue here.

I would add that under our Jenkins CI, when a worker crashes, it fails the job even though the test eventually succeeds on another worker.

This leads to a failed job that shows no tests in error in the junit reporting.

romaingz avatar Jan 17 '20 07:01 romaingz

Note to myself: The test which fails when a worker crashes is reported as failed (and gets written to the XML) but doesn't get written to the pytest cache lastfailed file. That's why rerunfailures doesn't rerun these tests.
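Since the crashed test does end up in the XML report, one possible route is to read the failures back from that file instead of the cache. The following is only a sketch under assumptions: the function name `failed_testcases` and the `SAMPLE` document are made up for illustration, and the sample only mimics the usual JUnit layout (`testcase` elements with a `failure` or `error` child) rather than reproducing a real report from this run.

```python
import xml.etree.ElementTree as ET


def failed_testcases(junit_xml_text):
    """Return (classname, name) pairs for test cases marked failed or errored."""
    root = ET.fromstring(junit_xml_text)
    failed = []
    # iter() walks the whole tree, so this works whether the root element
    # is <testsuites> or a single <testsuite>.
    for case in root.iter("testcase"):
        if case.find("failure") is not None or case.find("error") is not None:
            failed.append((case.get("classname"), case.get("name")))
    return failed


# Hypothetical, hand-written sample; not copied from a real report.
SAMPLE = """<testsuite name="pytest" tests="2" failures="1">
  <testcase classname="tests.test_models.Test" name="test_xxx">
    <failure message="worker crashed"/>
  </testcase>
  <testcase classname="tests.test_models.Test" name="test_ok"/>
</testsuite>"""

if __name__ == "__main__":
    for classname, name in failed_testcases(SAMPLE):
        print(classname, name)
```

The pairs still have to be mapped back to pytest node IDs before they can be passed to a rerun, and that mapping depends on the project layout.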

joekohlsdorf avatar Feb 12 '21 15:02 joekohlsdorf

@joekohlsdorf Have you found a workaround for this?

ncclementi avatar Oct 06 '22 13:10 ncclementi

Yeah but you are not going to like it: I parse the log output.

grep "^FAILED .*" /dev/shm/pytest-output | cut -f 2 -d" " > /tmp/tests_to_rerun
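For anyone who prefers to do the same thing without grep and cut, a rough Python equivalent of the one-liner might look like this. It is a sketch, not part of the original workaround: `failed_node_ids` and the sample log text are hypothetical, and like the shell version it keeps only the second space-separated field, so a node ID that itself contains a space would be truncated.

```python
def failed_node_ids(log_text):
    """Collect the second space-separated field of every line starting with 'FAILED '."""
    node_ids = []
    for line in log_text.splitlines():
        # Same filter as grep "^FAILED .*": the line must begin with "FAILED ".
        if line.startswith("FAILED "):
            # Same selection as cut -f 2 -d" ": take the second field.
            node_ids.append(line.split(" ")[1])
    return node_ids


# Hypothetical log excerpt, written by hand for illustration.
SAMPLE_LOG = """\
PASSED tests/test_models.py::Test::test_ok
FAILED tests/test_models.py::Test::test_xxx
FAILED tests/test_views.py::test_index - AssertionError
"""

if __name__ == "__main__":
    # Print one node ID per line, ready to be redirected into a rerun file.
    print("\n".join(failed_node_ids(SAMPLE_LOG)))
```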

joekohlsdorf avatar Oct 06 '22 13:10 joekohlsdorf

Have you tried to run with -n0 as a workaround?

dolby360 avatar Dec 06 '23 13:12 dolby360

Doesn't -n0 defeat the purpose of the plugin?

joekiller avatar Apr 09 '24 14:04 joekiller

The test which fails when a worker crashes is reported as failed (and gets written to the XML) but doesn't get written to the pytest cache lastfailed file. That's why rerunfailures doesn't rerun these tests.

I also ran into the issue of some tests crashing the worker at some point but succeeding when run individually. I expected rerunfailures to resolve this but found the same as you: Those tests are not detected as failed by the plugin but later reported as FAILED.

This sounds like an actual bug to me, but I am not sure which project it belongs to, and hence where to report the "crashed test doesn't get written to lastfailed, breaking rerunfailures" issue.
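One way to confirm the mismatch when reporting it is to compare the node IDs printed as FAILED with the contents of the lastfailed cache. The sketch below rests on assumptions: it treats the cache as a JSON object keyed by node ID (commonly found at `.pytest_cache/v/cache/lastfailed`, which may differ in your setup), and `missing_from_lastfailed` is a hypothetical helper name.

```python
import json


def missing_from_lastfailed(reported_failed, lastfailed_json_text):
    """Return node IDs reported as failed that are absent from the lastfailed cache."""
    # Assumption: the cache content is a JSON object whose keys are node IDs.
    lastfailed = json.loads(lastfailed_json_text)
    return [node_id for node_id in reported_failed if node_id not in lastfailed]


if __name__ == "__main__":
    # Hypothetical inputs: one ordinary failure and one crash-related failure.
    reported = [
        "tests/test_views.py::test_index",
        "tests/test_models.py::Test::test_xxx",
    ]
    cache_text = '{"tests/test_views.py::test_index": true}'
    print(missing_from_lastfailed(reported, cache_text))
```

Any ID this returns was reported as failed but would not be picked up by tooling that relies on the cache.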

@dolby360

Have you tried to run with -n0 as a workaround?

I tried that and it resulted in the same behavior: 1 worker spawned, crashed, and was reported as ERROR in the end, but without integration with rerunfailures or similar.

Flamefire avatar Apr 18 '24 11:04 Flamefire

Do we have any update on this? I am also facing a similar issue where pytest workers crash, causing test cases to fail with the status "FAILED". The new worker created to replace the crashed worker is not picking up the test cases, hence the test case re-try is not happening.

I am using the allure-pytest==2.8.40 plugin to capture the results and generate the reports, but unfortunately, those failures (caused by the pytest worker crash) are not collected, and the test case results are missing in the report. I believe this should go into the allure-pytest issue tracker, but I am curious to know if there is any workaround to help in this scenario.

The versions I am using are:

pytest-xdist==2.2.1
pytest-rerunfailures==9.1.1

chandra0714 avatar Jul 19 '24 10:07 chandra0714