pytest-xdist

Tests show as both FAILED and PASSED after node crash

Open hb2638 opened this issue 1 year ago • 3 comments

Hi, I'm noticing that when a worker crashes (which started happening frequently last week), the test appears as both FAILED and PASSED.

E.g., below is a snippet of the logs for tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py::StoredProcTestCase::test_updates_before_start_date_ignored@/opt/gitlab-runner/builds/abc/pipelines/tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py:21. It ran on worker #7, which crashed, and then ran again later on worker #8, where it PASSED.

	Line 4026: tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py::StoredProcTestCase::test_updates_before_start_date_ignored@/opt/gitlab-runner/builds/abc/pipelines/tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py:21 
	Line 4037: [gw7] node down: Not properly terminated
	Line 4038: [gw7] [ 97%] FAILED tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py::StoredProcTestCase::test_updates_before_start_date_ignored@/opt/gitlab-runner/builds/abc/pipelines/tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py:21

replacing crashed worker gw7

	Line 4047: tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py::StoredProcTestCase::test_updates_before_start_date_ignored@/opt/gitlab-runner/builds/abc/pipelines/tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py:21 
	Line 4070: [gw8] [ 98%] PASSED tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py::StoredProcTestCase::test_updates_before_start_date_ignored@/opt/gitlab-runner/builds/abc/pipelines/tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py:21 
	Line 7994: worker 'gw7' crashed while running 'tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py::StoredProcTestCase::test_updates_before_start_date_ignored@/opt/gitlab-runner/builds/abc/pipelines/tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py:21'
	Line 8220: =========================== short test summary info ============================
	Line 8227: FAILED tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py::StoredProcTestCase::test_updates_before_start_date_ignored@/opt/gitlab-runner/builds/abc/pipelines/tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py:21
	Line 9397: PASSED tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py::StoredProcTestCase::test_updates_before_start_date_ignored@/opt/gitlab-runner/builds/abc/pipelines/tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py:21
	Line 9420: = 6 failed, 1137 passed, 252 skipped, 96 warnings, 15 rerun in 6617.65s (1:50:17) =

package versions: pytest-7.4.0 pytest_cov-4.1.0 pytest_xdist-3.3.1 coverage-7.2.7 pytest_rerunfailures-12.0 psutil-5.9.5

command line: pytest --log-format="%Y-%m-%dT%H:%M:%S.%f%z" --log-date-format="%Y-%m-%d %H:%M:%S" --log-format "%(asctime)s %(levelname)-8s [%(name)s|%(process)d|%(thread)d|%(threadName)s] [%(pathname)s:%(funcName)s:%(lineno)d] %(message)s" --max-worker-restart 5 -n 16 --dist loadgroup -rfEsxXp --reruns 2 --reruns-delay 30 -v --tb=long -o faulthandler_timeout=3600 --durations=20 --durations-min=60 --cov=src/ tests

We're running about 1000 tests using 16 workers.

hb2638, Jul 17 '23 20:07
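The double reporting described above can be spotted mechanically in a saved log. The following is a minimal sketch, assuming the verbose progress lines keep the `[gwN] [ NN%] OUTCOME test-id` shape shown in the excerpt; the function name `conflicting_outcomes` and the shortened test ids in `sample` are hypothetical and only serve the illustration.

```python
import re
from collections import defaultdict

# Matches xdist verbose progress lines such as:
#   [gw7] [ 97%] FAILED tests/test_x.py::test_y
# Any prefix (e.g. "Line 4038: ") is ignored because we use search().
PROGRESS = re.compile(
    r"\[(gw\d+)\]\s+\[\s*\d+%\]\s+"
    r"(PASSED|FAILED|ERROR|RERUN|SKIPPED|XFAIL|XPASS)\s+(\S+)"
)


def conflicting_outcomes(log_text):
    """Return {test_id: [(worker, outcome), ...]} for tests reported as both FAILED and PASSED."""
    seen = defaultdict(list)
    for line in log_text.splitlines():
        match = PROGRESS.search(line)
        if match:
            worker, outcome, test_id = match.groups()
            seen[test_id].append((worker, outcome))
    return {
        test_id: results
        for test_id, results in seen.items()
        if {"FAILED", "PASSED"} <= {outcome for _, outcome in results}
    }


# Hypothetical, shortened log modelled on the excerpt in the report.
sample = """\
[gw7] node down: Not properly terminated
[gw7] [ 97%] FAILED tests/test_lot_merge.py::test_updates_ignored
replacing crashed worker gw7
[gw8] [ 98%] PASSED tests/test_lot_merge.py::test_updates_ignored
[gw5] [ 98%] RERUN tests/test_other.py::test_flaky
[gw5] [ 99%] PASSED tests/test_other.py::test_flaky
"""

for test_id, results in conflicting_outcomes(sample).items():
    print(test_id, results)
```

Only the test that was reported on the crashed worker and then again on its replacement is listed; the test that went through a normal RERUN and then passed is not.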

Perhaps that's due to --reruns 2 in the command-line?

nicoddemus, Jul 17 '23 20:07

Regarding reruns: I'm not sure, because I do see RERUN entries in the log, such as

	Line 1278: plugins: rerunfailures-12.0, cov-
	Line 1300: [gw6] [  0%] RERUN tests/aws/test
	Line 1302: [gw2] [  0%] RERUN tests/aws/test
	Line 1304: [gw4] [  0%] RERUN tests/aws/test
	Line 1654: [gw2] [ 12%] RERUN tests/src/test
	Line 1664: [gw2] [ 12%] RERUN tests/src/test
	Line 1676: [gw2] [ 13%] RERUN tests/src/test
	Line 1694: [gw2] [ 13%] RERUN tests/src/test
	Line 1744: [gw13] [ 15%] RERUN tests/src/tes
	Line 1754: [gw13] [ 15%] RERUN tests/src/tes
	Line 3294: [gw13] [ 70%] RERUN tests/src/scr
	Line 3322: [gw13] [ 71%] RERUN tests/src/scr
	Line 3364: [gw13] [ 73%] RERUN tests/src/scr
	Line 3558: [gw13] [ 80%] RERUN tests/src/scr
	Line 3632: [gw10] [ 82%] RERUN tests/cimdb/l
	Line 4075: [gw5] [ 98%] RERUN tests/src/scri

but I'm not seeing one for tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py::StoredProcTestCase::test_updates_before_start_date_ignored@/opt/gitlab-runner/builds/abc/pipelines/tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py:21.

Most of the tests hit the DB and intermittently fail because of a SQL deadlock, so we want to retry a test a few times before marking it as failed.

hb2638, Jul 17 '23 21:07

RERUN only works when a test fails with an error or exception; it does not work for a hard crash (indicated by the message "replacing crashed worker gw7").

nicoddemus, Jul 17 '23 21:07
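The distinction drawn in the last comment, between a test that fails with an exception and a process that dies outright, can be illustrated outside of pytest. This is only an analogy under stated assumptions: `os._exit` stands in for whatever actually killed the worker (the thread does not say), `run_child` is a hypothetical helper, and nothing here reflects how pytest-xdist or pytest-rerunfailures are implemented.

```python
import subprocess
import sys


def run_child(code):
    """Run a Python snippet in a fresh interpreter and return (exit_code, stderr)."""
    proc = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True
    )
    return proc.returncode, proc.stderr


# Ordinary failure: the interpreter is still alive, so it can format a
# traceback and report what went wrong before exiting.
exc_code, exc_err = run_child("raise RuntimeError('deadlock')")

# Hard exit (stand-in for a crash): os._exit ends the process immediately,
# skipping exception handling and cleanup, so nothing is reported.
crash_code, crash_err = run_child("import os; os._exit(70)")

print(exc_code, "RuntimeError" in exc_err)
print(crash_code, repr(crash_err))
```

In the first case the child leaves a traceback that a parent process can inspect and act on; in the second the parent only learns that the process is gone, which is the situation the "node down: Not properly terminated" message describes.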