pytest-xdist
Tests show as both FAILED and PASSED after node crash
Hi, I'm noticing that when a worker crashes (which started happening frequently last week), the test it was running appears as both FAILED and PASSED.
For example, below is a snippet of the logs for tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py::StoredProcTestCase::test_updates_before_start_date_ignored@/opt/gitlab-runner/builds/abc/pipelines/tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py:21. It ran on worker gw7, which crashed, and then ran again later on worker gw8, where it PASSED.
Line 4026: tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py::StoredProcTestCase::test_updates_before_start_date_ignored@/opt/gitlab-runner/builds/abc/pipelines/tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py:21
Line 4037: [gw7] node down: Not properly terminated
Line 4038: [gw7] [ 97%] FAILED tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py::StoredProcTestCase::test_updates_before_start_date_ignored@/opt/gitlab-runner/builds/abc/pipelines/tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py:21
replacing crashed worker gw7
Line 4047: tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py::StoredProcTestCase::test_updates_before_start_date_ignored@/opt/gitlab-runner/builds/abc/pipelines/tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py:21
Line 4070: [gw8] [ 98%] PASSED tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py::StoredProcTestCase::test_updates_before_start_date_ignored@/opt/gitlab-runner/builds/abc/pipelines/tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py:21
Line 7994: worker 'gw7' crashed while running 'tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py::StoredProcTestCase::test_updates_before_start_date_ignored@/opt/gitlab-runner/builds/abc/pipelines/tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py:21'
Line 8220: =========================== short test summary info ============================
Line 8227: FAILED tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py::StoredProcTestCase::test_updates_before_start_date_ignored@/opt/gitlab-runner/builds/abc/pipelines/tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py:21
Line 9397: PASSED tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py::StoredProcTestCase::test_updates_before_start_date_ignored@/opt/gitlab-runner/builds/abc/pipelines/tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py:21
Line 9420: = 6 failed, 1137 passed, 252 skipped, 96 warnings, 15 rerun in 6617.65s (1:50:17) =
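To check whether other tests are affected the same way, a log like the one above can be scanned for node IDs that appear with both outcomes. This is only a rough sketch: the function name `conflicting_outcomes` and the regular expression are my own assumptions about the line format shown here, and node IDs containing spaces (e.g. some parametrized tests) would be cut at the first space.

```python
import re

# Matches "FAILED <nodeid>" or "PASSED <nodeid>" anywhere in a line, e.g.
# "[gw7] [ 97%] FAILED tests/..." or a short-summary line "FAILED tests/...".
# The match is case-sensitive, so the final "6 failed, 1137 passed" line is ignored.
OUTCOME = re.compile(r"\b(FAILED|PASSED)\s+(\S+)")


def conflicting_outcomes(lines):
    """Return sorted node IDs that were reported as both FAILED and PASSED."""
    seen = {}
    for line in lines:
        match = OUTCOME.search(line)
        if match:
            outcome, nodeid = match.groups()
            seen.setdefault(nodeid, set()).add(outcome)
    return sorted(n for n, outcomes in seen.items() if outcomes == {"FAILED", "PASSED"})
```

It can be fed an open file object for the saved log, since iterating over a file yields its lines.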
package versions: pytest-7.4.0 pytest_cov-4.1.0 pytest_xdist-3.3.1 coverage-7.2.7 pytest_rerunfailures-12.0 psutil-5.9.5
command line: pytest --log-format="%Y-%m-%dT%H:%M:%S.%f%z" --log-date-format="%Y-%m-%d %H:%M:%S" --log-format "%(asctime)s %(levelname)-8s [%(name)s|%(process)d|%(thread)d|%(threadName)s] [%(pathname)s:%(funcName)s:%(lineno)d] %(message)s" --max-worker-restart 5 -n 16 --dist loadgroup -rfEsxXp --reruns 2 --reruns-delay 30 -v --tb=long -o faulthandler_timeout=3600 --durations=20 --durations-min=60 --cov=src/ tests
We're running about 1000 tests using 16 workers.
Perhaps that's due to --reruns 2 in the command line?
I don't know. I do see RERUN entries for other tests, for example:
Line 1278: plugins: rerunfailures-12.0, cov-
Line 1300: [gw6] [ 0%] RERUN tests/aws/test
Line 1302: [gw2] [ 0%] RERUN tests/aws/test
Line 1304: [gw4] [ 0%] RERUN tests/aws/test
Line 1654: [gw2] [ 12%] RERUN tests/src/test
Line 1664: [gw2] [ 12%] RERUN tests/src/test
Line 1676: [gw2] [ 13%] RERUN tests/src/test
Line 1694: [gw2] [ 13%] RERUN tests/src/test
Line 1744: [gw13] [ 15%] RERUN tests/src/tes
Line 1754: [gw13] [ 15%] RERUN tests/src/tes
Line 3294: [gw13] [ 70%] RERUN tests/src/scr
Line 3322: [gw13] [ 71%] RERUN tests/src/scr
Line 3364: [gw13] [ 73%] RERUN tests/src/scr
Line 3558: [gw13] [ 80%] RERUN tests/src/scr
Line 3632: [gw10] [ 82%] RERUN tests/cimdb/l
Line 4075: [gw5] [ 98%] RERUN tests/src/scri
but I'm not seeing that for tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py::StoredProcTestCase::test_updates_before_start_date_ignored@/opt/gitlab-runner/builds/abc/pipelines/tests/cimdb/staging/sp/xyz/test_portfolio_valuation_lot_merge.py:21.
Most of the tests hit the DB and sometimes fail intermittently because of a SQL deadlock, so we want to retry a test a few times before marking it as failed.
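For context, the behaviour we want from the reruns is roughly the following. This is a plain-Python sketch of the retry idea only, not how pytest-rerunfailures is implemented; `DeadlockError`, `retry_on_deadlock`, and `flaky_query` are hypothetical names made up for illustration.

```python
import time


class DeadlockError(Exception):
    """Hypothetical stand-in for the DB driver's deadlock exception."""


def retry_on_deadlock(func, retries=2, delay=0.0):
    """Call func(); on DeadlockError, retry up to `retries` more times."""
    for attempt in range(retries + 1):
        try:
            return func()
        except DeadlockError:
            if attempt == retries:
                raise  # out of retries: report the failure
            time.sleep(delay)  # wait before the next attempt


calls = {"n": 0}


def flaky_query():
    """Simulated DB call that deadlocks on the first two attempts."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise DeadlockError("simulated deadlock")
    return "ok"


result = retry_on_deadlock(flaky_query, retries=2)
```

The retry only happens because the failure surfaces as an exception in the same process, which is the case a deadlock normally falls into.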
RERUN only works when the test fails with an error or exception. It does not work for a hard crash (indicated by the message 'replacing crashed worker gw7').
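To illustrate the difference, here is a minimal sketch using a child process as a stand-in for a worker. It is not how pytest-xdist or pytest-rerunfailures are implemented; the `WORKER` script, `run_worker`, and the exit code 70 are assumptions chosen for the demonstration. An ordinary exception is caught inside the process, which can still report the failure (and so could rerun the test). A hard crash kills the process before any report is produced, so only the parent can notice that it went down.

```python
import subprocess
import sys

# Hypothetical worker: runs one "test", catches ordinary exceptions,
# and prints a report line for the parent process to read.
WORKER = """
import os, sys

def test(kind):
    if kind == "exception":
        raise RuntimeError("simulated SQL deadlock")
    if kind == "crash":
        os._exit(70)  # process ends immediately; no except/finally runs

try:
    test(sys.argv[1])
    print("report: passed")
except Exception as exc:
    print(f"report: failed ({exc})")
"""


def run_worker(kind):
    """Run the worker in a child process; return (exit code, report text)."""
    proc = subprocess.run(
        [sys.executable, "-c", WORKER, kind],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout.strip()
```

With `run_worker("exception")` the child exits normally and hands back a failure report; with `run_worker("crash")` the report is empty and only the non-zero exit code shows that something went wrong.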