[22.1.x] CI Failure in rptest.tests.shadow_indexing_tx_test::ShadowIndexingTxTest.test_shadow_indexing_aborted_txs
Version & Environment
Redpanda version: v22.1.x
Failure in: rptest.tests.shadow_indexing_tx_test::ShadowIndexingTxTest.test_shadow_indexing_aborted_txs
What went wrong?
CI Failure
What should have happened instead?
CI success
How to reproduce the issue?
???
Additional information
https://ci-artifacts.dev.vectorized.cloud/redpanda/01823946-8bd4-47df-bc3e-5a582371ce80/vbuild/ducktape/results/2022-07-26--001/report.html
[INFO - 2022-07-26 08:18:45,354 - runner_client - log - lineno:278]: RunnerClient: rptest.tests.shadow_indexing_tx_test.ShadowIndexingTxTest.test_shadow_indexing_aborted_txs: FAIL: TimeoutError('producing failed')
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/root/tests/rptest/services/cluster.py", line 35, in wrapped
    r = f(self, *args, **kwargs)
  File "/root/tests/rptest/tests/shadow_indexing_tx_test.py", line 128, in test_shadow_indexing_aborted_txs
    wait_until(done,
  File "/usr/local/lib/python3.10/dist-packages/ducktape/utils/util.py", line 58, in wait_until
    raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from last_exception
ducktape.errors.TimeoutError: producing failed
Most likely this is not related to shadow indexing. The test works in two stages: first it produces data using transactions and aborts some of them; then it waits until some segments are evicted from local storage and consumes the data. The first stage fails with a timeout error. The test handles Kafka API errors (it just reconnects and retries) but not timeout errors.
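For context, here is a minimal sketch (not the actual test code) of the pattern described above, assuming ducktape's wait_until and kafka-python's KafkaError as a stand-in for the Kafka API errors; producer_factory and send_batch() are hypothetical placeholders for the test's producer. Kafka API errors are swallowed by reconnecting, but the TimeoutError that wait_until raises when the whole producing stage times out propagates and fails the test:

```python
from ducktape.utils.util import wait_until   # same helper as in the traceback above
from kafka.errors import KafkaError          # stand-in for "Kafka API errors"


def produce_until_done(producer_factory, total_records, timeout_sec=120):
    """Produce records, reconnecting and retrying on Kafka API errors.

    The TimeoutError raised by wait_until itself (see the traceback above)
    is not caught anywhere, so it fails the test instead of being retried.
    """
    state = {"sent": 0, "producer": producer_factory()}

    def done():
        try:
            # Hypothetical helper: send one batch and return the number of
            # acknowledged records.
            state["sent"] += state["producer"].send_batch()
        except KafkaError:
            # Kafka API error: reconnect and let wait_until poll again.
            state["producer"] = producer_factory()
            return False
        return state["sent"] >= total_records

    # Polls done() until it returns True, or raises
    # ducktape.errors.TimeoutError("producing failed") after timeout_sec.
    wait_until(done,
               timeout_sec=timeout_sec,
               backoff_sec=1,
               err_msg="producing failed")
```

Whether the right fix is a larger timeout or catching and retrying the timeout in the test is a judgment call for the real test code; the sketch only illustrates why the failure surfaces as "producing failed".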
Updated title and description: while this was originally noticed on 22.1.x, it is actually present on dev as well (seen on 2022-10-06).
Another instance https://ci-artifacts.dev.vectorized.cloud/redpanda/0183f0ef-2bae-4085-a4dc-3eabde4c2da3/vbuild/ducktape/results/2022-10-19--001/EndToEndTopicRecovery/test_restore_with_aborted_tx/recovery_overrides=/35/
https://ci-artifacts.dev.vectorized.cloud/redpanda/0183f0ef-2bae-4085-a4dc-3eabde4c2da3/vbuild/ducktape/results/2022-10-19--001/report.html
This hasn't failed on dev runs in the last 30 days, but Bharath's report above is less than 30 days old (presumably from a PR).
This still needs someone to dissect the logs and see what happened: this doesn't necessarily seem like a redpanda bug on the face of it, but we need to check.
This hasn't failed in the last 30 days, and we do not have a root-cause analysis pointing at the transactions/idempotency code, so closing this.