[22.1.x] CI Failure in rptest.tests.shadow_indexing_tx_test::ShadowIndexingTxTest.test_shadow_indexing_aborted_txs

Open BenPope opened this issue 2 years ago • 1 comments

Version & Environment

Redpanda version: v22.1.x

Failure in: rptest.tests.shadow_indexing_tx_test::ShadowIndexingTxTest.test_shadow_indexing_aborted_txs

What went wrong?

CI Failure

What should have happened instead?

CI Success

How to reproduce the issue?

???

Additional information

https://ci-artifacts.dev.vectorized.cloud/redpanda/01823946-8bd4-47df-bc3e-5a582371ce80/vbuild/ducktape/results/2022-07-26--001/report.html

[INFO  - 2022-07-26 08:18:45,354 - runner_client - log - lineno:278]: RunnerClient: rptest.tests.shadow_indexing_tx_test.ShadowIndexingTxTest.test_shadow_indexing_aborted_txs: FAIL: TimeoutError('producing failed')
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/root/tests/rptest/services/cluster.py", line 35, in wrapped
    r = f(self, *args, **kwargs)
  File "/root/tests/rptest/tests/shadow_indexing_tx_test.py", line 128, in test_shadow_indexing_aborted_txs
    wait_until(done,
  File "/usr/local/lib/python3.10/dist-packages/ducktape/utils/util.py", line 58, in wait_until
    raise TimeoutError(err_msg() if callable(err_msg) else err_msg) from last_exception
ducktape.errors.TimeoutError: producing failed

BenPope avatar Jul 26 '22 13:07 BenPope

This is most likely unrelated to shadow indexing. The test works in two stages. First, it produces data using transactions and aborts some of them. Next, it waits until some segments are evicted from local storage and then consumes the data. The failure is in the first stage, which hits a timeout error. The test handles Kafka API errors (it just reconnects and retries) but not timeout errors.
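A minimal sketch of what treating timeouts like the other retryable errors in the producing stage could look like. This is not the code from `shadow_indexing_tx_test.py`: `KafkaApiError`, `ProduceTimeout`, `produce_once`, and `reconnect` are hypothetical stand-ins, and the real test may structure its retry loop differently.

```python
import time


class KafkaApiError(Exception):
    """Hypothetical stand-in for an error returned by the Kafka API."""


class ProduceTimeout(Exception):
    """Hypothetical stand-in for a client-side timeout while producing."""


def produce_with_retries(produce_once, reconnect, max_attempts=5, backoff_s=0.0):
    """Call produce_once() until it succeeds or attempts run out.

    Both API errors and timeouts are treated as retryable: on either,
    the client reconnects and the produce step is attempted again.
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            return produce_once()
        except (KafkaApiError, ProduceTimeout) as e:
            last_error = e
            reconnect()            # drop the old connection before retrying
            time.sleep(backoff_s)  # optional pause between attempts
    raise RuntimeError(
        f"producing failed after {max_attempts} attempts") from last_error
```

Whether retrying on timeout is actually safe here depends on the transactional state of the producer at the point of the timeout, so this only illustrates the control flow, not a verified fix.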

Lazin avatar Aug 02 '22 14:08 Lazin

Updated title+description because while this was originally noticed on 22.1.x, it is actually present on dev as well (seen on 2022-10-6)

jcsp avatar Oct 18 '22 13:10 jcsp

Another instance https://ci-artifacts.dev.vectorized.cloud/redpanda/0183f0ef-2bae-4085-a4dc-3eabde4c2da3/vbuild/ducktape/results/2022-10-19--001/EndToEndTopicRecovery/test_restore_with_aborted_tx/recovery_overrides=/35/

https://ci-artifacts.dev.vectorized.cloud/redpanda/0183f0ef-2bae-4085-a4dc-3eabde4c2da3/vbuild/ducktape/results/2022-10-19--001/report.html

bharathv avatar Oct 19 '22 18:10 bharathv

This hasn't failed on dev runs in the last 30 days, but Bharath's report above is <30d old (presumably from a PR).

This still needs someone to dissect the logs and see what happened: on the face of it this doesn't necessarily look like a redpanda bug, but we need to check.

jcsp avatar Nov 10 '22 11:11 jcsp

This hasn't failed in the last 30 days, and we do not have the analysis for a root cause in the transactions/idempotency code, closing this.

jcsp avatar Nov 21 '22 14:11 jcsp