redpanda icon indicating copy to clipboard operation
redpanda copied to clipboard

arm CI failure: `rptest.tests.e2e_shadow_indexing_test.ShadowIndexingWhileBusyTest.test_create_or_delete_topics_while_busy.short_retention=True`

Open abhijat opened this issue 2 years ago • 7 comments

There is another issue for the same test but it is very different. In this test the actual assertions passed but sometimes the upload fails to S3:

test_id:    rptest.tests.e2e_shadow_indexing_test.ShadowIndexingWhileBusyTest.test_create_or_delete_topics_while_busy.short_retention=True

status:     FAIL

run time:   1 minute 59.928 seconds





    <BadLogLines nodes=docker-rp-1(4),docker-rp-2(3),docker-rp-4(4),docker-rp-5(26357),docker-rp-6(3),docker-rp-7(2) example="ERROR 2022-11-10 05:52:50,692 [shard 1] archival - [fiber6 kafka/topic-llnqzwhagu/9] - ntp_archiver_service.cc:217 - Failed to upload 1 segments out of 1">

Traceback (most recent call last):

  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 135, in run

    data = self.run_test()

  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test

    return self.test_context.function(self.test)

  File "/usr/local/lib/python3.10/dist-packages/ducktape/mark/_mark.py", line 476, in wrapper

    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)

  File "/root/tests/rptest/services/cluster.py", line 67, in wrapped

    self.redpanda.raise_on_bad_logs(allow_list=log_allow_list)

  File "/root/tests/rptest/services/redpanda.py", line 1537, in raise_on_bad_logs

    raise BadLogLines(bad_lines)

rptest.services.utils.BadLogLines: <BadLogLines nodes=docker-rp-1(4),docker-rp-2(3),docker-rp-4(4),docker-rp-5(26357),docker-rp-6(3),docker-rp-7(2) example="ERROR 2022-11-10 05:52:50,692 [shard 1] archival - [fiber6 kafka/topic-llnqzwhagu/9] - ntp_archiver_service.cc:217 - Failed to upload 1 segments out of 1">

Bad log lines on all nodes eg:

WARN  2022-11-10 05:52:50,692 [shard 1] cloud_storage - [fiber6~0~0|8|11158ms] - remote.cc:517 - Uploading segment {"4a59c102/kafka/topic-llnqzwhagu/9_40/0-81-21238683-1-v1.log.1"} to {while-busy-bucket}, backoff quota exceded, segment not uploaded
ERROR 2022-11-10 05:52:50,692 [shard 1] archival - [fiber6 kafka/topic-llnqzwhagu/9] - ntp_archiver_service.cc:217 - Failed to upload 1 segments out of 1

the error during the request:

WARN  2022-11-10 05:52:45,047 [shard 1] s3 - client.cc:645 - S3 request failed with error: std::__1::system_error (error system:32, sendmsg: Broken pipe)
WARN  2022-11-10 05:52:45,047 [shard 1] cloud_storage - [fiber5~0~0|7|16807ms] - remote.cc:155 - System error susceptible for retry sendmsg: Broken pipe
DEBUG 2022-11-10 05:52:45,047 [shard 1] cloud_storage - [fiber5~0~0|7|16807ms] - remote.cc:496 - Uploading segment {"250af3f2/kafka/topic-llnqzwhagu/2_40/0-81-21238682-1-v1.log.1"} to {while-busy-bucket}, 7273ms backoff required

https://buildkite.com/redpanda/redpanda/builds/18328#01845fc8-e15a-4f69-9e54-1b2d848a45e4

abhijat avatar Nov 10 '22 07:11 abhijat

@abhijat can you add the link to the build where it failed if it was in CI?

Or perhaps this was clustered ducktape run by hand... I saw this log error yesterday when I was running several tests in parallel, I think it in that case it was just the S3 API starting to reject us when the number of connections was a bit higher than usual.

jcsp avatar Nov 10 '22 14:11 jcsp

Another in https://ci-artifacts.dev.vectorized.cloud/redpanda/01845ff3-176f-4ff2-a9f9-c37606100922/vbuild/ducktape/results/2022-11-10--001/ShadowIndexingWhileBusyTest/test_create_or_delete_topics_while_busy/short_retention=True/41/

NyaliaLui avatar Nov 10 '22 14:11 NyaliaLui

@abhijat can you add the link to the build where it failed if it was in CI?

sure, added, it was a manual build triggered on buildkite. Missed adding it earlier

abhijat avatar Nov 10 '22 15:11 abhijat

Another in https://ci-artifacts.dev.vectorized.cloud/redpanda/0184656d-a0d0-4041-af74-f30a8c0e2b54/vbuild/ducktape/results/2022-11-11--001/ShadowIndexingWhileBusyTest/test_create_or_delete_topics_while_busy/short_retention=True/43/

NyaliaLui avatar Nov 11 '22 15:11 NyaliaLui

Another in https://buildkite.com/redpanda/redpanda/builds/18532#0184748d-5e11-4564-9b69-9c2a4ba0c5f7

andijcr avatar Nov 14 '22 14:11 andijcr

another https://buildkite.com/redpanda/redpanda/builds/18587#018479bc-e5b7-4777-a76d-322555029e92

andijcr avatar Nov 15 '22 15:11 andijcr

https://buildkite.com/redpanda/redpanda/builds/18763#01848400-7735-468a-beee-2e987d486c36

andijcr avatar Nov 17 '22 15:11 andijcr

4 failures on arm: failure at 2022-11-21T07:40:44.803Z: <BadLogLines nodes=docker-rp-1(3),docker-rp-2(8),docker-rp-3(1),docker-rp-4(1),docker-rp-5(3),docker-rp-6(3),docker-rp-7(30361) example="ERROR 2022-11-21 06:26:24,073 [shard 0] archival - [fiber4 kafka/topic-gumjfsukwb/6] - ntp_archiver_service.cc:220 - Failed to upload 3 segments out of 3"> in job https://buildkite.com/redpanda/redpanda/builds/18848#018498a2-1792-4269-af80-9d74661e8a43 failure at 2022-11-21T08:01:31.032Z: <BadLogLines nodes=docker-rp-2(22),docker-rp-3(21),docker-rp-4(13),docker-rp-5(11),docker-rp-6(1),docker-rp-7(10) example="ERROR 2022-11-21 06:47:06,096 [shard 1] cloud_storage - [fiber4~116~18~0|1|59998ms] - remote.cc:104 - Accessing {while-busy-bucket}, unexpected REST API error "" detected, code: NoSuchBucket, request_id: 172986AFB2E8ACB9, resource: /25487b20/kafka/topic-dxbxhjruoo/7_40/9482-9563-20977592-2-v1.log.2"> in job https://buildkite.com/redpanda/redpanda/builds/18850#018498b4-8859-4f40-a46b-cbb4c2774912 failure at 2022-11-20T07:52:52.860Z: <BadLogLines nodes=docker-rp-2(47),docker-rp-3(30),docker-rp-4(144),docker-rp-6(10),docker-rp-7(23) example="ERROR 2022-11-20 06:38:02,311 [shard 1] cloud_storage - [fiber4~121~6~0|1|59998ms] - remote.cc:104 - Accessing {while-busy-bucket}, unexpected REST API error "" detected, code: NoSuchBucket, request_id: 1729379C857719CA, resource: /743e1ed4/kafka/topic-msiowcmjkn/0_40/9673-9755-21500661-1-v1.log.1"> in job https://buildkite.com/redpanda/redpanda/builds/18842#01849385-ef55-4d3b-aa1b-bd7d84e660e8 failure at 2022-11-19T07:28:36.045Z: <BadLogLines nodes=docker-rp-1(2),docker-rp-2(3),docker-rp-3(3),docker-rp-4(2),docker-rp-5(3),docker-rp-6(25626) example="ERROR 2022-11-19 06:14:03,149 [shard 1] archival - [fiber5 kafka/topic-ymcvyegjdk/8] - ntp_archiver_service.cc:220 - Failed to upload 1 segments out of 1"> in job https://buildkite.com/redpanda/redpanda/builds/18826#01848e44-e793-46fb-aa24-3364f6c3d519

jcsp avatar Nov 21 '22 12:11 jcsp

FAIL test: ShadowIndexingWhileBusyTest.test_create_or_delete_topics_while_busy.short_retention=True (2/47 runs) failure at 2022-11-23T07:42:59.801Z: <BadLogLines nodes=docker-rp-1(12),docker-rp-4(10),docker-rp-5(10),docker-rp-6(2) example="ERROR 2022-11-23 06:27:08,880 [shard 0] cloud_storage - [fiber2~108~10~0|1|59998ms] - remote.cc:104 - Accessing {while-busy-bucket}, unexpected REST API error "" detected, code: NoSuchBucket, request_id: 172A22C215D3A708, resource: /43194a6b/kafka/topic-ovddsmqmuc/5_40/7216-7297-21238423-1-v1.log.1"> on (arm64, container) in job https://buildkite.com/redpanda/redpanda/builds/18961#0184a2ee-dda7-4e64-adb8-e53c6f012e3b failure at 2022-11-23T07:31:49.734Z: <BadLogLines nodes=docker-rp-1(56),docker-rp-3(116),docker-rp-4(186),docker-rp-5(380),docker-rp-6(800),docker-rp-7(70) example="ERROR 2022-11-23 06:17:22,570 [shard 1] cloud_storage - [fiber4~120~10~0|1|59998ms] - remote.cc:104 - Accessing {while-busy-bucket}, unexpected REST API error "" detected, code: NoSuchBucket, request_id: 172A223993086CF1, resource: /654a1fb1/kafka/topic-lqkxoxpsvo/3_40/8331-8411-21238227-1-v1.log.1"> on (arm64, container) in job https://buildkite.com/redpanda/redpanda/builds/18960#0184a2e5-b302-40c3-bf26-467c83070c18

jcsp avatar Nov 23 '22 19:11 jcsp

fixing here https://github.com/redpanda-data/redpanda/pull/7471

jcsp avatar Nov 23 '22 19:11 jcsp