CI Failure (timeout creating topic) in `RackAwarePlacementTest.test_replica_placement` in DEBUG builds

Open travisdowns opened this issue 2 years ago • 3 comments

https://buildkite.com/redpanda/redpanda/builds/31567#0188d102-709f-4dfa-a1bf-47647df79717

Module: rptest.tests.rack_aware_replica_placement_test
Class:  RackAwarePlacementTest
Method: test_replica_placement
Arguments:
{
  "num_partitions": 400,
  "num_topics": 2,
  "rack_layout_str": "ooooFF",
  "replication_factor": 3
}

Note: all observed failures are on debug builds. If you find a counter-example, please update this bug.

The underlying cause is that topic creation fails due to a timeout. I think this can only be seen in the debug log:

[DEBUG - 2023-06-19 00:48:52,926 - kafka_cli_tools - create_topic - lineno:67]: Creating topic: topic-qpvznbjaqd
[DEBUG - 2023-06-19 00:48:52,927 - kafka_cli_tools - _execute - lineno:410]: Executing command: ['/opt/kafka-3.0.0/bin/kafka-topics.sh', '--bootstrap-server', 'docker-rp-9:9092,docker-rp-6:9092,docker-rp-19:9092,docker-rp-5:9092,docker-rp-7:9092,docker-rp-8:9092', '--create', '--topic', 'topic-qpvznbjaqd', '--partitions', '400', '--replication-factor', '3', '--config', 'cleanup.policy=delete']
[DEBUG - 2023-06-19 00:49:54,807 - kafka_cli_tools - _execute - lineno:418]: Error (1) executing command: Error while executing topic command : Call(callName=createTopics, deadlineMs=1687135794401, tries=1, nextAllowedTryMs=1687135794559) timed out at 1687135794459 after 1 attempt(s)
[2023-06-19 00:49:54,464] ERROR org.apache.kafka.common.errors.TimeoutException: Call(callName=createTopics, deadlineMs=1687135794401, tries=1, nextAllowedTryMs=1687135794559) timed out at 1687135794459 after 1 attempt(s)
Caused by: org.apache.kafka.common.errors.DisconnectException: Cancelled createTopics request with correlation id 3 due to node 1 being disconnected
 (kafka.admin.TopicCommand$)

It looks like the timeout occurred about 62 seconds after the command was issued (so probably a 60-second timeout), which is a long time to create a 400-partition topic.
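For reference, the timing estimate can be checked directly from the values in the log excerpt. This is a minimal sketch, not part of the test suite: it assumes the log timestamps are UTC (which the `deadlineMs` value appears to confirm, since it decodes to the same second as the ERROR line), and the variable names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Timestamp layout used by the test-runner log lines, e.g. "2023-06-19 00:48:52,926".
LOG_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_log_time(text: str) -> datetime:
    # Assumption: log timestamps are UTC.
    return datetime.strptime(text, LOG_FORMAT).replace(tzinfo=timezone.utc)


# "Executing command" and "Error (1) executing command" lines from the debug log.
issued = parse_log_time("2023-06-19 00:48:52,927")
failed = parse_log_time("2023-06-19 00:49:54,807")
elapsed_s = (failed - issued).total_seconds()

# deadlineMs from the Kafka TimeoutException, in epoch milliseconds.
deadline_ms = 1687135794401
deadline = datetime.fromtimestamp(deadline_ms // 1000, tz=timezone.utc) + timedelta(
    milliseconds=deadline_ms % 1000
)
# Time between launching kafka-topics.sh and the client-side deadline.
launch_to_deadline_s = (deadline - issued).total_seconds()

print(f"command ran for {elapsed_s:.2f} s")
print(f"client deadline: {deadline.isoformat()}")
print(f"launch to deadline: {launch_to_deadline_s:.3f} s")
```

The deadline falls about 61.5 seconds after the command was launched. That would be consistent with a 60-second client-side timeout (the Kafka admin client's default `default.api.timeout.ms` is 60000) plus roughly 1.5 seconds of tool startup before the `createTopics` call was queued, though the startup figure is an inference rather than something the log shows.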

JIRA Link: CORE-1348

travisdowns, Jun 19 '23 22:06

Three failures over the last 24h:

FAIL test: RackAwarePlacementTest.test_replica_placement.rack_layout_str=ooooFF.num_partitions=400.replication_factor=3.num_topics=2 (3/31 runs)
  failure at 2023-06-19T01:19:57.960Z: CalledProcessError(1, ['/opt/kafka-3.0.0/bin/kafka-topics.sh', '--bootstrap-server', 'docker-rp-9:9092,docker-rp-6:9092,docker-rp-19:9092,docker-rp-5:9092,docker-rp-7:9092,docker-rp-8:9092', '--create', '--topic', 'topic-qpvznbjaqd', '--partitions', '400', '--replication-factor', '3', '--config', 'cleanup.policy=delete'])
      on (amd64, container) in job https://buildkite.com/redpanda/redpanda/builds/31567#0188d102-709f-4dfa-a1bf-47647df79717
  failure at 2023-06-19T14:53:48.777Z: CalledProcessError(1, ['/opt/kafka-3.0.0/bin/kafka-topics.sh', '--bootstrap-server', 'docker-rp-8:9092,docker-rp-23:9092,docker-rp-24:9092,docker-rp-21:9092,docker-rp-18:9092,docker-rp-22:9092', '--create', '--topic', 'topic-xvrrqvacjj', '--partitions', '400', '--replication-factor', '3', '--config', 'cleanup.policy=delete'])
      on (amd64, container) in job https://buildkite.com/redpanda/redpanda/builds/31599#0188d3ec-265a-4582-997f-aad70ddfe771
  failure at 2023-06-19T07:07:44.877Z: CalledProcessError(1, ['/opt/kafka-3.0.0/bin/kafka-topics.sh', '--bootstrap-server', 'docker-rp-2:9092,docker-rp-1:9092,docker-rp-11:9092,docker-rp-12:9092,docker-rp-18:9092,docker-rp-19:9092', '--create', '--topic', 'topic-kieseoggdk', '--partitions', '400', '--replication-factor', '3', '--config', 'cleanup.policy=delete'])
      on (amd64, container) in job https://buildkite.com/redpanda/redpanda/builds/31580#0188d241-8759-487b-8fb1-dba0aae8251c

travisdowns, Jun 19 '23 22:06

This is a duplicate of https://github.com/redpanda-data/redpanda/issues/11276, which I failed to find in search the first time around.

travisdowns, Jun 19 '23 22:06

* https://buildkite.com/redpanda/redpanda/builds/31621
* https://buildkite.com/redpanda/redpanda/builds/31621
* https://buildkite.com/redpanda/redpanda/builds/31634
* https://buildkite.com/redpanda/redpanda/builds/31630
* https://buildkite.com/redpanda/redpanda/builds/31637
* https://buildkite.com/redpanda/redpanda/builds/31637
* https://buildkite.com/redpanda/redpanda/builds/31647
* https://buildkite.com/redpanda/redpanda/builds/31647
* https://buildkite.com/redpanda/redpanda/builds/31649
* https://buildkite.com/redpanda/redpanda/builds/31649
* https://buildkite.com/redpanda/redpanda/builds/31695
* https://buildkite.com/redpanda/redpanda/builds/31700
* https://buildkite.com/redpanda/redpanda/builds/31725
* https://buildkite.com/redpanda/redpanda/builds/31731
* https://buildkite.com/redpanda/redpanda/builds/31749

vbotbuildovich, Mar 29 '24 04:03

Duplicate of https://github.com/redpanda-data/redpanda/issues/11276

rpdevmp, May 16 '24 16:05