Results: 308 comments by Travis Downs

Three failures over the last 24h: FAIL test: RackAwarePlacementTest.test_replica_placement.rack_layout_str=ooooFF.num_partitions=400.replication_factor=3.num_topics=2 (3/31 runs) failure at 2023-06-19T01:19:57.960Z: CalledProcessError(1, ['/opt/kafka-3.0.0/bin/kafka-topics.sh', '--bootstrap-server', 'docker-rp-9:9092,docker-rp-6:9092,docker-rp-19:9092,docker-rp-5:9092,docker-rp-7:9092,docker-rp-8:9092', '--create', '--topic', 'topic-qpvznbjaqd', '--partitions', '400', '--replication-factor', '3', '--config', 'cleanup.policy=delete']) on (amd64, container)...
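The `CalledProcessError(1, [...])` above is what Python's `subprocess` raises when a command run with `check=True` exits non-zero. So the failure only tells us that `kafka-topics.sh` returned exit status 1; the cause has to come from the CLI output and the broker logs. Below is a minimal sketch of that invocation path. It assumes the harness shells out via `subprocess.run`, which is not confirmed here. `create_topic_cmd` and `run_checked` are hypothetical helpers, not the ducktape or `kafka_cli_tools` code.

```python
import subprocess
import sys
from typing import Dict, List


def create_topic_cmd(script: str, brokers: List[str], topic: str,
                     partitions: int, replication_factor: int,
                     configs: Dict[str, str]) -> List[str]:
    """Hypothetical helper: rebuild the kafka-topics.sh argv seen in the failure."""
    cmd = [script,
           "--bootstrap-server", ",".join(brokers),
           "--create",
           "--topic", topic,
           "--partitions", str(partitions),
           "--replication-factor", str(replication_factor)]
    for key, value in configs.items():
        cmd += ["--config", f"{key}={value}"]
    return cmd


def run_checked(cmd: List[str], timeout_s: float = 120.0) -> str:
    """Run cmd; check=True raises CalledProcessError(returncode, cmd) on non-zero exit."""
    result = subprocess.run(cmd, check=True, capture_output=True,
                            text=True, timeout=timeout_s)
    return result.stdout


if __name__ == "__main__":
    # Stand-in for the failing CLI: a child process that simply exits with status 1.
    failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    try:
        run_checked(failing)
    except subprocess.CalledProcessError as e:
        print(e.returncode)  # the exit status reported in the failure message
```

Capturing `stderr` as well, and logging it on failure, would surface the CLI's own error message rather than just the exit code.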

This is a duplicate of https://github.com/redpanda-data/redpanda/issues/11276, which I failed to find in search the first time around.

Happened here: https://buildkite.com/redpanda/redpanda/builds/44428#018d55e9-a12f-4d21-a4ba-37b09988b9d6

Here's the relevant part of the test log: ``` [DEBUG - 2024-01-29 16:58:58,694 - kafka_cli_tools - create_topic - lineno:102]: Creating topic: topic-vzgaqgtmlw [DEBUG - 2024-01-29 16:58:58,694 - kafka_cli_tools - _execute...

Based on the logs, the nodes are in a bad state, perhaps rp-12 in particular, whose log is full of RPC timeouts. The logs also show the client timing out after 1 minute (as expected)...

This log line shows up: ``` TRACE 2024-01-29 16:59:55,735 [shard 0:main] cluster - controller_api.cc:176 - getting reconciliation state for {kafka/topic-hhftbmhjkt/262} ``` This means we got at least to the `wait_for_topics`...
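Scanning the broker logs for this TRACE line shows which partitions of a topic got as far as the reconciliation check. Here is a minimal sketch of such a scan. It assumes every occurrence has the same format as the single line quoted above. `parse_reconciliation_ntp` and `partitions_checked` are hypothetical helpers, not part of Redpanda.

```python
import re
from typing import Iterable, NamedTuple, Optional, Set


class Ntp(NamedTuple):
    namespace: str
    topic: str
    partition: int


# Pattern inferred from the one controller_api.cc TRACE line quoted above (an assumption).
_RECONCILIATION_RE = re.compile(
    r"getting reconciliation state for "
    r"\{(?P<ns>[^/}]+)/(?P<topic>[^/}]+)/(?P<part>\d+)\}"
)


def parse_reconciliation_ntp(line: str) -> Optional[Ntp]:
    """Return the namespace/topic/partition in a reconciliation TRACE line, else None."""
    m = _RECONCILIATION_RE.search(line)
    if m is None:
        return None
    return Ntp(m.group("ns"), m.group("topic"), int(m.group("part")))


def partitions_checked(lines: Iterable[str], topic: str) -> Set[int]:
    """Collect the partitions of `topic` whose reconciliation state was queried."""
    found: Set[int] = set()
    for line in lines:
        ntp = parse_reconciliation_ntp(line)
        if ntp is not None and ntp.topic == topic:
            found.add(ntp.partition)
    return found
```

Comparing the result against `range(num_partitions)` would show which partitions never reached this point.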

See also https://github.com/redpanda-data/redpanda/issues/19959

Primary issue: https://github.com/redpanda-data/redpanda/issues/19959


Primary issue for investigation: https://github.com/redpanda-data/redpanda/issues/19959