
CI Failure (Failed to create partitions because partition reassignment in progress) in `PartitionBalancerTest.test_fuzz_admin_ops`

Open vbotbuildovich opened this issue 1 year ago • 4 comments

https://buildkite.com/redpanda/redpanda/builds/48012

Module: rptest.tests.partition_balancer_test
Class: PartitionBalancerTest
Method: test_fuzz_admin_ops
test_id:    PartitionBalancerTest.test_fuzz_admin_ops
status:     FAIL
run time:   115.552 seconds

KafkaCliToolsError("KafkaCliTools create_topic_partitions failed (Command '['/opt/kafka-3.0.0/bin/kafka-topics.sh', '--bootstrap-server', 'docker-rp-3:9092,docker-rp-8:9092,docker-rp-1:9092,docker-rp-2:9092', '--alter', '--topic', 'fuzzy-operator-9847-thhsqp', '--partitions', '6']' returned non-zero exit status 1.). Full stderr/stdout in debug log. Last error: ERROR org.apache.kafka.common.errors.ReassignmentInProgressException: A partition reassignment is in progress.", CalledProcessError(1, ['/opt/kafka-3.0.0/bin/kafka-topics.sh', '--bootstrap-server', 'docker-rp-3:9092,docker-rp-8:9092,docker-rp-1:9092,docker-rp-2:9092', '--alter', '--topic', 'fuzzy-operator-9847-thhsqp', '--partitions', '6']))
Traceback (most recent call last):
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 184, in _do_run
    data = self.run_test()
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 276, in run_test
    return self.test_context.function(self.test)
  File "/root/tests/rptest/services/cluster.py", line 103, in wrapped
    r = f(self, *args, **kwargs)
  File "/root/tests/rptest/tests/partition_balancer_test.py", line 620, in test_fuzz_admin_ops
    self.admin_fuzz.ensure_progress()
  File "/root/tests/rptest/services/admin_ops_fuzzer.py", line 832, in ensure_progress
    wait_until(check,
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/utils/util.py", line 53, in wait_until
    raise e
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/utils/util.py", line 44, in wait_until
    if condition():
  File "/root/tests/rptest/services/admin_ops_fuzzer.py", line 816, in check
    raise self.error
  File "/root/tests/rptest/services/admin_ops_fuzzer.py", line 682, in thread_loop
    self.execute_one()
  File "/root/tests/rptest/services/admin_ops_fuzzer.py", line 743, in execute_one
    raise e
  File "/root/tests/rptest/services/admin_ops_fuzzer.py", line 724, in execute_one
    if self.execute_with_retries(op_type, op):
  File "/root/tests/rptest/services/admin_ops_fuzzer.py", line 751, in execute_with_retries
    return op.execute(self.operation_ctx)
  File "/root/tests/rptest/services/admin_ops_fuzzer.py", line 308, in execute
    cli.create_topic_partitions(self.topic, self.total)
  File "/root/tests/rptest/clients/kafka_cli_tools.py", line 156, in create_topic_partitions
    return self._run("kafka-topics.sh",
  File "/root/tests/rptest/clients/kafka_cli_tools.py", line 474, in _run
    return self._execute(cmd, desc=desc if desc else "unknown")
  File "/root/tests/rptest/clients/kafka_cli_tools.py", line 505, in _execute
    raise KafkaCliToolsError(
rptest.clients.kafka_cli_tools.KafkaCliToolsError: KafkaCliTools create_topic_partitions failed (Command '['/opt/kafka-3.0.0/bin/kafka-topics.sh', '--bootstrap-server', 'docker-rp-3:9092,docker-rp-8:9092,docker-rp-1:9092,docker-rp-2:9092', '--alter', '--topic', 'fuzzy-operator-9847-thhsqp', '--partitions', '6']' returned non-zero exit status 1.). Full stderr/stdout in debug log. Last error: ERROR org.apache.kafka.common.errors.ReassignmentInProgressException: A partition reassignment is in progress.

JIRA Link: CORE-2520

vbotbuildovich, Apr 22 '24 22:04

Probably a test issue: the test should tolerate failures that are expected.
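A minimal sketch of what test-side tolerance might look like, assuming the fuzzer can match on the error text seen in the CI log above. The helper name `execute_tolerating_expected` and its parameters are hypothetical and are not part of `admin_ops_fuzzer.py`:

```python
import time

# Error text taken from the CI log above; matching on it is an assumption.
EXPECTED_ERROR = "ReassignmentInProgressException"


def execute_tolerating_expected(op, retries=5, backoff_s=1.0, sleep=time.sleep):
    """Run op(), retrying while it fails with the expected reassignment error.

    Returns True if op() eventually succeeded and False if every attempt hit
    the expected error. Any other exception propagates unchanged.
    """
    for attempt in range(retries):
        try:
            op()
            return True
        except Exception as e:
            if EXPECTED_ERROR not in str(e):
                raise  # unexpected failure: let the test fail
            if attempt < retries - 1:
                sleep(backoff_s * (attempt + 1))  # linear backoff before retrying
    return False
```

A caller could treat a `False` return as a skipped operation rather than a test failure, which keeps genuinely unexpected errors visible.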

michael-redpanda, Apr 25 '24 17:04

@michael-redpanda I see that the error response was introduced in https://github.com/redpanda-data/redpanda/pull/14270. TBH it doesn't feel right to me that we are returning an error here for the sole reason that Kafka does so (due to an internal limitation, even though Redpanda is perfectly capable of fulfilling the request). Is this our compatibility policy?

ztlpn, Apr 29 '24 11:04

> @michael-redpanda I see that the error response was introduced in #14270. TBH it doesn't feel right to me that we are returning an error here for the sole reason that Kafka does so (due to an internal limitation, even though Redpanda is perfectly capable of fulfilling the request). Is this our compatibility policy?

That's a good question. Not sure what prompted #8880 to get created and addressed. IMO limiting Redpanda's functionality for the sake of compatibility is wrong (vs returning the wrong error code for compat reasons).

I guess my question back is, even if RP is capable of adding partitions to a topic during a reassignment operation, do we want to permit that? Could this screw up anything wrt balancing?

michael-redpanda, Apr 29 '24 12:04

> Could this screw up anything wrt balancing?

It is not that different from adding a new topic; the balancer should be prepared for it.

ztlpn, Apr 29 '24 13:04

Based on the discussion above, the fix is to remove the check introduced in https://github.com/redpanda-data/redpanda/pull/14270, so this falls within the enterprise team's remit.

ztlpn, Jun 06 '24 13:06

This issue hasn't seen activity in 3 months. If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in two weeks.

github-actions[bot], Sep 16 '24 06:09