rpk: add transfer-leadership command
Fixes https://github.com/redpanda-data/redpanda/issues/17418
In this implementation, I decided to make the functionality simple for a start. In #17418, I reckoned to support --partition and --from options, but only --partition flag is supported at this moment.
Here's the help text:
This command allows you to transfer leaderships for given partitions.
To move a leadership, use the following syntax:
rpk cluster partitions transfer-leadership foo --partition 0:2
Here, the command transfers the leadership for the partition "kafka/foo/0"
to broker 2. By default, it assumes the "kafka" namespace, but you can specify
an internal namespace using the "{namespace}/" prefix.
The following command transfers the leadership for the partition "kafka/foo/0"
to broker 2 and the leadership for the partition "kafka/bar/1" to broker 4
rpk cluster partitions transfer-leadership --partition foo/0:2 -p bar/1:4
Caveat: Redpanda tries to balance leadership distribution across brokers by default.
If the distribution of leaders becomes uneven as a result of transferring leadership
across brokers, the cluster will possiblly move the leaderships back to the original
brokers automatically.
Backports Required
- [ ] none - not a bug fix
- [ ] none - this is a backport
- [ ] none - issue does not exist in previous branches
- [ ] none - papercut/not impactful enough to backport
- [x] v24.1.x
- [x] v23.3.x
- [ ] v23.2.x
Release Notes
Features
- rpk: ability to transfer partition leadership
new failures in https://buildkite.com/redpanda/redpanda/builds/48171#018f0b9f-17b5-4007-9596-32a27c1f26d1:
"rptest.tests.rbac_upgrade_test.UpgradeMigrationCreatingDefaultRole.test_rbac_migration"
new failures in https://buildkite.com/redpanda/redpanda/builds/48171#018f0ba6-6b52-4814-a0a6-58db8767d3ca:
"rptest.tests.rbac_upgrade_test.UpgradeMigrationCreatingDefaultRole.test_rbac_migration"
new failures in https://buildkite.com/redpanda/redpanda/builds/48204#018f0f29-922e-4b1f-a749-e757ca527282:
"rptest.tests.rbac_upgrade_test.UpgradeMigrationCreatingDefaultRole.test_rbac_migration"
new failures in https://buildkite.com/redpanda/redpanda/builds/48204#018f0f53-d66e-44d3-9f6d-97d9f7cf1b4d:
"rptest.tests.rbac_upgrade_test.UpgradeMigrationCreatingDefaultRole.test_rbac_migration"
new failures in https://buildkite.com/redpanda/redpanda/builds/48621#018f3a1c-e49f-4ef8-a13d-f42b117c3fe2:
"rptest.tests.partition_move_interruption_test.PartitionMoveInterruption.test_cancelling_partition_move.replication_factor=3.unclean_abort=True.recovery=restart_recovery.compacted=True"
new failures in https://buildkite.com/redpanda/redpanda/builds/48805#018f5632-48c8-40a0-8314-ff633b15366d:
"rptest.tests.e2e_shadow_indexing_test.EndToEndThrottlingTest.test_throttling.cloud_storage_type=CloudStorageType.ABS"
new failures in https://buildkite.com/redpanda/redpanda/builds/48805#018f5632-48cb-4f30-9bfe-7502eedf3233:
"rptest.tests.e2e_shadow_indexing_test.EndToEndThrottlingTest.test_throttling.cloud_storage_type=CloudStorageType.S3"
new failures in https://buildkite.com/redpanda/redpanda/builds/48805#018f563b-1a0e-40f0-8871-c00e20b7f53f:
"rptest.tests.e2e_shadow_indexing_test.EndToEndThrottlingTest.test_throttling.cloud_storage_type=CloudStorageType.S3"
new failures in https://buildkite.com/redpanda/redpanda/builds/48805#018f563b-1a0c-4d58-ad9f-512b1a8cd295:
"rptest.tests.e2e_shadow_indexing_test.EndToEndThrottlingTest.test_throttling.cloud_storage_type=CloudStorageType.ABS"
Revised the help text a bit.
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/48204#018f0f29-9229-4fed-803d-a92d42c5ac84
Thanks for the PR, to understand:
how often is the case where we want to move multiple partitions simultaneously? is the normal scenario running this request multiple times in a row, or is it often just 1~2 leadership transfers?
My question is because I see that we have the flexibility, but I don't want the flexibility to become a complex command. If this is a command that it's likely to be a "one-of", something like:
$ rpk cluster partitions transfer-leadership [namespace/topic/partition] --to-node 2
But if it's something that usually comes in multiples, I like your:
rpk cluster partitions transfer-leadership -p foo/0:2 -p bar/1:4 ... -p <namespace>/<topic>/<partition>:<target-node-id>
And make this the only way to do it, even for 1 movement:
rpk cluster partitions transfer-leadership -p foo/0:2
That way our users only have to "learn" one command syntax and it's easier to migrate if the command grows in the future.
What do you think @daisukebe @twmb
Thanks for the question, @r-vasquez. Let me discuss with the field team and get back to you.
Discussed with the support team and confirmed we rarely transfer multiple partition leaderships at a time these days. Some want multiple but so far we prefer less error-prone. That is, I will update the command to accept only one partition at a time.
Syntax-wise, I kinda feel using -p <namespace>/<topic>/<partition>:<node> and [TOPIC] -p <partition>:<node> is consistent to the existing move command. Wdyt?
You are right, yes, let's keep it to be consistent with the move command. Thanks for double-checking with the field team @daisukebe :+1: .
Allowed to transfer one partition leadership at a time
Thanks @Feediver1! Committed all suggestions and rebased.
/backport v24.1.x
/backport v23.3.x