
AlterPartitionAssignments does not change the leader

freef4ll opened this issue 2 years ago · 6 comments

Version & Environment

$ rpk version
v23.1.1-rc6 (rev dc47c26)

What went wrong?

Given the leaders:

$ rpk topic create test2_replica3_parition6 --partitions 6 --replicas 3

$ rpk topic describe -p test2_replica3_parition6
PARTITION  LEADER  EPOCH  REPLICAS  LOG-START-OFFSET  HIGH-WATERMARK
0          0       23     [0 1 2]   0                 0
1          0       18     [0 1 2]   0                 0
2          1       21     [0 1 2]   0                 0
3          2       22     [0 1 2]   0                 0
4          2       10     [0 1 2]   0                 0
5          2       13     [0 1 2]   0                 0

And making broker 2 the leader for the first three partitions, and broker 0 the leader of the remaining three (the first replica in each list is the preferred leader):

$ cat partition_replica.json 
{
  "version": 1,
  "partitions": [
    {
      "topic": "test2_replica3_parition6",
      "partition": 0,
      "replicas": [2,0,1]
    },
    {
      "topic": "test2_replica3_parition6",
      "partition": 1,
      "replicas": [2,0,1]
    },
    {
      "topic": "test2_replica3_parition6",
      "partition": 2,
      "replicas": [2,0,1]
    },
    {
      "topic": "test2_replica3_parition6",
      "partition": 3,
      "replicas": [0,1,2]
    },
    {
      "topic": "test2_replica3_parition6",
      "partition": 4,
      "replicas": [0,1,2]
    },
    {
      "topic": "test2_replica3_parition6",
      "partition": 5,
      "replicas": [0,1,2]
    }
  ]
}
$ cat partition_replica_inverse.json
{
  "version": 1,
  "partitions": [
    {
      "topic": "test2_replica3_parition6",
      "partition": 0,
      "replicas": [0,1,2]
    },
    {
      "topic": "test2_replica3_parition6",
      "partition": 1,
      "replicas": [0,1,2]
    },
    {
      "topic": "test2_replica3_parition6",
      "partition": 2,
      "replicas": [0,1,2]
    },
    {
      "topic": "test2_replica3_parition6",
      "partition": 3,
      "replicas": [2,0,1]
    },
    {
      "topic": "test2_replica3_parition6",
      "partition": 4,
      "replicas": [2,0,1]
    },
    {
      "topic": "test2_replica3_parition6",
      "partition": 5,
      "replicas": [2,0,1]
    }
  ]
}
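The two files differ only in which replica is listed first for each partition. A minimal shell sketch (not from the thread) that derives the intended per-partition leader mapping, shown as a dry run:

```shell
# Sketch: the intended preferred-leader mapping behind the two JSON files.
# partition_replica.json:         partitions 0-2 -> node 2, 3-5 -> node 0
# partition_replica_inverse.json: partitions 0-2 -> node 0, 3-5 -> node 2
TOPIC="test2_replica3_parition6"
for p in 0 1 2 3 4 5; do
  if [ "$p" -lt 3 ]; then leader=2; else leader=0; fi
  echo "$TOPIC/$p -> preferred leader $leader"
done
```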

Produces:

$ bin/kafka-reassign-partitions.sh --bootstrap-server 192.168.0.5:9092,192.168.0.6:9092,192.168.0.7:9092 --reassignment-json-file partition_replica.json --execute
Current partition replica assignment

{"version":1,"partitions":[{"topic":"test2_replica3_parition6","partition":0,"replicas":[0,1,2],"log_dirs":["any","any","any"]},{"topic":"test2_replica3_parition6","partition":1,"replicas":[0,2,1],"log_dirs":["any","any","any"]},{"topic":"test2_replica3_parition6","partition":2,"replicas":[0,2,1],"log_dirs":["any","any","any"]},{"topic":"test2_replica3_parition6","partition":3,"replicas":[0,1,2],"log_dirs":["any","any","any"]},{"topic":"test2_replica3_parition6","partition":4,"replicas":[2,0,1],"log_dirs":["any","any","any"]},{"topic":"test2_replica3_parition6","partition":5,"replicas":[0,2,1],"log_dirs":["any","any","any"]}]}

Save this to use as the --reassignment-json-file option during rollback
Successfully started partition reassignments for test2_replica3_parition6-0,test2_replica3_parition6-1,test2_replica3_parition6-2,test2_replica3_parition6-3,test2_replica3_parition6-4,test2_replica3_parition6-5

Only partition 5 changed leadership:

$ rpk topic describe -p test2_replica3_parition6
PARTITION  LEADER  EPOCH  REPLICAS  LOG-START-OFFSET  HIGH-WATERMARK
0          0       24     [0 1 2]   0                 0
1          0       19     [0 1 2]   0                 0
2          1       22     [0 1 2]   0                 0
3          2       23     [0 1 2]   0                 0
4          2       10     [0 1 2]   0                 0
5          0       14     [0 1 2]   0                 0

The remaining operations are rejected:

Feb 22 10:12:18 mini2 rpk[1234292]: INFO  2023-02-22 10:12:18,767 [shard 2] cluster - controller_backend.cc:778 - [{kafka/test2_replica3_parition6/5}] (retry 2) result: Current node is not a leader for partition operation: {type: update, revision: 276, assignment: { id: 5, group_id: 262, replicas: {{node_id: 2, shard: 2}, {node_id: 0, shard: 2}, {node_id: 1, shard: 2}} }, previous assignment: {{{node_id: 0, shard: 3}, {node_id: 2, shard: 3}, {node_id: 1, shard: 3}}}}
Feb 22 10:12:18 mini2 rpk[1234292]: INFO  2023-02-22 10:12:18,774 [shard 1] cluster - controller_backend.cc:778 - [{kafka/test2_replica3_parition6/1}] (retry 2) result: Current node is not a leader for partition operation: {type: update, revision: 272, assignment: { id: 1, group_id: 258, replicas: {{node_id: 0, shard: 0}, {node_id: 2, shard: 1}, {node_id: 1, shard: 1}} }, previous assignment: {{{node_id: 0, shard: 2}, {node_id: 2, shard: 3}, {node_id: 1, shard: 3}}}}
Feb 22 10:12:18 mini2 rpk[1234292]: INFO  2023-02-22 10:12:18,775 [shard 1] cluster - controller_backend.cc:778 - [{kafka/test2_replica3_parition6/3}] (retry 2) result: Current node is not a leader for partition operation: {type: update, revision: 274, assignment: { id: 3, group_id: 260, replicas: {{node_id: 0, shard: 0}, {node_id: 2, shard: 1}, {node_id: 1, shard: 1}} }, previous assignment: {{{node_id: 0, shard: 2}, {node_id: 1, shard: 3}, {node_id: 2, shard: 3}}}}

The same happens when applying the inverse assignment:

bin/kafka-reassign-partitions.sh --bootstrap-server 192.168.0.5:9092,192.168.0.6:9092,192.168.0.7:9092 --reassignment-json-file partition_replica_inverse.json --execute

What should have happened instead?

Leadership should have changed for all six partitions, following the new replica order.

JIRA Link: CORE-1176

freef4ll avatar Feb 22 '23 10:02 freef4ll

Hi @freef4ll,

If you want to change leadership, we recommend using the Admin API. For example, to transfer leadership for partition 0 to node 2:

curl -X POST "http://192.168.0.5:9644/v1/partitions/kafka/test2_replica3_parition6/0/transfer_leadership?target=2" 
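Extending that single call, a sketch (not from the thread) that transfers leadership for all six partitions to match the original reassignment goal (0-2 to node 2, 3-5 to node 0). It is shown as a dry run via `echo`; remove the `echo` to actually issue the requests, and point `ADMIN` at any broker's Admin API address:

```shell
# Dry run: print one transfer_leadership request per partition.
ADMIN="http://192.168.0.5:9644"   # Admin API, port 9644 by default
TOPIC="test2_replica3_parition6"
for p in 0 1 2 3 4 5; do
  if [ "$p" -lt 3 ]; then target=2; else target=0; fi
  echo curl -X POST \
    "$ADMIN/v1/partitions/kafka/$TOPIC/$p/transfer_leadership?target=$target"
done
```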

Changing leadership with the AlterPartitionReassignments API in Redpanda is unsupported and needs to be documented. Thank you for flagging this.

NyaliaLui avatar Feb 22 '23 16:02 NyaliaLui

To add a little more context to this:

Redpanda does not have sticky leadership in the same way that Kafka does, and will continuously attempt to maintain leadership balance across the cluster. Therefore having the same leadership semantics as Kafka for alter partitions wouldn't accomplish much. You could disable the leadership balancer and manually move leadership (like @NyaliaLui mentioned above), but this can be risky because leadership imbalance may still occur.
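If you do go that route, the balancer is toggled via a cluster property (a config sketch; `enable_leader_balancer` is the relevant setting, but verify against your release's docs):

```shell
# Disable the automatic leadership balancer before manual transfers.
# Risky: leadership imbalance may persist while it is off.
rpk cluster config set enable_leader_balancer false
# ...perform manual transfer_leadership calls via the Admin API...
rpk cluster config set enable_leader_balancer true   # re-enable afterwards
```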

However, in our next release 23.2 we plan to have sticky leadership integrated into the system at which point we'll pass through the knowledge of leader-is-first-in-replica-set from alter partition reassignments into the balancer.

dotnwat avatar Feb 22 '23 17:02 dotnwat

@micheleRP this is beta feedback that should be documented. See @NyaliaLui's comment above.

mattschumpert avatar Feb 23 '23 23:02 mattschumpert

Thanks guys! It would be worthwhile to clarify the leadership change behavior under https://github.com/redpanda-data/documentation/issues/606

freef4ll avatar Feb 24 '23 07:02 freef4ll

Thanks. Adding this with https://github.com/redpanda-data/documentation/pull/1309

micheleRP avatar Mar 01 '23 21:03 micheleRP

This issue hasn't seen activity in 3 months. If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in two weeks.

github-actions[bot] avatar Aug 22 '24 06:08 github-actions[bot]

This issue was closed due to lack of activity. Feel free to reopen if it's still relevant.

github-actions[bot] avatar Sep 05 '24 06:09 github-actions[bot]