kafka
kafka copied to clipboard
KAFKA-10149: Allow auto preferred leader election when there are ongoing partition reassignments
Auto preferred leader election should be allowed for partitions which don't have any ongoing reassignments. Otherwise, it might result in unbalanced leaders if there are any long-running reassignments. For partitions being reassigned, a leader election will be triggered when the TRS = ISR as part of the partition reassignment procedure automatically if the current leader is not in TRS or isn't alive (step B3).
For example, let's say there are broker (1,2,3,4), and a partition is current on (1,2) , 1 is the current leader:
- if we reassign that partition to
(3, 4), a leader election would be triggered during the reassignment step B3, and 3 will be the new leader. - if we reassign that partition to
(3, 1), leader election won't be triggered during the reassignment as the current leader is still in the TRS, 1 will continue to be the leader. The auto preferred leader election will help in such situation.
Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)
Update the code and commit message. It looks like partitions being reassigned can also benefit from auto preferred leader election
Should we also add a test case for when the leader election takes place on the same topic as the one undergoing reassignment?
@C0urante Added a new test case. Could you have a look? Thanks.
@C0urante thanks for reminding me about this. I think the last thing I want to confirm is that in the case where we have a reassignment in progress -- we don't do the preferred leader election. I see we added the test case, but its a bit hard to parse.
I'm also a little confused by shutting down the broker that is not part of the assignment in the beginning. Some comments there could really help me understand the scenario better.
Looks like the producer ID test didn't fail again, so it might be flaky. I will keep an eye on it. Looks like the builds are also red now, but maybe that's an issue on master.
Thanks @jolshan. I believe it's valid to perform preferred leader election for topics even if they're currently undergoing reassignment, for reasons discussed in an earlier PR for this ticket by @hachikuji:
During a reassignment, the adding replicas are always listed first which means the preferred leader is among the target replicas. My take is that we want to move the leadership onto the new preferred leader as soon as possible since the whole point of the reassignment is to take load off the removing replicas.
I've double-checked the code base and this still appears to be the case; let me know if I'm missing something, though.
Definitely agree RE commenting the test code; @songnon if you have time would you mind adding some details? Explanation of why we're shutting down and restarting which brokers when (e.g., to create a stuck reassignment, or to create a pending preferred leader election) should be plenty.
I believe it's valid to perform preferred leader election for topics even if they're currently undergoing reassignment
Got it. I think the PR description here confused me then.
Auto preferred leader election should be allowed for partitions which don't have any ongoing reassignments.
Perhaps we can update this to make it clearer that we can also do the preferred leadership election on partitions undergoing reassignment to update the preferred leader sooner.
Perhaps we can update this to make it clearer that we can also do the preferred leadership election on partitions undergoing reassignment to update the preferred leader sooner.
Fixed. I forgot to update the PR description after making some code change and updating the commit text.
I'm also a little confused by shutting down the broker that is not part of the assignment in the beginning. Some comments there could really help me understand the scenario better.
@jolshan @C0urante I have added some comments in the test code. Hope it helps. Let me know if there are any questions.
@junrao Was there an underlying reason not to allow this previously?
Previously, the controller logic wasn't single threaded. So, this check is probably just to prevent concurrent actions. Now that all logic in the controller runs in a single thread, this constraint could be revisited.