cruise-control
TopicLeadershipDistributionGoal
This PR partially resolves #1437 (this is a hard goal vs. the requested soft goal).
What
This PR adds the TopicLeadershipDistributionGoal hard goal. It endeavors to balance partition leadership across all non-excluded brokers in a cluster as evenly as possible: for each topic, no non-excluded broker should lead more than one partition more than any of its peers.
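To make the balance condition concrete, here is a minimal sketch of the per-topic check. It is an illustration only, not the code in this PR: the class and method names are hypothetical, brokers are plain integer IDs, and leaders sitting on excluded brokers are simply ignored.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; not part of Cruise Control or of this PR.
public class TopicLeadershipBalanceSketch {

    // Counts how many partitions of ONE topic each eligible (non-excluded) broker leads.
    // partitionLeaders.get(i) is the broker ID leading partition i of the topic.
    public static Map<Integer, Integer> leaderCounts(List<Integer> partitionLeaders,
                                                     Set<Integer> eligibleBrokers) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int broker : eligibleBrokers) {
            counts.put(broker, 0); // brokers leading nothing must still be counted
        }
        for (int leader : partitionLeaders) {
            if (counts.containsKey(leader)) { // assumption: leaders on excluded brokers are ignored
                counts.merge(leader, 1, Integer::sum);
            }
        }
        return counts;
    }

    // True when the most-loaded and least-loaded eligible brokers differ by at most one leader.
    public static boolean isBalanced(List<Integer> partitionLeaders, Set<Integer> eligibleBrokers) {
        if (eligibleBrokers.isEmpty()) {
            return true;
        }
        Map<Integer, Integer> counts = leaderCounts(partitionLeaders, eligibleBrokers);
        int max = Collections.max(counts.values());
        int min = Collections.min(counts.values());
        return max - min <= 1;
    }

    public static void main(String[] args) {
        Set<Integer> brokers = Set.of(1, 2, 3);
        // Leader counts 2/1/1: within one of each other.
        System.out.println(isBalanced(List.of(1, 2, 3, 1), brokers));
        // Leader counts 3/1/0: broker 1 leads three more partitions than broker 3.
        System.out.println(isBalanced(List.of(1, 1, 1, 2), brokers));
    }
}
```

The check is applied per topic, so a cluster can have an even total leader count per broker and still fail it for an individual topic.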
Why
This is useful not only as a load-balancing measure (we leverage the assumption that all records going into any topic will have a similar impact on broker resources regardless of which partition they are produced to) but also as a way to more easily reason about Kafka producer quotas. For example, when leadership is evenly distributed on a per-topic level and all producers to that topic share the same client ID, it becomes much easier to translate a per-topic MB/s ceiling into a reasonable per-broker quota value that Kafka actually understands.
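As a rough worked example of that translation, the sketch below derives a per-broker quota from a per-topic ceiling. It assumes traffic is spread evenly across the topic's partitions and that the busiest broker leads at most ceil(partitions / brokers) of them, which is what the balance condition above implies. The class, the method name, and the formula itself are illustrative assumptions, not something taken from this PR or from Kafka.

```java
// Hypothetical illustration; not part of Cruise Control, Kafka, or this PR.
public class PerBrokerQuotaSketch {

    // Returns the share of a per-topic MB/s ceiling that the most-loaded broker
    // could receive, assuming uniform traffic per partition and balanced leadership.
    public static double perBrokerQuotaMBps(double topicCeilingMBps, int partitions, int brokers) {
        if (partitions <= 0 || brokers <= 0) {
            throw new IllegalArgumentException("partitions and brokers must be positive");
        }
        // Integer ceiling of partitions / brokers: the most leaders any one broker holds.
        int maxLeadersPerBroker = (partitions + brokers - 1) / brokers;
        return topicCeilingMBps * maxLeadersPerBroker / partitions;
    }

    public static void main(String[] args) {
        // 12 partitions over 6 brokers: 2 leaders each, so 120 * 2 / 12 = 20.0 MB/s.
        System.out.println(perBrokerQuotaMBps(120.0, 12, 6));
        // 10 partitions over 4 brokers: up to 3 leaders on one broker, so 120 * 3 / 10 = 36.0 MB/s.
        System.out.println(perBrokerQuotaMBps(120.0, 10, 4));
    }
}
```

Without balanced leadership the worst case is a single broker leading every partition of the topic, in which case the per-broker quota would have to equal the full topic ceiling.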
One important caveat
This is quite an opinionated goal and may frequently conflict with the RackAwareGoal in particular. For cases where your brokers are evenly distributed amongst the available racks and there are exactly as many racks as your desired replication factor, this isn't a huge issue (letting the RackAwareGoal run before the TopicLeadershipDistributionGoal should result in valid solutions being reached).
For cases where brokers aren't evenly distributed amongst the racks and there are many more racks than the desired replication factor, things get a little more awkward. We've temporarily solved this on our end by having the TopicLeadershipDistributionGoal run before the RackAwareGoal and adding an option to the RackAwareGoal so that it only suggests follower replica movements. We realize this means the RackAwareGoal can no longer self-heal from dead broker situations (a feature we're not using at the moment). This optional "follower replicas only" functionality for the RackAwareGoal has not been included in this PR.
I'm currently expanding the RackAwareGoal to explore all follower replica movements first before trying any leader replica movements, but I think the TopicLeadershipDistributionGoal is still useful without that change.
Conclusion
Anyways, let me know what you think, and definitely feel free to hit me up with any questions/comments/concerns!
Thanks for the contribution, @jlei-nr!
I just did some research on this subject. Do you think setting min.topic.leaders.per.broker=0 with MinTopicLeadersPerBrokerGoal (#1683) would achieve the same goal as this PR (#1751)?
Hi @jlei-nr, thanks for the proposal. Reading through the PR description, I can understand your point of adding this goal.
From my understanding, there are 2 reasons that you wanted to add this goal:
- Load balancing
- Easier reasoning about producer quotas
For purpose 1, I think it's a valid reason to add this goal as a soft goal in open source CC. For purpose 2, I believe it could be helpful for your use case, but it's not a strong enough reason to add it to open source CC, especially as a hard goal. A hard goal should be a much stronger requirement that applies to most CC users.
With that said, please feel free to close this PR and keep this change internally. (As we're cleaning up open PRs, this will be very helpful. Thanks!)
Or, if you want to add this goal to open source as a soft goal, we can discuss more; some changes might be needed on top of the current commits. Please let me know what you think. Thanks!