TopicLeadershipDistributionGoal

Open jlei-nr opened this issue 4 years ago • 2 comments

This PR partially resolves #1437 (this is a hard goal vs. the requested soft goal).

What

This PR adds the TopicLeadershipDistributionGoal as a hard goal. It tries to balance partition leadership as evenly as possible across all non-excluded brokers in a cluster: for each topic, no non-excluded broker should lead more than one partition more than any of its peers.
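To make the balance criterion concrete, here is a minimal Java sketch of the check described above. It is not the code from this PR and does not use Cruise Control's model classes; the class name `LeadershipBalanceSketch`, the method `isBalanced`, and the plain-map input format are hypothetical and exist only for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of Cruise Control or of this PR.
final class LeadershipBalanceSketch {

    /**
     * @param leadersByTopic topic name -> leader broker id of each partition of that topic
     * @param brokers        ids of the non-excluded brokers
     * @return true if, for every topic, the per-broker leader counts differ by at most one
     */
    static boolean isBalanced(Map<String, List<Integer>> leadersByTopic, List<Integer> brokers) {
        if (brokers.isEmpty()) {
            return true; // nothing to compare
        }
        for (List<Integer> leaders : leadersByTopic.values()) {
            // Start every non-excluded broker at zero so brokers leading nothing are counted too.
            Map<Integer, Integer> counts = new HashMap<>();
            for (Integer broker : brokers) {
                counts.put(broker, 0);
            }
            for (Integer leader : leaders) {
                // Assumption of this sketch: leaders on excluded brokers are simply ignored.
                if (counts.containsKey(leader)) {
                    counts.merge(leader, 1, Integer::sum);
                }
            }
            int min = Collections.min(counts.values());
            int max = Collections.max(counts.values());
            if (max - min > 1) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Integer> brokers = List.of(0, 1, 2);
        // Five partitions over three brokers: leader counts 2/2/1 satisfy the criterion.
        System.out.println(isBalanced(Map.of("orders", List.of(0, 1, 2, 0, 1)), brokers));
        // Leader counts 3/1/0 do not.
        System.out.println(isBalanced(Map.of("orders", List.of(0, 0, 0, 1)), brokers));
    }
}
```

The check is done per topic rather than on the cluster-wide leader count, which is what distinguishes this criterion from a plain leader-count balance.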

Why

This is useful not only as a load-balancing measure (we assume that every record produced to a topic has a similar impact on broker resources, regardless of which partition it is produced to) but also as a way to reason more easily about Kafka producer quotas. For example, when leadership is evenly distributed on a per-topic level and all producers to that topic share the same client ID, it becomes much easier to translate a per-topic MB/s ceiling into a reasonable per-broker quota value that Kafka actually understands.
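As a rough illustration of that translation, the Java sketch below derives a per-broker quota from a per-topic ceiling. It assumes traffic is spread uniformly across the topic's partitions and that no broker leads more than ceil(partitions / brokers) of them, which is what this goal aims for. The class name `ProducerQuotaSketch`, the method `perBrokerQuotaBytes`, and the formula itself are assumptions made for this example, not something this PR implements.

```java
// Hypothetical illustration only; not part of Cruise Control or of this PR.
final class ProducerQuotaSketch {

    /**
     * @param topicCeilingBytesPerSec desired produce ceiling for the whole topic, in bytes/s
     * @param partitions              number of partitions of the topic
     * @param brokers                 number of non-excluded brokers
     * @return bytes/s that the busiest broker needs to admit for this client ID
     */
    static long perBrokerQuotaBytes(long topicCeilingBytesPerSec, int partitions, int brokers) {
        if (partitions <= 0 || brokers <= 0) {
            throw new IllegalArgumentException("partitions and brokers must be positive");
        }
        // With balanced leadership, no broker leads more than ceil(partitions / brokers) partitions.
        int maxLeadersPerBroker = (partitions + brokers - 1) / brokers;
        // That broker's share of the topic ceiling, rounded up to a whole byte.
        return (topicCeilingBytesPerSec * maxLeadersPerBroker + partitions - 1) / partitions;
    }

    public static void main(String[] args) {
        // 100 MB/s topic ceiling, 10 partitions, 4 brokers: the busiest broker leads 3 partitions.
        System.out.println(perBrokerQuotaBytes(100_000_000L, 10, 4));
    }
}
```

The result would be the kind of value one sets as the client ID's `producer_byte_rate` quota, which Kafka enforces per broker. Without balanced leadership, the worst-case share of a single broker is unknown and the per-broker quota cannot be derived this simply.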

One important caveat

This is quite an opinionated goal, and it may frequently conflict with the RackAwareGoal in particular. When your brokers are evenly distributed among the available racks and there are exactly as many racks as your desired replication factor, this isn't a huge issue: letting the RackAwareGoal run before the TopicLeadershipDistributionGoal should result in valid solutions being reached.

When brokers aren't evenly distributed among the racks and there are many more racks than the desired replication factor, things get a little more awkward. We've temporarily solved this on our end by having the TopicLeadershipDistributionGoal run before the RackAwareGoal and by adding an option to the RackAwareGoal that makes it suggest only follower replica movements. We realize this means that the RackAwareGoal can no longer self-heal from dead-broker situations (a feature we're not using at the moment). This optional "follower replicas only" functionality for the RackAwareGoal has not been included in this PR.

I'm currently expanding the RackAwareGoal to explore all follower replica movements first before trying any leader replica movements, but I think the TopicLeadershipDistributionGoal is still useful without that change.

Conclusion

Anyways, let me know what you think, and definitely feel free to hit me up with any questions/comments/concerns!

jlei-nr, Dec 04 '21 19:12

Thanks for the contribution, @jlei-nr! I just did some research on this subject. Do you think setting min.topic.leaders.per.broker=0 with MinTopicLeadersPerBrokerGoal (#1683) would achieve the same goal as this PR (#1751)?

zornhsu, Dec 23 '21 22:12

Hi @jlei-nr, thanks for the proposal. Reading through the PR description, I can see the motivation for adding this goal.

From my understanding, there are two reasons you want to add this goal:

  1. Load balancing
  2. Making producer quotas easier to evaluate

For purpose 1, I think it's a valid reason to add this goal as a soft goal in open source CC. For purpose 2, I believe it could be helpful for your use case, but it's not a strong enough reason to add it to open source CC, especially as a hard goal. A hard goal should be a much stronger requirement, one that applies to most CC users.

With that said, please feel free to close this PR and keep this change internally. (As we're cleaning up open PRs, this will be very helpful. Thanks!)

Or, if you want to add this goal to open source as a soft goal, we can discuss it further; some changes may be needed on top of the current commits. Please let me know what you think. Thanks!

CCisGG, Aug 14 '22 19:08