cruise-control icon indicating copy to clipboard operation
cruise-control copied to clipboard

Proposing a cluster wide round robin distribution goal which equally distribute everything (including rack balancing)

Open HenryCaiHaiying opened this issue 3 years ago • 0 comments

I am proposing the following cluster wide equal distribution goal to the community to see whether people thinks it belong to cruise control philosophy. We have many dedicated Kafka clusters which only has a few big topics. We already sized the cluster in the sense that the number of partition of the topic is the multiple of the number of broker nodes. So we can easily equalize the broker load if we distribute replicas of a topic equally to each broker. But we also need to make sure the distribution is equally between racks (in our case, AWS AZs). We looked at the existing goals in CC, we don't think any of them are doing both (equal leader replicas for each topic, equal follower replicas for each topic, and the same equality between AZs).

I think I can easily implement the following cluster wide equal distribution algorithm, but I don't know whether it belongs to cruise control family since it is not doing the traditional CC way (work incrementally for each broker to see whether it can shift some replicas, and to verify whether proposals from other goals violates your goal). This algorithm itself is simple and deterministic and very easy to reason about, and it doesn't want to work with any of the other goals (and it will reject all the proposals from other goals):

/*
 * Distribute replicas per topic evenly in round robin fashion.
 * The algorithm also makes the distribution evenly across racks.
 * 
 * If the number of partitions of the topic is the multiple of the number of 
 * brokers, the goal will achieve even distribution for both the leader
 * as well as the followers across brokers for the given topic.
 *
 * If the number of partitions of the topic is also the multiple of the number
 * of racks, the goal will also achieve even distribution across the racks. 
 *
 * The distrubition is based on round-robin fashion: 
 * (In the following example, we assume the Kafka cluster has 18 brokers evenly 
 * distributed in three rack/AZs.  There are two topics with replication
 * factor 3.  Topic 1 has 36 partitions, topic 2 has 54 partitions.)
 * (For terminology, we used AZ (Availabity Zone) and Rack interchangeably)
 *
 * 1. Align the racks on the AZ/Rack number line in a loop around fashion
 *    Below example use AZ (availability zone) represent a rack and we have
 *    three AZs in the cluster.
 *     Pos:   1   2   3   4   5   6   ... ... 16  17  18
 *     AZ:    AZ1 AZ2 AZ3 AZ1 AZ2 AZ3 ... ... AZ1 AZ2 AZ3
 * 2. For the AZ slot on the number line, fill in the broker with the matching rack
 *    in a round-robin fashion (B1/B4/B7 are in AZ1, B2/B5/B8 are in AZ2 ...):
 *     B:     B1  B2  B3  B4  B5  B6  ... ... B16 B17 B18
 * 3a. For Topic T1, fill in the leader partition starting position 1 in a
 *    round-robin looparound fashion:
 *     T1L:   P1  P2  P3  P4  P5  P6  ... ... P16 P17 P18
 *            P19 P20 P21 ... ... ... ... ... P34 P35 P36
 * 3b. For Topic T1, fill in the first follower partition starting position 2 in a
 *    round-robin looparound fashion:
 *     T1F1:      P1  P2  P3  P4  P5  ... ... P15 P16 P17
 *            P18 P19 P20 ... ... ... ... ... P33 P34 P35
 *            P36
 * 3b. For Topic T1, fill in the second follower partition starting position 3 in a
 *    round-robin looparound fashion:
 *     T1F2:          P1  P2  P3  P4  ... ... P14 P15 P16
 *            P17 P18 P19 ... ... ... ... ... P32 P33 P34
 *            P35 P36
 * 4.  Repeat the same algorithm for topic T2.
 *
 * In the end, we will get this distribution
 *     Pos:   1   2   3   4   5   6   ... ... 16  17  18
 *     AZ:    AZ1 AZ2 AZ3 AZ1 AZ2 AZ3 ... ... AZ1 AZ2 AZ3
 *     T1L:   P1  P2  P3  P4  P5  P6  ... ... P16 P17 P18
 *            P19 P20 P21 ... ... ... ... ... P34 P35 P36
 *     T1F1:  P36 P1  P2  P3  P4  P5  ... ... P15 P16 P17
 *            P18 P19 P20 ... ... ... ... ... P33 P34 P35
 *     T1F2:  P35 P36 P1  P2  P3  P4  ... ... P14 P15 P16
 *            P17 P18 P19 ... ... ... ... ... P32 P33 P34
 *     T2L:   P1  P2  P3  P4  P5  P6  ... ... P16 P17 P18
 *            P19 P20 P21 ... ... ... ... ... P34 P35 P36
 *            P37 P38 P39 ... ... ... ... ... P52 P53 P54
 *     T2F1:  P54 P1  P2  P3  P4  P5  ... ... P15 P16 P17
 *            P18 P19 P20 ... ... ... ... ... P33 P34 P35
 *            P36 ... ... ... ... ... ... ... ... ... P53
 *     T2F2:  P53 P54 P1  P2  P3  P4  ... ... P14 P15 P16
 *            P17 P18 P19 ... ... ... ... ... P32 P33 P34
 *            P35 ... ... ... ... ... ... ... ... ... P52
 * 
 * The load is evenly distributed on each broker on all the following
 * dimensions:
 *    1. Number of replicas
 *    2. Number of leader replicas
 * The load is also evenly distributed on each AZ for the above
 * dimensions.
 *
 * In the case that the number of partitions of a topic is not the exact
 * multiple of the number of brokers (e.g. a topic with 8 partitions 
 * distributed into 3 brokers), the round-robin algorithm will still work
 * but it will leave the load more heavily distributed into the first 
 * few brokers.  To compensate for the skew, we can start the distribution
 * of the next topic at where the first topic ends (position 3) rather
 * than at the position 1.
 */

HenryCaiHaiying avatar Aug 26 '22 21:08 HenryCaiHaiying