[Enhancement] Support slow broker detection in AutoBalancer
What's the problem
When a broker experiences internal issues and its produce or fetch latency increases, its network throughput is likely to drop. The AutoBalancer then sees the broker as underloaded and, to keep load balanced across the cluster, tries to assign additional partitions to it. As a result, more partitions are affected by the failure.
How to identify slow brokers
Brokers will need to report additional metrics, including append latency, append stream queue size, fast read latency, and fetch queue size. The AutoBalancer will mark a broker as "slow" if any of these metrics shows a sudden increase compared to its historical statistics.
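The detection rule can be sketched roughly as follows. This is only an illustration in Python, not AutoMQ code: the issue does not define "sudden increase", so the sliding window, the `mean + sigma * stddev` threshold, the `min_ratio` and `abs_floor` guards, and every identifier (`SlowBrokerDetector`, `observe`, the metric keys) are assumptions made for the example.

```python
from collections import deque
from statistics import mean, pstdev

# The four metrics come from the issue text; the key names are hypothetical.
METRICS = (
    "append_latency_ms",
    "append_queue_size",
    "fast_read_latency_ms",
    "fetch_queue_size",
)


class SlowBrokerDetector:
    """Tracks one broker and flags metrics that jump well above their own history."""

    def __init__(self, window=60, min_samples=10, sigma=3.0, min_ratio=2.0, abs_floor=1.0):
        self.min_samples = min_samples  # history needed before judging
        self.sigma = sigma              # allowed deviation, in standard deviations
        self.min_ratio = min_ratio      # a spike must also be at least this multiple of the mean
        self.abs_floor = abs_floor      # values at or below this are never treated as spikes
        self.history = {name: deque(maxlen=window) for name in METRICS}

    def _is_spike(self, past, value):
        if len(past) < self.min_samples:
            return False  # not enough history yet
        avg = mean(past)
        threshold = max(avg + self.sigma * pstdev(past), avg * self.min_ratio, self.abs_floor)
        return value > threshold

    def observe(self, sample):
        """Record one metrics sample (a dict) and return the set of metrics that spiked."""
        spiked = {n for n in METRICS if self._is_spike(self.history[n], sample[n])}
        # Spiking values are kept out of the baseline so that a sustained
        # slowdown does not quietly become the new "normal".
        for name in METRICS:
            if name not in spiked:
                self.history[name].append(sample[name])
        return spiked
```

Under these assumptions, a broker would be treated as "slow" whenever `observe` returns a non-empty set. A real implementation would likely need per-metric floors and a rule for clearing the "slow" mark once the metrics recover.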
What to do with slow brokers
When a broker is marked as "slow", no additional partitions will be assigned to it. However, moving existing partitions off it is still allowed.
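The asymmetry described here (a slow broker may be a source of a partition move but never a destination) might look like the following minimal Python sketch. The broker representation, the `load` field, and the function names `pick_destination` and `can_move` are hypothetical and do not reflect the actual AutoBalancer API.

```python
def pick_destination(brokers, slow_ids):
    """Return the least-loaded broker that is not marked slow, or None if there is none."""
    candidates = [b for b in brokers if b["id"] not in slow_ids]
    return min(candidates, key=lambda b: b["load"], default=None)


def can_move(src_id, dst_id, slow_ids):
    """A move is rejected only when its destination is slow; the source may be slow."""
    return dst_id not in slow_ids
```

For example, with broker 1 marked slow, `can_move(1, 2, {1})` is allowed while `can_move(2, 1, {1})` is not, and `pick_destination` skips broker 1 even if its reported load is the lowest in the cluster.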