
balance-learner-scheduler: balance learners among TiFlash stores

Open AndreMouche opened this issue 1 year ago • 0 comments

Feature Request

Describe your feature request related problem

Describe the feature you'd like

Currently:

  • For the distribution of leaders among stores, we have the balance-leader-scheduler to keep leaders evenly spread.
  • For the distribution of peers among stores, we have the balance-region-scheduler to keep peers evenly spread.

However, the balance-region-scheduler only considers the balance across all stores. It does not consider the peer role (learner or follower) or the store type (TiKV or TiFlash).
For TiFlash users, if the distribution of Regions (learners) among TiFlash instances becomes unbalanced, it may lead to computational hotspots that slow down performance. From the following logic, we can see that the balance-region-scheduler chooses the source store in order of region score. If the number of learner Regions on TiFlash is small, the region score of a TiFlash node will always be among the lowest, so TiFlash nodes may never get the chance to be picked as a balance-region source. That leaves the peers on the TiFlash nodes imbalanced.

https://github.com/tikv/pd/blob/fca469ca33eb5d8b5e0891b507c87709a00b0e81/pkg/schedule/schedulers/balance_region.go#L139-L144
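To make the concern concrete, here is a minimal Go sketch of the ordering problem. It is not PD's code: the `Store` type, `sourceCandidates`, `firstTiFlashRank`, and the scores are all hypothetical, and it only assumes that source stores are tried from the highest region score to the lowest.

```go
package main

import "sort"

// Store is a hypothetical, simplified store record for this sketch.
// It is not PD's real store type.
type Store struct {
	ID          uint64
	IsTiFlash   bool
	RegionScore float64
}

// sourceCandidates returns a copy of stores ordered from the highest
// region score to the lowest, i.e. the order in which a scheduler that
// sorts by region score would try them as balance sources.
func sourceCandidates(stores []Store) []Store {
	ordered := append([]Store(nil), stores...)
	sort.SliceStable(ordered, func(i, j int) bool {
		return ordered[i].RegionScore > ordered[j].RegionScore
	})
	return ordered
}

// firstTiFlashRank returns the 0-based position of the first TiFlash
// store in the ordered list, or -1 if there is none.
func firstTiFlashRank(ordered []Store) int {
	for i, s := range ordered {
		if s.IsTiFlash {
			return i
		}
	}
	return -1
}
```

With three TiKV stores scoring 900, 850, and 880 and two TiFlash stores scoring 120 and 40, every TiKV store is ordered ahead of both TiFlash stores, even though the TiFlash pair is unbalanced by a factor of three between themselves.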

In summary, I think we need a scheduler such as a balance-learner-scheduler, similar to the existing balance schedulers, to balance the distribution of learner peers among the stores.
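As a rough illustration of what such a scheduler could decide, the sketch below compares learner counts among TiFlash stores only and proposes one move from the most loaded to the least loaded store. Everything here is an assumption made for illustration: `TiFlashStore`, `pickLearnerMove`, and the `tolerance` parameter are hypothetical names, and a real scheduler would likely use scores, filters, and placement rules rather than raw counts.

```go
package main

// TiFlashStore is a hypothetical record holding the number of learner
// peers on one TiFlash store. It is not a PD type.
type TiFlashStore struct {
	ID           uint64
	LearnerCount int
}

// pickLearnerMove looks only at TiFlash stores and returns the IDs of
// the store with the most learners (source) and the store with the
// fewest learners (target). It reports ok=false when there are fewer
// than two stores or the gap is within the given tolerance.
func pickLearnerMove(stores []TiFlashStore, tolerance int) (src, dst uint64, ok bool) {
	if len(stores) < 2 {
		return 0, 0, false
	}
	hi, lo := stores[0], stores[0]
	for _, s := range stores[1:] {
		if s.LearnerCount > hi.LearnerCount {
			hi = s
		}
		if s.LearnerCount < lo.LearnerCount {
			lo = s
		}
	}
	if hi.LearnerCount-lo.LearnerCount <= tolerance {
		return 0, 0, false
	}
	return hi.ID, lo.ID, true
}
```

Because only TiFlash stores are compared with each other, the much higher region scores of TiKV stores no longer hide an imbalance between TiFlash instances.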

AndreMouche avatar Jun 12 '24 07:06 AndreMouche