New scheduler to support `SQL-based Data Redistribution`
Feature Request
TiKV and TiFlash are the distributed storage engines for TiDB. PD automatically schedules data so that it is distributed and balanced across the cluster. However, it balances data (in the form of regions) only at the level of the whole cluster, across all TiKV or TiFlash nodes. As a result, the regions of a specific table are sometimes not balanced across the cluster.
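To make the gap concrete, here is a minimal sketch of how a cluster can look balanced overall while a single table is fully skewed. The `Region` type, the store IDs, and the table names are hypothetical simplifications for illustration and are not PD's actual data structures.

```go
package main

import "fmt"

// Region is a hypothetical, simplified view of a region: the table it
// belongs to and the ID of a store that holds it.
type Region struct {
	Table string
	Store int
}

// CountByStore returns the number of regions on each store. An empty
// table name counts every region; otherwise only that table is counted.
func CountByStore(regions []Region, stores []int, table string) map[int]int {
	counts := make(map[int]int, len(stores))
	for _, s := range stores {
		counts[s] = 0 // stores without matching regions still appear
	}
	for _, r := range regions {
		if table == "" || r.Table == table {
			counts[r.Store]++
		}
	}
	return counts
}

// Spread returns the difference between the largest and smallest
// per-store count; 0 means perfectly even.
func Spread(counts map[int]int) int {
	first := true
	lo, hi := 0, 0
	for _, c := range counts {
		if first {
			lo, hi = c, c
			first = false
			continue
		}
		if c < lo {
			lo = c
		}
		if c > hi {
			hi = c
		}
	}
	return hi - lo
}

func main() {
	stores := []int{1, 2, 3}
	regions := []Region{
		{"orders", 1}, {"orders", 1}, {"orders", 1},
		{"users", 2}, {"users", 2}, {"users", 2},
		{"logs", 3}, {"logs", 3}, {"logs", 3},
	}
	// Every store holds three regions, so the cluster looks balanced.
	fmt.Println("cluster spread:", Spread(CountByStore(regions, stores, "")))
	// All regions of "orders" sit on store 1, so the table is skewed.
	fmt.Println("orders spread:", Spread(CountByStore(regions, stores, "orders")))
}
```

In this toy layout the cluster-wide spread is 0 while the spread for `orders` is 3, which is the kind of per-table skew this request is about.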
Describe your feature request-related problem
- https://github.com/tikv/pd/issues/6951: empty regions occupy the store limit
- https://github.com/tikv/pd/issues/7838: a failed operator execution skews the region distribution
- https://github.com/tikv/pd/issues/5603: the CPU usage of scatter region is too high
- https://github.com/tikv/pd/issues/5484: the scatter strategy does not work if there are many unhealthy nodes
Describe the feature you'd like
PD should provide commands that trigger scheduling for data distribution.
- Implement a new SQL command to distribute data for the specified table.
DISTRIBUTE TABLE table_name [PARTITION partition_list]
[RULE = 'leader-scatter'|'peer-scatter']
[ENGINE = 'TiKV'|'TiFlash'];
- Allow users to specify the target storage engine (TiKV, TiFlash, or both) for data rebalancing.
- Provide a seamless and user-friendly experience for data rebalancing.
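As a rough illustration of the proposed grammar, the sketch below renders a `DISTRIBUTE TABLE` statement from its optional parts and rejects rule or engine values that the proposal does not list. The function name `BuildDistributeStmt` is hypothetical, and the rendering of `partition_list` as a comma-separated list without parentheses is an assumption, since the proposal does not spell out that detail. Identifiers are not escaped, so this is not suitable for untrusted input.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// BuildDistributeStmt renders the proposed DISTRIBUTE TABLE statement.
// Empty partitions, rule, or engine omit the matching optional clause.
// Hypothetical helper: the statement is only a proposal in this issue.
func BuildDistributeStmt(table string, partitions []string, rule, engine string) (string, error) {
	if table == "" {
		return "", errors.New("table name is required")
	}
	var b strings.Builder
	b.WriteString("DISTRIBUTE TABLE " + table)
	if len(partitions) > 0 {
		// Assumption: partition_list is a plain comma-separated list.
		b.WriteString(" PARTITION " + strings.Join(partitions, ", "))
	}
	if rule != "" {
		if rule != "leader-scatter" && rule != "peer-scatter" {
			return "", fmt.Errorf("unsupported rule %q", rule)
		}
		fmt.Fprintf(&b, " RULE = '%s'", rule)
	}
	if engine != "" {
		if engine != "TiKV" && engine != "TiFlash" {
			return "", fmt.Errorf("unsupported engine %q", engine)
		}
		fmt.Fprintf(&b, " ENGINE = '%s'", engine)
	}
	b.WriteString(";")
	return b.String(), nil
}

func main() {
	stmt, err := BuildDistributeStmt("orders", []string{"p0", "p1"}, "leader-scatter", "TiKV")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(stmt)
}
```

Omitting the engine would presumably correspond to the "both" case mentioned above, although the proposal does not say so explicitly.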
Describe alternatives you've considered
Teachability, Documentation, Adoption, Migration Strategy
a better scatter-range-scheduler?
Yes, but it is ultimately exposed through SQL.