Enhance the region balance of the 1M tables imported by lightning
Development Task
Background
- PD ensures scatter at the table level, but it does not care about table-to-table balance, which relies on the `balance-region` scheduler. `balance-region` will not schedule empty regions; there is a hardcoded filter in https://github.com/tikv/pd/blob/7e18a69734f2ae391516625ec0230c05b363f5c5/pkg/schedule/filter/region_filters.go#L150-L153
- When lightning imports tables, each table's region key is encoded from the table ID, and table IDs are created consecutively (basically next to each other), so the resulting split keys are adjacent (see the sketch below). https://github.com/tikv/client-go/blob/6ba909c4ad2de65b5b36d0e5036d0a85f3154cc0/tikv/split_region.go#L241-L247
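To make the adjacency concrete, here is a minimal Go sketch of the table-prefix key encoding (a simplified form of TiDB's `t{table_id}` record prefix; `encodeTablePrefix` is an illustrative name, not the real client-go API):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeTablePrefix sketches how a table's region split key is derived:
// the prefix byte 't' followed by the table ID in a memcomparable form
// (big-endian with the sign bit flipped), so keys sort by table ID.
func encodeTablePrefix(tableID int64) []byte {
	key := make([]byte, 0, 9)
	key = append(key, 't')
	var buf [8]byte
	binary.BigEndian.PutUint64(buf[:], uint64(tableID)^(1<<63))
	return append(key, buf[:]...)
}

func main() {
	// Consecutively created table IDs yield adjacent split keys, so the
	// newly split empty regions form one contiguous key span.
	for id := int64(100); id < 104; id++ {
		fmt.Printf("table %d -> split key %x\n", id, encodeTablePrefix(id))
	}
}
```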
Problems faced
When lightning imports 1 million tables (one table corresponds to one region), even though there are more than 3 stores, the consecutive region keys cause a large number of regions to pile up on the first 3 stores. And since empty regions are not scheduled, those three stores have a high probability of OOM.
I'm interested in this issue.
/assign @River2000i
PD scatters regions based on the `region_count` of all stores, and schedules each new region to the store with the fewest regions.
https://github.com/tikv/pd/blob/13174b5d4cab4d958f0b4ea2718ea7bb74992bc7/pkg/schedule/scatter/region_scatterer.go#L346
PD compares `region_count` within a group. For now, a group is defined by the table ID (every table belongs to its own group).
https://github.com/tikv/pd/blob/13174b5d4cab4d958f0b4ea2718ea7bb74992bc7/pkg/schedule/scatter/region_scatterer.go#L367
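As a rough illustration of that rule (`pickStore` and `groupCounts` are illustrative names, not PD's identifiers), the next peer of a group goes to the candidate store with the lowest per-group count:

```go
package main

import "fmt"

// pickStore sketches the selection in the linked region_scatterer.go:
// among candidate stores, choose the one with the smallest count for
// the current group.
func pickStore(groupCounts map[uint64]uint64, candidates []uint64) uint64 {
	best := candidates[0]
	for _, id := range candidates[1:] {
		if groupCounts[id] < groupCounts[best] {
			best = id
		}
	}
	return best
}

func main() {
	// This group already has peers counted on stores 1 and 2, so the
	// next peer goes to store 3: balancing works within a group.
	counts := map[uint64]uint64{1: 2, 2: 1, 3: 0}
	fmt.Println(pickStore(counts, []uint64{1, 2, 3})) // -> 3
}
```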
Summary:
- If there are lots of newly split empty regions (with adjacent kv ranges), the regions are split on the same store. At the `group` level, scattering is based on the `region_count` within a `group`, so a region will not be scattered to a store that does not already contain a region belonging to that `group`.
- At the cluster level, scattering is based on the `region_count` across the whole cluster. Regions are balanced across the whole cluster, but not balanced at the `group` (i.e. `table`) level.
Root cause:
`selectNewPeer()` picks the origin store the first time scatter is triggered for a region. It compares the region counts of all stores in the `engineContext`, but on the first call `context.selectedPeer.Get()` returns 0 for every store, so no store compares better than the origin store and the region is not scattered at all. https://github.com/tikv/pd/blob/13174b5d4cab4d958f0b4ea2718ea7bb74992bc7/pkg/schedule/scatter/region_scatterer.go#L444
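The degenerate first call can be sketched like this (illustrative names; the real logic is at the `region_scatterer.go` line linked above):

```go
package main

import "fmt"

// selectNewPeer sketches the problem: on the first scatter of a group,
// every store's selected count is 0, so no store compares better than
// the region's origin store and the region stays where it was split.
func selectNewPeer(selectedCount map[uint64]uint64, originStore uint64, stores []uint64) uint64 {
	target := originStore
	for _, id := range stores {
		if selectedCount[id] < selectedCount[target] {
			target = id
		}
	}
	return target
}

func main() {
	// Every table is its own group, so for each of the 1M tables the
	// per-group counts start at zero and the origin store always wins.
	firstCall := map[uint64]uint64{1: 0, 2: 0, 3: 0}
	fmt.Println(selectNewPeer(firstCall, 1, []uint64{1, 2, 3})) // -> 1
}
```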
If we want to scatter regions at the cluster level, we can call `ScatterRegion` with the same group for all regions. This could be an option for the caller.
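A hedged sketch of what that option could look like from the caller's side, assuming the `ScatterRegions` API and `WithGroup` option of github.com/tikv/pd/client (verify the exact signatures against the client version in use):

```go
package scatter

import (
	"context"

	pd "github.com/tikv/pd/client"
)

// scatterAsOneGroup asks PD to scatter a batch of regions under one
// shared group name, so the per-group region_count comparison spans
// the whole batch instead of a single table. The group name here is
// arbitrary, chosen for illustration.
func scatterAsOneGroup(ctx context.Context, cli pd.Client, regionIDs []uint64) error {
	_, err := cli.ScatterRegions(ctx, regionIDs, pd.WithGroup("lightning-import"))
	return err
}
```

With one shared group, the per-group `region_count` comparison effectively becomes a cluster-level comparison, which addresses the group-level imbalance described in the summary above.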