
Regions are not evenly distributed after plugging in new servers

Open phamtai97 opened this issue 2 years ago • 3 comments

Hi all, I have a problem with data distribution in the cluster after I plugged new servers into the existing cluster. I have two datacenters in one city. At first, I ran only one datacenter (id: 1) with 5 servers. Then I added a new datacenter (id: 2) with 3 servers. I configured leaders to be placed only in datacenter 1, so the servers in datacenter 2 hold no leaders. While monitoring, I saw data being rebalanced from datacenter 1 to datacenter 2. However, there is a problem, shown in the image below:

[Screenshot, 2022-08-10 at 10:16:54]

As the screenshot shows, node 1.0.1.20 (in datacenter 1) and node 1.0.0.23 (in datacenter 2) have a higher region score than the other nodes, even though leaders are balanced across the nodes in the cluster.
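To make "a higher region score than the other nodes" concrete, here is a minimal sketch that flags stores whose `region_score` is far from the cluster mean. It assumes the JSON shape printed by pd-ctl's `store` command (a `stores` list whose entries carry `store.address` and `status.region_score`). The function name `region_score_outliers` and the 20% default tolerance are hypothetical choices for illustration; this is a rough eyeball check, not the threshold PD's own balance scheduler uses.

```python
import statistics
from typing import Any


def region_score_outliers(
    stores_json: dict[str, Any], tolerance: float = 0.2
) -> list[tuple[str, float]]:
    """Return (address, region_score) for every store whose score differs
    from the cluster mean by more than `tolerance` (a fraction of the mean).

    Assumes the pd-ctl `store` JSON shape described above; raises
    statistics.StatisticsError if the `stores` list is empty.
    """
    # Map each store address to its region score.
    scores = {
        s["store"]["address"]: float(s["status"]["region_score"])
        for s in stores_json["stores"]
    }
    mean = statistics.mean(scores.values())
    # Keep only stores outside the hypothetical tolerance band.
    return [
        (addr, score)
        for addr, score in scores.items()
        if abs(score - mean) > tolerance * mean
    ]
```

For example, with five stores scoring 100, 100, 100, 100 and 200 (made-up numbers), the mean is 120, the band is ±24, and only the store scoring 200 is flagged.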

Before I added the datacenter 2 servers to the TiDB cluster in datacenter 1, all servers in datacenter 1 had the same region count and data size.

[Screenshot, 2022-08-10 at 12:43:38]

We also triggered region rebalancing manually, but it did not resolve the problem.

Here is my configuration:

» config show
{
  "replication": {
    "enable-placement-rules": "true",
    "enable-placement-rules-cache": "false",
    "isolation-level": "dc",
    "location-labels": "zone,dc,rack,host",
    "max-replicas": 5,
    "strictly-match-label": "false"
  },
  "schedule": {
    "enable-cross-table-merge": "true",
    "enable-joint-consensus": "true",
    "high-space-ratio": 0.7,
    "hot-region-cache-hits-threshold": 2,
    "hot-region-schedule-limit": 8,
    "hot-regions-reserved-days": 7,
    "hot-regions-write-interval": "10m0s",
    "leader-schedule-limit": 4,
    "leader-schedule-policy": "size",
    "low-space-ratio": 0.8,
    "max-merge-region-keys": 200000,
    "max-merge-region-size": 20,
    "max-pending-peer-count": 64,
    "max-snapshot-count": 64,
    "max-store-down-time": "30m0s",
    "merge-schedule-limit": 8,
    "patrol-region-interval": "10ms",
    "region-schedule-limit": 2048,
    "region-score-formula-version": "v2",
    "replica-schedule-limit": 64,
    "split-merge-interval": "1h0m0s",
    "tolerant-size-ratio": 20
  }
}
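The config sets `location-labels` to `zone,dc,rack,host` with `isolation-level` set to `dc`, so PD takes store labels into account when spreading each region's replicas. A minimal sketch for seeing how stores group at one label level follows. It assumes the same pd-ctl `store` JSON shape, where each `store` object carries a `labels` list of `{"key": ..., "value": ...}` entries; `stores_by_label` is a hypothetical helper name, not part of PD.

```python
from collections import defaultdict
from typing import Any


def stores_by_label(stores_json: dict[str, Any], label_key: str) -> dict[str, list[str]]:
    """Group store addresses by the value of one location label (e.g. "dc").

    Stores without that label (or with no labels at all) are grouped
    under the placeholder key "<unset>".
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for s in stores_json["stores"]:
        # `labels` may be missing or null; treat both as "no labels".
        labels = {lbl["key"]: lbl["value"] for lbl in (s["store"].get("labels") or [])}
        groups[labels.get(label_key, "<unset>")].append(s["store"]["address"])
    return dict(groups)
```

Running it once for each key in `location-labels` shows how many stores share each label value at that level.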

I suspect PD is calculating the region score incorrectly, which leads to the unbalanced region scores.

I have some questions:

  • What is the problem?
  • Why does it happen?
  • How can I balance the data in the cluster? How do I fix it?

Thank you.

phamtai97, Aug 10 '22 03:08

@tiancaiamao Can you help me?

phamtai97, Aug 10 '22 03:08


You can ask this question at https://github.com/tikv/pd. Region scheduling is PD's job, and I'm not familiar with it.

tiancaiamao, Aug 10 '22 06:08

Thanks @tiancaiamao

phamtai97, Aug 10 '22 09:08