
TiKV region balancing does not work

Open conanoc opened this issue 2 months ago • 17 comments

Bug Report

What did you do?

  1. Create a TiKV cluster on k8s. tikv storage size is 2Ti with 30 replicas.
  2. Insert 10TB of data.
  3. Wait until leader/region balancing finishes.
  4. Insert another 10TB of data.

What did you expect to see?

Leader/region balancing works.

What did you see instead?

Leader balancing worked, but region balancing did not work.

[screenshot]

PD panel shows that there are 10 LowSpace stores.

[screenshot]

Many stores have enough available storage.

[screenshot]

What version of PD are you using (pd-server -V)?

v8.5.2

conanoc avatar Oct 22 '25 06:10 conanoc

Can you share the Clinic data or the pd->balance metrics panel?

bufferflies avatar Oct 22 '25 06:10 bufferflies

Here's the pd->balance panel. I don't know what "clinic" is.

[screenshots]

conanoc avatar Oct 22 '25 07:10 conanoc

> Here's the pd->balance panel. I don't know what "clinic" is.

You can use this tool to upload the metrics to our metrics platform: https://docs.pingcap.com/tidb/stable/clinic-introduction/

bufferflies avatar Oct 22 '25 07:10 bufferflies

What are your store labels, location settings, and placement rules?
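
These can be dumped with pd-ctl; a rough sketch, assuming pd-ctl can reach the PD service (basic-pd:2379 here is a guess based on your cluster name):

# store labels are listed per store
pd-ctl -u http://basic-pd:2379 store
# location labels and max-replicas
pd-ctl -u http://basic-pd:2379 config show replication
# placement rules, if enabled
pd-ctl -u http://basic-pd:2379 config placement-rules show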

bufferflies avatar Oct 22 '25 07:10 bufferflies

I used location-labels = ["kubernetes.io/hostname"]. I have 15 worker nodes in the k8s cluster. This is the cluster yaml file I used.

---
apiVersion: pingcap.com/v1alpha1
kind: TidbCluster
metadata:
  name: basic
spec:
  version: v8.5.2
  timezone: "+9:00"
  pvReclaimPolicy: Delete
  enableDynamicConfiguration: true
  configUpdateStrategy: RollingUpdate
  discovery: {}
  helper:
    image: alpine:3.16.0
  pd:
    baseImage: pingcap/pd
    maxFailoverCount: 3
    replicas: 3
    requests:
      storage: 10Gi
    config: |
      [replication]
        max-replicas = 3
        location-labels = ["kubernetes.io/hostname"]
  tikv:
    baseImage: pingcap/tikv
    maxFailoverCount: 10
    recoverFailover: true
    evictLeaderTimeout: 3m
    replicas: 30
    requests:
      storage: 2Ti
    config: |
      memory-usage-limit = "4GB"
      [log.file]
        filename = "/var/lib/tikv/tikv.log"
        max-days = 3
      [storage]
        reserve-space = "0MB"
        api-version = 2
        enable-ttl = true
        [storage.block-cache]
          capacity = "1GB"
  tidb:
    baseImage: pingcap/tidb
    maxFailoverCount: 2
    replicas: 2
    config: {}

conanoc avatar Oct 22 '25 07:10 conanoc

Why is used_size + available_size != capacity in your cluster, for example on store id 1014? One possible reason is that another TiKV instance shares the same disk, or the disk holds extra data that does not belong to TiKV.
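
A quick way to compare those three numbers per store (a sketch, assuming pd-ctl can reach the PD service, guessed as basic-pd:2379 again):

# capacity, available and used_size for every store, including store 1014
pd-ctl -u http://basic-pd:2379 store --jq='.stores[] | {id: .store.id, capacity: .status.capacity, available: .status.available, used: .status.used_size}'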

bufferflies avatar Oct 23 '25 00:10 bufferflies

You are right. There are 15 k8s worker nodes and 30 tikv stores. So, there are two tikv stores in one worker node on average.

conanoc avatar Oct 23 '25 01:10 conanoc

> You are right. There are 15 k8s worker nodes and 30 tikv stores. So, there are two tikv stores in one worker node on average.

In this case, you need to set the TiKV capacity manually.
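
For example, something along these lines under tikv.config in the cluster yaml should do it (a sketch; raftstore.capacity limits how much of the disk the store reports to PD, and 0 means "use the whole disk"):

    config: |
      [raftstore]
        capacity = "2TB"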

bufferflies avatar Oct 23 '25 03:10 bufferflies

Isn't it better for the TiKV node to calculate its storage capacity based on the size of the PV instead of using the machine's disk size or adjusting it manually?

conanoc avatar Oct 23 '25 05:10 conanoc

> Isn't it better for the TiKV node to calculate its storage capacity based on the size of the PV instead of using the machine's disk size or adjusting it manually?

TiKV will collect the PV details if your PV is set up correctly. You can log in to the TiKV pods and check the disk capacity. If the capacity does not match the PV, it indicates that something is wrong with your PV environment.
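
For example (assuming the data directory is mounted at /var/lib/tikv, as in your config):

% kubectl -n tidb-cluster exec basic-tikv-0 -c tikv -- df -h /var/lib/tikv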

bufferflies avatar Oct 23 '25 06:10 bufferflies

Below is what I get from the PVC. Please tell me more about how to check what is wrong.

% kubectl -n tidb-cluster describe pvc tikv-basic-tikv-0
Name:          tikv-basic-tikv-0
Namespace:     tidb-cluster
StorageClass:  local-path
Status:        Bound
Volume:        pvc-054d028d-8c59-4c2d-a8df-a02bf8897673
Labels:        app.kubernetes.io/component=tikv
               app.kubernetes.io/instance=basic
               app.kubernetes.io/managed-by=tidb-operator
               app.kubernetes.io/name=tidb-cluster
               tidb.pingcap.com/cluster-id=7561722164261267549
               tidb.pingcap.com/pod-name=basic-tikv-0
               tidb.pingcap.com/store-id=1014
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               tidb.pingcap.com/pod-name: basic-tikv-0
               volume.beta.kubernetes.io/storage-provisioner: rancher.io/local-path
               volume.kubernetes.io/selected-node: ctikvnode007.kv
               volume.kubernetes.io/storage-provisioner: rancher.io/local-path
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      2Ti
Access Modes:  RWO
VolumeMode:    Filesystem
Used By:       basic-tikv-0
Events:        <none>

conanoc avatar Oct 23 '25 07:10 conanoc

In your PVC, the storage capacity is 2Ti, but on the PD side the TiKV store capacity is 20T. You can log in to one of the TiKV pods and check the actual size.

bufferflies avatar Oct 24 '25 01:10 bufferflies

> You can log in to one of the TiKV pods and check the actual size.

I don't know how to see the actual size of that TiKV store. You mentioned that if the actual size is not equal to the capacity of the PV, then "the PV environment is wrong". I still wonder what exactly is wrong. Is it a bug in rancher.io/local-path? Is it a bug in TiKV, PD, or TiDB Operator? Or is the cluster yaml file I wrote the problem?

conanoc avatar Oct 24 '25 05:10 conanoc

In my environment, I can follow these steps to show the capacity of the TiKV data directory:

[screenshot]

bufferflies avatar Oct 25 '25 05:10 bufferflies

I see. In the case of local-path storage, several pods share the disk space, so the node's available disk space is not an appropriate measure of a TiKV node's storage size. After I set tikv.limits.storage to 2Ti, it seems the cluster uses that value as the storage size of the TiKV nodes. What about setting the default value of tikv.limits.storage to tikv.requests.storage?
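
For reference, this is roughly the change I made in the cluster yaml (assuming the operator passes limits.storage through to TiKV as the store capacity, which is what I observed):

  tikv:
    requests:
      storage: 2Ti
    limits:
      storage: 2Ti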

The region rebalancing progressed as expected after I changed the tikv.limits.storage value. In the following dashboard, I changed the value at around 15:20.

[screenshot]
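
The convergence can also be checked per store with pd-ctl (a sketch, assuming the same basic-pd:2379 service as above):

# region_count and region_score should even out across stores once balancing works
pd-ctl -u http://basic-pd:2379 store --jq='.stores[] | {id: .store.id, region_count: .status.region_count, region_score: .status.region_score}'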

conanoc avatar Oct 27 '25 02:10 conanoc

Generally, we don't suggest sharing one disk between TiKV instances in a production environment.

bufferflies avatar Oct 27 '25 09:10 bufferflies

Is there a particular reason? When using local storage on k8s, sharing one disk seems inevitable.

conanoc avatar Oct 27 '25 09:10 conanoc