
[BUG] Unable to horizontally scale down the TiKV component

Open · Shinnosukeys opened this issue 9 months ago · 3 comments

Describe the bug
I have a SurrealDB cluster based on TiKV running. When I horizontally scale down the TiKV component, the TiKV Pod that should be removed remains stuck in a pending state.

To Reproduce
(Two screenshots showing the reproduction steps were attached; the images did not survive extraction.)
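
For reference, a scale-in like this can be requested with an OpsRequest along these lines. This is a minimal sketch: the cluster name and target replica count are placeholders, and depending on the KubeBlocks version the cluster field is spec.clusterRef or spec.clusterName.

apiVersion: apps.kubeblocks.io/v1alpha1
kind: OpsRequest
metadata:
  name: tikv-scale-in          # hypothetical name
  namespace: default
spec:
  clusterRef: my-cluster       # placeholder; use your cluster's name
  type: HorizontalScaling
  horizontalScaling:
  - componentName: tikv
    replicas: 2                # placeholder target replica count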

Expected behavior
The TiKV component scales down successfully.


Desktop (please complete the following information):
- Kubernetes: v1.20.11
- KubeBlocks: 0.9.3
- kbcli: 0.9.3


Shinnosukeys · Mar 21 '25 07:03

The pending state is on the change to the Pod, not on the Pod itself. Please check the TiKV Pod logs, in particular the kbagent container.
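
For example, a minimal check (the Pod name below is a placeholder for one of your TiKV Pods):

# Inspect the kbagent sidecar container's logs on a TiKV Pod
kubectl logs my-cluster-tikv-0 -c kbagent -n default

# The Pod's events may also explain why the change is still pending
kubectl describe pod my-cluster-tikv-0 -n default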

shanshanying · Mar 24 '25 03:03

Creating the cluster without serviceAccountName:

2025-03-24T08:01:01.103Z	INFO	leave member at scaling-in error, retry later: requeue after: 1s as: [{"errorCode":"ERR_OPERATION_FAILED","message":"operation exec failed: clusters.apps.kubeblocks.io \"tidb-ntgayu\" is forbidden: User \"system:serviceaccount:default:default\" cannot get resource \"clusters\" in API group \"apps.kubeblocks.io\" in the namespace \"default\""}]	{"controller": "component", "controllerGroup": "apps.kubeblocks.io", "controllerKind": "Component", "Component": {"name":"tidb-ntgayu-tikv","namespace":"default"}, "namespace": "default", "name": "tidb-ntgayu-tikv", "reconcileID": "b403d7b5-23c4-4eb1-bec3-11cdcac8f02c", "component": {"name":"tidb-ntgayu-tikv","namespace":"default"}}

Creating the cluster with serviceAccountName:

2025-03-24T08:10:47.112Z	INFO	leave member at scaling-in error, retry later: requeue after: 1s as: [Post "http://172.16.0.35:3501/v1.0/leavemember": context deadline exceeded (Client.Timeout exceeded while awaiting headers)]	{"controller": "component", "controllerGroup": "apps.kubeblocks.io", "controllerKind": "Component", "Component": {"name":"tidb-bcpere-tikv","namespace":"default"}, "namespace": "default", "name": "tidb-bcpere-tikv", "reconcileID": "bc0d4de2-83e3-4fac-93b4-fe1a9d79b506", "component": {"name":"tidb-bcpere-tikv","namespace":"default"}}
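
For context, a minimal sketch of where serviceAccountName is set on the component in the Cluster spec. The names here are placeholders; the ServiceAccount must be bound to a role that can get clusters.apps.kubeblocks.io, which is what the first error above complains about.

apiVersion: apps.kubeblocks.io/v1alpha1
kind: Cluster
...
spec:
  ...
  componentSpecs:
  - name: tikv
    replicas: 3
    # ServiceAccount used by the component's Pods; it must be allowed to
    # get clusters.apps.kubeblocks.io for the member-leave call to succeed
    serviceAccountName: my-tikv-sa   # hypothetical ServiceAccount name
    ...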

JashBook · Mar 24 '25 08:03

There's a known issue where TiKV scale-in hangs if the TiKV replica count would drop below 3. That's because PD's max-replicas parameter controls the minimum number of TiKV replicas; its default value is 3, so TiKV can't be scaled down to fewer than 3 replicas.
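
To confirm this, the current setting can be read with pd-ctl from inside a PD Pod. This is a sketch: the Pod name is a placeholder and the pd-ctl path may differ by image.

# Show PD's replication config; max-replicas defaults to 3
kubectl exec -n default my-cluster-pd-0 -- \
  pd-ctl -u http://127.0.0.1:2379 config show replication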

As a workaround, you can manually set the parameter using the Configuration CR. For example, if your cluster is named foo, you will find a Configuration CR called foo-tidb-pd; update it with:

apiVersion: apps.kubeblocks.io/v1alpha1
kind: Configuration
...
spec:
  ...
  configItemDetails:
  - configFileParams:
      pd.toml:
        parameters:
          replication.max-replicas: "1"
    configSpec:
      name: pd-configuration
      ...
    name: pd-configuration
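
One way to apply this change (assuming the Configuration CRD's plural is configurations and the cluster lives in the default namespace):

# Edit the Configuration CR in place; replace foo-tidb-pd with the name
# of your cluster's Configuration object
kubectl edit configurations.apps.kubeblocks.io foo-tidb-pd -n default

Once PD picks up the new replication.max-replicas value, the pending scale-in should be able to proceed.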

cjc7373 · Mar 31 '25 11:03

This issue has been marked as stale because it has been open for 30 days with no activity

github-actions[bot] · May 05 '25 00:05