[BUG] Unable to horizontally scale down the TiKV component
Describe the bug I have a surreal cluster based on TiKV running. When horizontally scaling down the TiKV component, the deleted TiKV pod remains pending.
To Reproduce
Expected behavior Horizontally scale down TiKV successfully.
Environment:
- Kubernetes: v1.20.11
- KubeBlocks: 0.9.3
- kbcli: 0.9.3
To clarify: it is POD-i that is pending, not POD. Please check the TiKV pod logs, in particular the kbagent container.
Creating the cluster without serviceAccountName:
```
2025-03-24T08:01:01.103Z INFO leave member at scaling-in error, retry later: requeue after: 1s as: [{"errorCode":"ERR_OPERATION_FAILED","message":"operation exec failed: clusters.apps.kubeblocks.io \"tidb-ntgayu\" is forbidden: User \"system:serviceaccount:default:default\" cannot get resource \"clusters\" in API group \"apps.kubeblocks.io\" in the namespace \"default\""}] {"controller": "component", "controllerGroup": "apps.kubeblocks.io", "controllerKind": "Component", "Component": {"name":"tidb-ntgayu-tikv","namespace":"default"}, "namespace": "default", "name": "tidb-ntgayu-tikv", "reconcileID": "b403d7b5-23c4-4eb1-bec3-11cdcac8f02c", "component": {"name":"tidb-ntgayu-tikv","namespace":"default"}}
```
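The error above is an RBAC denial: the `default` service account lacks `get` on `clusters` in the `apps.kubeblocks.io` API group. As an illustration only, a minimal Role/RoleBinding sketch that would grant this permission might look like the following (the resource names `kb-cluster-reader` and `kb-cluster-reader-binding` are assumptions, not from KubeBlocks):

```yaml
# Hypothetical sketch: grant the default service account read access to
# KubeBlocks Cluster objects, based on the verbs/resources in the error above.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kb-cluster-reader        # hypothetical name
  namespace: default
rules:
- apiGroups: ["apps.kubeblocks.io"]
  resources: ["clusters"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kb-cluster-reader-binding  # hypothetical name
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: kb-cluster-reader
subjects:
- kind: ServiceAccount
  name: default
  namespace: default
```

In practice, setting a properly-privileged `serviceAccountName` on the cluster (as tried below) is the intended route; this sketch just shows what permission the error is asking for.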
Creating the cluster with serviceAccountName:
```
2025-03-24T08:10:47.112Z INFO leave member at scaling-in error, retry later: requeue after: 1s as: [Post "http://172.16.0.35:3501/v1.0/leavemember": context deadline exceeded (Client.Timeout exceeded while awaiting headers)] {"controller": "component", "controllerGroup": "apps.kubeblocks.io", "controllerKind": "Component", "Component": {"name":"tidb-bcpere-tikv","namespace":"default"}, "namespace": "default", "name": "tidb-bcpere-tikv", "reconcileID": "bc0d4de2-83e3-4fac-93b4-fe1a9d79b506", "component": {"name":"tidb-bcpere-tikv","namespace":"default"}}
```
There's a known issue that TiKV scale-in hangs if the TiKV replica count drops below 3. That's because PD's max-replicas parameter controls the minimum number of TiKV replicas. Its default value is 3, so TiKV can't be scaled down to fewer than 3 replicas.
As a workaround, you can manually set the parameter using the Configuration CR. For example, if your cluster is named foo, you can find a Configuration CR named foo-tidb-pd and update it with:
```yaml
apiVersion: apps.kubeblocks.io/v1alpha1
kind: Configuration
...
spec:
  ...
  configItemDetails:
  - configFileParams:
      pd.toml:
        parameters:
          replication.max-replicas: "1"
    configSpec:
      name: pd-configuration
      ...
    name: pd-configuration
```
This issue has been marked as stale because it has been open for 30 days with no activity.