Scaling in TiKV blocked by `not found in cluster` error

Open DanielZhangQD opened this issue 3 years ago • 1 comments

Bug Report

What version of Kubernetes are you using?

What version of TiDB Operator are you using?

v1.3.2

What storage classes exist in the Kubernetes cluster and what are used for PD/TiKV pods?

What's the status of the TiDB cluster pods?

# k -n tidb60441 get pod
NAME                            READY   STATUS      RESTARTS   AGE
db-discovery-7b5b664f88-wkhgr   1/1     Running     0          3h48m
db-monitor-0                    3/3     Running     0          3h48m
db-pd-0                         1/1     Running     0          3h48m
db-tidb-0                       3/3     Running     0          107m
db-tidb-initializer-b29tn       0/1     Completed   0          3h48m
db-tikv-3                       1/1     Running     0          57m
db-tikv-4                       1/1     Running     0          57m
db-tikv-5                       1/1     Running     0          57m
db-tikv-6                       1/1     Running     0          34m
db-tikv-7                       1/1     Running     0          34m
db-tikv-8                       1/1     Running     0          34m

What did you do?

Scale out 3 new Pods and then scale in 3 Pods shortly afterwards.

What did you expect to see?

The Pods are scaled in.

What did you see instead?

The latest Pod, tikv-8, cannot be scaled in.

DanielZhangQD, Jul 25 '22 07:07

Logs of tidb-controller-manager:

W0725 06:10:44.862481       1 pvc_resizer.go:316] PVC tidb60441/tikv-db-tikv-8 is not bound
E0725 06:10:44.862527       1 pvc_resizer.go:468] Check PVC "tidb60441/tikv-db-tikv-8" of "tidb60441/db:tikv" resized failed: storage capacity is empty
W0725 06:11:02.165067       1 pvc_resizer.go:316] PVC tidb60441/tikv-db-tikv-8 is not bound
E0725 06:11:02.165117       1 pvc_resizer.go:468] Check PVC "tidb60441/tikv-db-tikv-8" of "tidb60441/db:tikv" resized failed: storage capacity is empty
W0725 06:11:05.765212       1 pvc_resizer.go:316] PVC tidb60441/tikv-db-tikv-8 is not bound
E0725 06:11:05.765262       1 pvc_resizer.go:468] Check PVC "tidb60441/tikv-db-tikv-8" of "tidb60441/db:tikv" resized failed: storage capacity is empty
W0725 06:11:24.521613       1 pvc_resizer.go:316] PVC tidb60441/tikv-db-tikv-8 is not bound
E0725 06:11:24.521652       1 pvc_resizer.go:468] Check PVC "tidb60441/tikv-db-tikv-8" of "tidb60441/db:tikv" resized failed: storage capacity is empty
W0725 06:11:24.801641       1 pvc_resizer.go:316] PVC tidb60441/tikv-db-tikv-8 is not bound
E0725 06:11:24.801673       1 pvc_resizer.go:468] Check PVC "tidb60441/tikv-db-tikv-8" of "tidb60441/db:tikv" resized failed: storage capacity is empty
I0725 06:11:47.994596       1 tikv_member_manager.go:950] pod: [tidb60441/db-tikv-8] set labels: map[failure-domain.beta.kubernetes.io/region:us-east-1 failure-domain.beta.kubernetes.io/zone:us-east-1a kubernetes.io/hostname:ip-10-250-8-133.ec2.internal] successfully
I0725 06:11:48.066835       1 tikv_scaler.go:124] tikvScaler.ScaleIn: delete store 156 for tikv tidb60441/db-tikv-8 successfully
I0725 06:11:48.089545       1 tidb_cluster_controller.go:124] TidbCluster: tidb60441/db, still need sync: TiKV tidb60441/db-tikv-8 store 156 is still in cluster, state: Down, requeuing
I0725 06:11:48.396367       1 tidb_cluster_controller.go:124] TidbCluster: tidb60441/db, still need sync: TiKV tidb60441/db-tikv-8 store 156 is still in cluster, state: Offline, requeuing
I0725 06:11:48.690123       1 tidb_cluster_controller.go:124] TidbCluster: tidb60441/db, still need sync: TiKV tidb60441/db-tikv-8 store 156 is still in cluster, state: Offline, requeuing
I0725 06:11:49.134156       1 tidb_cluster_controller.go:124] TidbCluster: tidb60441/db, still need sync: TiKV tidb60441/db-tikv-8 store 156 is still in cluster, state: Offline, requeuing
I0725 06:11:49.939155       1 tidb_cluster_controller.go:124] TidbCluster: tidb60441/db, still need sync: TiKV tidb60441/db-tikv-8 store 156 is still in cluster, state: Offline, requeuing
I0725 06:11:51.893049       1 tidb_cluster_controller.go:124] TidbCluster: tidb60441/db, still need sync: TiKV tidb60441/db-tikv-8 store 156 is still in cluster, state: Offline, requeuing
I0725 06:11:52.150943       1 tidb_cluster_controller.go:124] TidbCluster: tidb60441/db, still need sync: TiKV tidb60441/db-tikv-8 store 156 is still in cluster, state: Offline, requeuing
W0725 06:11:57.346193       1 tikv_scaler.go:160] TiKV tidb60441/db-tikv-8 store 156 in status is not equal with store  in label
E0725 06:11:57.374374       1 tidb_cluster_controller.go:126] TidbCluster: tidb60441/db, sync failed TiKV tidb60441/db-tikv-8 not found in cluster, requeuing
W0725 06:11:57.660228       1 tikv_scaler.go:160] TiKV tidb60441/db-tikv-8 store 156 in status is not equal with store  in label
E0725 06:11:57.660403       1 tidb_cluster_controller.go:126] TidbCluster: tidb60441/db, sync failed TiKV tidb60441/db-tikv-8 not found in cluster, requeuing
W0725 06:12:01.222425       1 tikv_scaler.go:160] TiKV tidb60441/db-tikv-8 store 156 in status is not equal with store  in label

This happens because Pod tikv-8 has no store ID label (`tidb.pingcap.com/store-id`), unlike tikv-7:

db-tikv-7                       1/1     Running     0          25m     app.kubernetes.io/component=tikv,app.kubernetes.io/instance=db,app.kubernetes.io/managed-by=tidb-operator,app.kubernetes.io/name=tidb-cluster,controller-revision-hash=db-tikv-56d5b59dd4,statefulset.kubernetes.io/pod-name=db-tikv-7,tidb.pingcap.com/cluster-id=7124140027519895143,tidb.pingcap.com/store-id=154
db-tikv-8                       1/1     Running     0          25m     app.kubernetes.io/component=tikv,app.kubernetes.io/instance=db,app.kubernetes.io/managed-by=tidb-operator,app.kubernetes.io/name=tidb-cluster,controller-revision-hash=db-tikv-56d5b59dd4,statefulset.kubernetes.io/pod-name=db-tikv-8,tidb.pingcap.com/cluster-id=7124140027519895143
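
The missing label can also be checked mechanically. Below is a minimal Go sketch, assuming label strings in the comma-separated `key=value` form shown above; `hasLabel` is a hypothetical helper for illustration and is not part of tidb-operator.

```go
package main

import (
	"fmt"
	"strings"
)

// hasLabel reports whether a comma-separated "key=value" label string
// (the format shown in the listing above) contains the given key.
// Hypothetical helper, for illustration only.
func hasLabel(labels, key string) bool {
	for _, kv := range strings.Split(labels, ",") {
		k, _, found := strings.Cut(kv, "=")
		if found && k == key {
			return true
		}
	}
	return false
}

func main() {
	const storeIDKey = "tidb.pingcap.com/store-id"

	// Label strings abbreviated from the listing above.
	pods := []struct{ name, labels string }{
		{"db-tikv-7", "app.kubernetes.io/component=tikv,tidb.pingcap.com/cluster-id=7124140027519895143,tidb.pingcap.com/store-id=154"},
		{"db-tikv-8", "app.kubernetes.io/component=tikv,tidb.pingcap.com/cluster-id=7124140027519895143"},
	}

	for _, p := range pods {
		fmt.Printf("%s has store-id label: %v\n", p.name, hasLabel(p.labels, storeIDKey))
	}
}
```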

As a result, the scale-in is blocked by the logic at https://github.com/pingcap/tidb-operator/blob/master/pkg/manager/member/tikv_scaler.go#L160 and https://github.com/pingcap/tidb-operator/blob/master/pkg/manager/member/tikv_scaler.go#L203.

DanielZhangQD, Jul 25 '22 07:07