tidb-operator
Empty objects under `storageClaims` bypass TiDB Operator's validation
Bug Report
What version of Kubernetes are you using?
```
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.1", GitCommit:"d224476cd0730baca2b6e357d144171ed74192d6", GitTreeState:"clean", BuildDate:"2020-01-14T21:04:32Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.9", GitCommit:"6df4433e288edc9c40c2e344eb336f63fad45cd2", GitTreeState:"clean", BuildDate:"2022-05-19T19:53:08Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}
```
What version of TiDB Operator are you using?
```
TiDB Operator Version: version.Info{GitVersion:"v1.3.0-45+1470cfb46e1ffb-dirty", GitCommit:"1470cfb46e1ffb8bb86f74ba455865a95b825413", GitTreeState:"dirty", BuildDate:"2022-07-06T18:55:00Z", GoVersion:"go1.18.3", Compiler:"gc", Platform:"linux/amd64"}
```
What storage classes exist in the Kubernetes cluster and what are used for PD/TiKV pods?
```shell
$ kubectl get sc
NAME                 PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
standard (default)   rancher.io/local-path   Delete          WaitForFirstConsumer   false                  40m

$ kubectl get pvc -n {tidb-cluster-namespace}
NAME                        STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
pd-advanced-tidb-pd-0       Bound    pvc-b566858b-bf4e-4e33-b31e-5d7feb7397b1   10Gi       RWO            standard       40m
pd-advanced-tidb-pd-1       Bound    pvc-df70980f-12cf-499f-8ad7-e41cac98c5d0   10Gi       RWO            standard       40m
pd-advanced-tidb-pd-2       Bound    pvc-d41691d8-feb5-4e21-b282-1ece1851cffa   10Gi       RWO            standard       40m
tikv-advanced-tidb-tikv-0   Bound    pvc-42e652d6-2400-4ae8-b790-cef8466e4566   100Gi      RWO            standard       39m
tikv-advanced-tidb-tikv-1   Bound    pvc-5af08c43-e02d-433c-896a-b85ad568d1ca   100Gi      RWO            standard       39m
tikv-advanced-tidb-tikv-2   Bound    pvc-652761b6-9fff-4080-b13d-9e364062cddc   100Gi      RWO            standard       39m
```
What’s the status of the TiDB cluster pods?
```shell
$ kubectl get po -n {tidb-cluster-namespace} -o wide
NAME                                       READY   STATUS    RESTARTS   AGE   IP           NODE           NOMINATED NODE   READINESS GATES
advanced-tidb-discovery-6998694d4c-fmpjl   1/1     Running   0          41m   10.244.2.2   test-worker3   <none>           <none>
advanced-tidb-pd-0                         1/1     Running   0          41m   10.244.3.4   test-worker    <none>           <none>
advanced-tidb-pd-1                         1/1     Running   0          41m   10.244.2.4   test-worker3   <none>           <none>
advanced-tidb-pd-2                         1/1     Running   0          41m   10.244.1.4   test-worker2   <none>           <none>
advanced-tidb-tidb-0                       2/2     Running   0          39m   10.244.3.7   test-worker    <none>           <none>
advanced-tidb-tidb-1                       2/2     Running   0          39m   10.244.2.7   test-worker3   <none>           <none>
advanced-tidb-tidb-2                       2/2     Running   0          39m   10.244.1.7   test-worker2   <none>           <none>
advanced-tidb-tikv-0                       1/1     Running   0          40m   10.244.2.6   test-worker3   <none>           <none>
advanced-tidb-tikv-1                       1/1     Running   0          40m   10.244.3.6   test-worker    <none>           <none>
advanced-tidb-tikv-2                       1/1     Running   0          40m   10.244.1.6   test-worker2   <none>           <none>
tidb-controller-manager-6cc68f8949-52vwt   1/1     Running   0          41m   10.244.3.2   test-worker    <none>           <none>
tidb-scheduler-dd569b6b4-hj294             2/2     Running   0          41m   10.244.1.2   test-worker2   <none>           <none>
```
What did you do?
We created a TidbCluster CR with a TiFlash component as follows and deployed it with `kubectl apply -f`.
The CR file:

```yaml
apiVersion: pingcap.com/v1alpha1
kind: TidbCluster
metadata:
  name: advanced-tidb
spec:
  version: "v5.4.0"
  timezone: UTC
  configUpdateStrategy: RollingUpdate
  helper:
    image: busybox:1.34.1
  pvReclaimPolicy: Retain
  enableDynamicConfiguration: true
  tiflash:
    baseImage: pingcap/tiflash
    replicas: 2
    storageClaims:
    - {}
    - {}
    - {}
  pd:
    baseImage: pingcap/pd
    config: |
      [dashboard]
      internal-proxy = true
    replicas: 3
    maxFailoverCount: 0
    requests:
      storage: 10Gi
    mountClusterClientSecret: true
  tidb:
    baseImage: pingcap/tidb
    config: |
      [performance]
      tcp-keep-alive = true
    replicas: 3
    maxFailoverCount: 0
    service:
      type: NodePort
      externalTrafficPolicy: Local
  tikv:
    baseImage: pingcap/tikv
    config: |
      log-level = "info"
    replicas: 3
    maxFailoverCount: 0
    requests:
      storage: 100Gi
    mountClusterClientSecret: true
```
Note that we assigned empty objects (`{}`) to the items under `storageClaims`.
What did you expect to see?
We expected the input to be rejected, since `storageClaims` is essentially empty. Specifically, we expected to see the error message "storageClaims should be configured at least one item." as this line suggests.
What did you see instead?

The TiFlash pods were never created, but we did not see the error message mentioned above. In fact, we did not see any error message from the operator indicating why the TiFlash pods failed to be created.
Possible reason
We believe the validation here is not comprehensive: even when the items under `storageClaims` are empty objects, the CR bypasses this if statement, because the length of `spec.StorageClaims` is 3 in our case.

Besides, kubectl reports that `spec.storageClaims[INDEX].resources.requests.storage` is a required field, yet it is labelled as optional in types.go at this line, which appears to be incorrect.
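To make the gap concrete, here is a sketch of the per-item check we have in mind. The types and the function name are simplified stand-ins (the real definitions live in tidb-operator's types.go and use Kubernetes resource types); only the shape of the logic matters: reject not just an empty slice, but also items that carry no storage request.

```go
package main

import (
	"errors"
	"fmt"
)

// StorageClaim is a simplified stand-in for the real type; StorageRequest
// mimics resources.requests.storage, with "" meaning the field is unset.
type StorageClaim struct {
	StorageRequest string
}

// TiFlashSpec is likewise a simplified stand-in for the operator's spec type.
type TiFlashSpec struct {
	StorageClaims []StorageClaim
}

// validateStorageClaims first applies the existing length check, then
// additionally rejects items whose storage request is missing, so that
// three empty objects no longer slip through.
func validateStorageClaims(spec TiFlashSpec) error {
	if len(spec.StorageClaims) < 1 {
		return errors.New("storageClaims should be configured at least one item")
	}
	for i, c := range spec.StorageClaims {
		if c.StorageRequest == "" {
			return fmt.Errorf("storageClaims[%d].resources.requests.storage is required", i)
		}
	}
	return nil
}

func main() {
	// Three empty items: passes a length-only check, fails the per-item check.
	empty := TiFlashSpec{StorageClaims: []StorageClaim{{}, {}, {}}}
	fmt.Println(validateStorageClaims(empty) != nil) // prints true
}
```

With a check like this, a CR such as ours would be rejected with a pointer to the exact offending item instead of silently producing no TiFlash pods.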
Do you mean the operator needs to validate the storageClaims configuration? If it is not configured, it will use the default storage.
Hi @mikechengwei, thanks for your reply. Yes, you are correct: the operator validates `storageClaims` by checking the length of the slice, and if the length is smaller than 1 it uses the default storage. But the problem is triggered when there are several items under `storageClaims` and all of them are empty. For example:
```yaml
storageClaims:
- {}
- {}
- {}
```
In this case, the CR bypasses the validation and the TiFlash pods never come up.
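For comparison, a `storageClaims` stanza where each item carries an explicit request does deploy normally. The field path `resources.requests.storage` is the one kubectl reports as required; the size and the `storageClassName` value below are illustrative, not taken from our cluster.

```yaml
tiflash:
  baseImage: pingcap/tiflash
  replicas: 2
  storageClaims:
  - resources:
      requests:
        storage: 100Gi
    storageClassName: standard
```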