TiDB Operator does not delete the original ConfigMap after the user changes the config in the CR, causing a resource leak
Bug Report
What version of Kubernetes are you using?
Client Version: v1.31.0
Kustomize Version: v5.4.2
Server Version: v1.29.1
What version of TiDB Operator are you using?
v1.6.0
What's the status of the TiDB cluster pods?
All pods are in Running state
What did you do?
We updated the `spec.tikv.config` field to a different non-empty value.
How to reproduce
- Deploy a TiDB cluster, for example:

  ```yaml
  apiVersion: pingcap.com/v1alpha1
  kind: TidbCluster
  metadata:
    name: test-cluster
  spec:
    configUpdateStrategy: RollingUpdate
    enableDynamicConfiguration: true
    helper:
      image: alpine:3.16.0
    pd:
      baseImage: pingcap/pd
      config: "[dashboard]\n internal-proxy = true\n"
      maxFailoverCount: 0
      mountClusterClientSecret: true
      replicas: 3
      requests:
        storage: 10Gi
    pvReclaimPolicy: Retain
    ticdc:
      baseImage: pingcap/ticdc
      replicas: 3
    tidb:
      baseImage: pingcap/tidb
      config: "[performance]\n tcp-keep-alive = true\ngraceful-wait-before-shutdown = 30\n"
      maxFailoverCount: 0
      replicas: 3
      service:
        externalTrafficPolicy: Local
        type: NodePort
    tiflash:
      baseImage: pingcap/tiflash
      replicas: 3
      storageClaims:
      - resources:
          requests:
            storage: 10Gi
    tikv:
      baseImage: pingcap/tikv
      config: |
        [raftdb]
        max-open-files = 256
        [rocksdb]
        max-open-files = 256
      maxFailoverCount: 0
      mountClusterClientSecret: true
      replicas: 3
      requests:
        storage: 100Gi
    timezone: UTC
    version: v8.1.0
  ```
- Change `spec.tikv.config` to another non-empty value (example commands for both steps are given after this list), e.g.:

  ```yaml
  apiVersion: pingcap.com/v1alpha1
  kind: TidbCluster
  metadata:
    name: test-cluster
  spec:
    configUpdateStrategy: RollingUpdate
    enableDynamicConfiguration: true
    helper:
      image: alpine:3.16.0
    pd:
      baseImage: pingcap/pd
      config: "[dashboard]\n internal-proxy = true\n"
      maxFailoverCount: 0
      mountClusterClientSecret: true
      replicas: 3
      requests:
        storage: 10Gi
    pvReclaimPolicy: Retain
    ticdc:
      baseImage: pingcap/ticdc
      replicas: 3
    tidb:
      baseImage: pingcap/tidb
      config: "[performance]\n tcp-keep-alive = true\ngraceful-wait-before-shutdown = 30\n"
      maxFailoverCount: 0
      replicas: 3
      service:
        externalTrafficPolicy: Local
        type: NodePort
    tiflash:
      baseImage: pingcap/tiflash
      replicas: 3
      storageClaims:
      - resources:
          requests:
            storage: 10Gi
    tikv:
      baseImage: pingcap/tikv
      config: |
        [raftdb]
        max-open-files = 256
        [rocksdb]
        max-open-files = 128
      maxFailoverCount: 0
      mountClusterClientSecret: true
      replicas: 3
      requests:
        storage: 100Gi
    timezone: UTC
    version: v8.1.0
  ```
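For reference, the steps above can be driven with commands along these lines (the manifest filenames are placeholders; the StatefulSet name assumes the cluster is named `test-cluster`):

```bash
# Step 1: deploy the cluster with the initial spec.tikv.config
kubectl apply -f tidb-cluster.yaml           # placeholder filename for the first manifest

# Step 2: apply the manifest with the modified spec.tikv.config
kubectl apply -f tidb-cluster-updated.yaml   # placeholder filename for the second manifest

# Wait for the TiKV rolling update triggered by the config change to finish
kubectl rollout status statefulset/test-cluster-tikv
```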
What did you expect to see?
We expected the unused ConfigMaps to be garbage collected by TiDB Operator, so that the operator does not keep generating new ConfigMaps and adding more and more objects to etcd.
What did you see instead?
The operator created a new ConfigMap for TiKV but did not delete the old one. We observed the same behavior when updating `spec.tiflash.config`, which suggests that all TiDB components are likely affected by this issue.
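For example, listing the ConfigMaps after the update shows both generations still present; the command below assumes the cluster is named `test-cluster` and that the generated TiKV ConfigMaps follow the usual `<cluster>-tikv-<hash>` naming:

```bash
# Both the previously generated and the newly generated TiKV ConfigMap remain after the update
kubectl get configmaps --sort-by=.metadata.creationTimestamp | grep test-cluster-tikv
```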
Currently, the operator generates a new ConfigMap for every config change when `configUpdateStrategy` is `RollingUpdate`. Keeping only a few recent ConfigMaps and deleting the older ones may be a better approach.
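As a rough sketch of that idea (and a possible manual workaround in the meantime), the older generated TiKV ConfigMaps could be pruned while keeping only the newest one, which is the one the StatefulSet references right after an update. The names below are assumptions based on the `<cluster>-tikv-<hash>` pattern, not taken from the operator's code:

```bash
# List generated TiKV ConfigMaps oldest-first, drop the newest (still in use),
# and delete the rest. Do not run this while a rolling update is in progress.
old_cms=$(kubectl get configmaps --sort-by=.metadata.creationTimestamp -o name \
  | grep '^configmap/test-cluster-tikv-' \
  | sed '$d')

for cm in ${old_cms}; do
  kubectl delete "${cm}"
done
```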