Proposal: Parallelize control-loops of the same TiDB cluster
Feature Request
Is your feature request related to a problem? Please describe:
We've already parallelized our control-loops across different TiDB clusters: the workQueue ensures that different workers won't sync the same TidbCluster at the same time. However, for a specific TiDB cluster, all reconcile functions run sequentially, which increases the risk of subsequent operations being blocked by the failure or lag of one step. For example, it may take a relatively long time to rolling-update TiKV, and the operator cannot perform failover for tidb-servers during that time because the reconcile steps are serialized.
- A real case can be found here: https://github.com/pingcap/tidb-operator/pull/1242#discussion_r351140739
Describe the feature you'd like:
Try to break tidb_cluster_controller into several sub-controllers that run in parallel.
Teachability, Documentation, Adoption, Migration Strategy:
This is a notable change, and it is safer to target it at v1.2.0.
Currently we reconcile all the components in a sequential way. Parallelizing control-loops is a nice choice.
Just one idea:
We could divide the current loop into several parallel loops by component type. PD / TiKV / TiDB could each have their own loop and be reconciled in parallel, while the service, statefulset, and status of each component would still be reconciled sequentially within that component's loop.
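To make the idea concrete, here is a minimal sketch in Go. It is not the operator's actual code: `step`, `componentLoop`, `reconcileAll`, and `errUpgrading` are hypothetical names made up for illustration, and the real reconcile functions would take a TidbCluster object rather than being plain closures. Each component gets its own goroutine, while the steps inside a component still run one after another.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errUpgrading is a hypothetical error standing in for a slow or failing step.
var errUpgrading = errors.New("rolling update in progress")

// step is one reconcile stage of a component (e.g. service, statefulset, status).
type step struct {
	name string
	run  func() error
}

// componentLoop holds the ordered reconcile steps of a single component.
type componentLoop struct {
	component string
	steps     []step
}

// reconcile runs the steps sequentially and stops at the first failure.
func (c componentLoop) reconcile() error {
	for _, s := range c.steps {
		if err := s.run(); err != nil {
			return fmt.Errorf("%s/%s: %w", c.component, s.name, err)
		}
	}
	return nil
}

// reconcileAll runs every component loop in its own goroutine, waits for all
// of them, and returns each component's result keyed by component name.
func reconcileAll(loops []componentLoop) map[string]error {
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		results = make(map[string]error, len(loops))
	)
	for _, l := range loops {
		wg.Add(1)
		go func(l componentLoop) {
			defer wg.Done()
			err := l.reconcile()
			mu.Lock()
			results[l.component] = err
			mu.Unlock()
		}(l)
	}
	wg.Wait()
	return results
}

func main() {
	ok := func() error { return nil }
	upgrading := func() error { return errUpgrading }

	loops := []componentLoop{
		{"pd", []step{{"service", ok}, {"statefulset", ok}, {"status", ok}}},
		{"tikv", []step{{"service", ok}, {"statefulset", upgrading}, {"status", ok}}},
		{"tidb", []step{{"service", ok}, {"statefulset", ok}, {"status", ok}}},
	}

	// The TiKV failure is reported on its own; PD and TiDB still complete.
	for name, err := range reconcileAll(loops) {
		fmt.Printf("%s: %v\n", name, err)
	}
}
```

In this sketch a failure in the TiKV statefulset step only stops the remaining TiKV steps. Whether the component loops can really be this independent (for example, if TiKV reconciliation needs fresh PD status) is an open design question that the sketch does not address.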
At least upgrading or scaling of one component should not block the scaling of the other components.