tidb-operator Proposal: Parallelize control-loops of same TiDB cluster

Feature Request

Is your feature request related to a problem? Please describe: We've already parallelized our control-loops between different TiDB clusters, and the workQueue ensures that different workers won't sync the same TidbCluster at the same time. However, for a specific TiDB cluster, all reconcile functions run sequentially, which increases the risk that a failure or delay in one step blocks subsequent operations. For example, it may take a relatively long time to rolling-update TiKV, and the operator cannot perform failover for tidb-servers during that time because of the synchronization.

A real case can be found here: https://github.com/pingcap/tidb-operator/pull/1242#discussion_r351140739

Describe the feature you'd like:

Try to break tidb_cluster_controller into several sub-controllers that run in parallel.

Teachability, Documentation, Adoption, Migration Strategy:

This is a notable change, and it is safer to target it for v1.2.0.

Nov 27 '19 10:11 aylei

Currently we reconcile all the components in a sequential way. Parallelizing control-loops is a nice choice.

Just one idea:

We could divide the current loop into several parallel loops by component type. PD / TiKV / TiDB could each have their own loop and be reconciled in parallel, while the service, statefulset and status of each component would still be reconciled sequentially within that component's loop.

Nov 27 '19 10:11 Yisaer

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 15 days

Feb 24 '20 00:02 github-actions[bot]

At least upgrading or scaling of one component should not block the scaling of the other components.

Jul 16 '21 02:07 DanielZhangQD