tidb-operator

Proposal: Parallelize control-loops of same TiDB cluster

Open aylei opened this issue 6 years ago • 3 comments

Feature Request

Is your feature request related to a problem? Please describe: We have already parallelized our control loops across different TiDB clusters, and the workQueue ensures that different workers never sync the same TidbCluster at the same time. However, for a specific TiDB cluster, all reconcile functions run sequentially, which increases the risk that subsequent operations are blocked by the failure or lag of one step. For example, a rolling update of TiKV may take a relatively long time, and the operator cannot perform failover for tidb-servers during that time because of the synchronization.

  • A real case can be found here: https://github.com/pingcap/tidb-operator/pull/1242#discussion_r351140739

Describe the feature you'd like:

Try to break tidb_cluster_controller into several sub-controllers that run in parallel.

Teachability, Documentation, Adoption, Migration Strategy:

This is a notable change, and it is safer to target it for v1.2.0.

aylei avatar Nov 27 '19 10:11 aylei

Currently we reconcile all the components in a sequential way. Parallelizing control-loops is a nice choice.

Just one idea:

We could divide the current loop into several parallel loops by component type. PD / TiKV / TiDB could each have their own loop and be reconciled in parallel, while the service, statefulset, and status of each component would still be reconciled sequentially within that component's own loop.
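A rough sketch of that idea, to make the shape concrete. Everything here is hypothetical and for illustration only: `step`, `componentLoop`, `reconcileParallel`, and `errRollingUpdate` are made-up names, not the operator's actual types or functions. It assumes each component's steps can run independently of the other components, and it does not address how concurrent loops would safely update the shared TidbCluster status.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errRollingUpdate is a hypothetical error standing in for a slow or failed step.
var errRollingUpdate = errors.New("rolling update still in progress")

// step is one reconcile action for a component (hypothetical),
// for example syncing its service, statefulset, or status.
type step func() error

// componentLoop holds the ordered steps for a single component.
type componentLoop struct {
	name  string
	steps []step
}

// run executes the steps of one component sequentially and stops at the first failure.
func (c componentLoop) run() error {
	for i, s := range c.steps {
		if err := s(); err != nil {
			return fmt.Errorf("%s: step %d: %w", c.name, i, err)
		}
	}
	return nil
}

// reconcileParallel runs every component loop in its own goroutine, waits for
// all of them, and returns the result of each loop keyed by component name.
// A failure in one component does not stop the others.
func reconcileParallel(loops []componentLoop) map[string]error {
	var (
		mu   sync.Mutex
		wg   sync.WaitGroup
		errs = make(map[string]error, len(loops))
	)
	for _, l := range loops {
		wg.Add(1)
		go func(l componentLoop) {
			defer wg.Done()
			err := l.run()
			mu.Lock()
			errs[l.name] = err
			mu.Unlock()
		}(l)
	}
	wg.Wait()
	return errs
}

func main() {
	ok := func() error { return nil }
	stuck := func() error { return errRollingUpdate }

	loops := []componentLoop{
		{name: "pd", steps: []step{ok, ok, ok}},
		{name: "tikv", steps: []step{ok, stuck, ok}}, // second step fails
		{name: "tidb", steps: []step{ok, ok, ok}},
	}

	errs := reconcileParallel(loops)
	for _, name := range []string{"pd", "tikv", "tidb"} {
		if err := errs[name]; err != nil {
			fmt.Printf("%s: %v\n", name, err)
			continue
		}
		fmt.Printf("%s: ok\n", name)
	}
}
```

In this sketch the tikv loop reports an error while the pd and tidb loops still run to completion, which is the behavior the proposal is after: a stuck TiKV rolling update would no longer hold back tidb-server failover.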

Yisaer avatar Nov 27 '19 10:11 Yisaer

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 15 days

github-actions[bot] avatar Feb 24 '20 00:02 github-actions[bot]

At least upgrading or scaling of one component should not block the scaling of the other components.

DanielZhangQD avatar Jul 16 '21 02:07 DanielZhangQD