
Improve scaling UX: even replicas, safe scale-down, sequential, non-write-blocking scale-up

Open elderapo opened this issue 2 months ago • 4 comments

MOCO works well for initial cluster creation and steady-state operation. However, changing replication settings on an existing cluster—especially adjusting the replica count—has rough edges, as outlined below. The proposals that follow aim to significantly improve this experience.

1) Allow scaling replicas down

  • Support decreasing spec.replicas.
  • Optional: allow scaling to 0; delete all Pods but retain the primary’s PVC so data persists and the cluster can be restarted later. This could be achieved by scaling the cluster down to a single instance and then setting offline: true (sketched below).
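A minimal sketch of that two-step flow, written against generic unstructured Kubernetes objects rather than MOCO’s real Go API (the field paths simply mirror the spec.replicas and offline: true settings mentioned above):

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

func main() {
	// Stand-in for a fetched MySQLCluster object; only the relevant fields
	// are shown.
	cluster := &unstructured.Unstructured{Object: map[string]interface{}{
		"spec": map[string]interface{}{"replicas": int64(3)},
	}}

	// Step 1: shrink the cluster to a single instance so only the primary
	// (and its PVC) remains.
	_ = unstructured.SetNestedField(cluster.Object, int64(1), "spec", "replicas")

	// Step 2: take the cluster offline; the Pod goes away, the data stays
	// on the retained PVC.
	_ = unstructured.SetNestedField(cluster.Object, true, "spec", "offline")

	fmt.Println(cluster.Object)
}
```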

2) Allow even replica counts

  • Allow even values for spec.replicas.
  • Maintain safety with a true majority for semi-sync: ceil((replicas - 1) / 2) required ACKs (see the sketch after this list).
  • Rationale: an even count does not increase fault tolerance over the previous odd size, but it enables controlled, incremental scaling under resource constraints or over time.
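As a worked example of that quorum rule (the helper below and its names are mine, purely for illustration), the required ACK count for a few instance counts would be:

```go
package main

import "fmt"

// requiredACKs returns ceil((replicas - 1) / 2), i.e. a true majority of the
// secondaries, where replicas is the total instance count (primary + N
// secondaries). For positive integers this gives the same value as the
// floor(replicas / 2) formula mentioned in item 3.
func requiredACKs(replicas int) int {
	if replicas <= 1 {
		return 0 // a single instance has no secondaries to ACK
	}
	return replicas / 2 // integer division == ceil((replicas - 1) / 2)
}

func main() {
	for _, n := range []int{2, 3, 4, 5, 6} {
		fmt.Printf("replicas=%d -> required semi-sync ACKs=%d\n", n, requiredACKs(n))
	}
	// replicas=2 -> 1, replicas=3 -> 1, replicas=4 -> 2,
	// replicas=5 -> 2, replicas=6 -> 3
}
```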

3) Prevent write stall during scale-up

When scaling up (e.g., 1 → 3), the controller immediately sets rpl_semi_sync_master_enabled=ON and rpl_semi_sync_master_wait_for_slave_count=floor(replicas / 2). While the new replicas are still being provisioned (PVC creation, initial clone, and replication catch-up), writes on the primary stall until enough replicas can acknowledge commits. Ideally, during such transitions, these replication settings would only be tightened once replicas are actually able to ACK commits.
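A rough sketch of the behaviour proposed here (all names below are mine, not MOCO’s): derive the semi-sync wait count from the secondaries that are actually ready to acknowledge, capped at the steady-state target, instead of applying the target immediately:

```go
package main

import "fmt"

// semiSyncWaitCount returns the value to apply to
// rpl_semi_sync_master_wait_for_slave_count while a scale-up is in progress.
// A return value of 0 stands for "keep rpl_semi_sync_master_enabled OFF for
// now", since the MySQL variable itself cannot be set below 1.
func semiSyncWaitCount(desiredReplicas, readySecondaries int) int {
	target := desiredReplicas / 2 // steady-state value, floor(replicas / 2)
	if readySecondaries < target {
		return readySecondaries // relax while new replicas are still cloning
	}
	return target
}

func main() {
	// Scaling 1 -> 3: the wait count follows the secondaries as they become
	// ready, so commits on the primary never wait for an instance that is
	// still provisioning.
	for ready := 0; ready <= 2; ready++ {
		fmt.Printf("desired=3, ready secondaries=%d -> wait count=%d\n",
			ready, semiSyncWaitCount(3, ready))
	}
	// desired=3, ready secondaries=0 -> wait count=0 (semi-sync still off)
	// desired=3, ready secondaries=1 -> wait count=1
	// desired=3, ready secondaries=2 -> wait count=1
}
```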

4) Sequential replica adds on scale-up

When scaling up, add replicas sequentially rather than all at once. Create a single new secondary, wait until it is ready, then create the next. The MySQL Clone plugin doesn’t support concurrent clone provisioning from a single donor anyway.
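For reference, the StatefulSet-level setting with the closest semantics is podManagementPolicy; the snippet below only illustrates what OrderedReady means and is not a claim that MOCO can simply switch to it (the Parallel -> OrderedReady change is discussed further down in this thread):

```go
package main

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
)

func main() {
	// With OrderedReady, the StatefulSet controller creates Pods one at a
	// time and waits for each Pod to be Running and Ready before creating
	// the next, which matches "create one new secondary, wait, then create
	// the next". MOCO currently uses Parallel pod management.
	spec := appsv1.StatefulSetSpec{
		PodManagementPolicy: appsv1.OrderedReadyPodManagement,
	}
	fmt.Println("podManagementPolicy:", spec.PodManagementPolicy)
}
```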

5) Rename spec.replicas → spec.instances

Rename spec.replicas to spec.instances to better reflect that the field represents the total number of MySQL instances (one primary plus N secondaries), not just replicas. This change clarifies intent and avoids confusion.


I am happy to hear your thoughts. If there’s a green light from the maintainers, I’m willing to implement items 1, 2, 4, and 5; item 3 is probably best handled by someone more familiar with MOCO internals.

elderapo · Oct 16 '25

Actually, I've managed to implement items 1, 2, and 3 in https://github.com/cybozu-go/moco/pull/844. Item 4 (ParallelPodManagement -> OrderedReadyPodManagement) breaks a lot of MOCO logic, so I'm giving up on that one; item 5 is better handled in a separate PR.

elderapo · Oct 16 '25

@elderapo Thanks for the report! I’d like to think carefully about the behavior before replying, so it may take a little while.

shunki-fujita · Oct 17 '25

> This could be achieved by scaling the cluster down to a single instance and then setting offline: true.

It’s possible to set offline: true without reducing the number of replicas to 1.

shunki-fujita · Oct 17 '25

We discussed these points as a team and would like to share our thoughts:

1) Allow scaling replicas down

MOCO’s cluster design fundamentally relies on the assumption that at least one replica in the cluster always holds the latest committed transactions. This is ensured through GTID-based semi-synchronous replication, where the primary only finalizes a commit once an acknowledgment is received from at least one up-to-date replica. Because of this, the cluster’s fault-tolerance and consistency guarantees depend on those replicas continuously maintaining the latest state.

Simply reducing the number of replicas without verifying their replication status breaks that assumption. If the replicas being removed happen to contain transactions that have not yet been fully propagated or acknowledged by the remaining nodes, the cluster can lose its only copy of the most recent transactions. After a primary failure, such missing GTIDs would result in irrecoverable data loss or inconsistent failover.

From a data-safety standpoint, it is therefore difficult to accept a scale-down feature that just reduces spec.replicas.

2) Allow even replica counts

Allowing an even number of replicas is theoretically possible, but it doesn’t bring any real benefit to MOCO. Since MOCO’s HA design relies on clear majority decisions and semi-synchronous replication, even replica counts don’t improve availability or safety — they only make quorum logic and configuration more complex.

3) Prevent write stall during scale-up

We think this would be a useful improvement! During scale-up, new replicas cannot ACK until they finish initial clone and catch up, which may cause write stalls on the primary. Temporarily relaxing semi-sync requirements until all replicas are ready would prevent unnecessary commit blocking while keeping reasonable safety.

Thanks for raising these points and for your detailed consideration.

shunki-fujita · Oct 22 '25