moco
moco copied to clipboard
When MTS is enabled, replication may not resume if a replica node crashes.
Describe the bug If MTS is enabled, replication may not resume when a replica node crashes. Specifically, the following error will occur.
{"level":"error","ts":1634545113.4992788,"logger":"***","msg":"error","error":"failed to configure replica instance 2: failed to change primary: Error 1802: CHANGE MASTER cannot be executed when the slave was stopped with an error or killed in MTS mode. Consider using RESET SLAVE or START SLAVE UNTIL.","stacktrace":"github.com/cybozu-go/moco/clustering.(*clusterManager).update.func1\n\t/work/clustering/manager.go:89"}
The documentation says the following: https://dev.mysql.com/doc/refman/8.0/en/start-replica.html
When using a multithreaded replica (replica_parallel_workers or slave_parallel_workers > 0), there is a chance of gaps in the sequence of transactions that have been executed from the relay log in the following cases:
- The coordinator thread was stopped
- An error occurs in the applier threads
- mysqld shuts down unexpectedly
If a gap occurs, an error will occur when CHANGE MASTER is executed.
Before MySQL 8.0.26, issuing START REPLICA on a multithreaded replica with gaps in the sequence of transactions executed from the relay log generates a warning. In such a situation, the solution is to use START REPLICA UNTIL SQL_AFTER_MTS_GAPS, then issue RESET REPLICA to remove any remaining relay logs.
Therefore, the following execution will fail https://github.com/cybozu-go/moco/blob/04d00bbee82abccc907e69bad763d8b303bc330e/pkg/dbop/replication.go#L14
Environment MySQL 8.0.18
To Reproduce
- create a table
- use a while true loop to INSERT the appropriate data
- enter replica pod and kill -6 $(pidof mysqld) command to kill mysqld
- Replica node's mysqld may not become ready.
In that case, Error 1802 is written in the log of moco-controller.
Expected behavior If a replica node crashes and a gap occurs, it will automatically close the gap and resume replication.
Thank you for the report. This seems a limitation for MySQL prior to 8.0.26.