GracefulMasterTakeover does not set master back to writable state in case of an error
The Problem
One of the steps of GracefulMasterTakeover is making the master instance read-only: https://github.com/percona/orchestrator/blob/181f94a02601e71e66c50e54f26c8b52cb5c03bd/go/logic/topology_recovery.go#L2170-L2173
If the process fails right after that, for example here, the replicaset stays intact, but the master remains read-only.
The Proposed Solution
- Ensure the following code, or something similar, is executed before any `return nil, nil, err` and after the master is set to read-only: https://github.com/percona/orchestrator/blob/181f94a02601e71e66c50e54f26c8b52cb5c03bd/go/logic/topology_recovery.go#L2192-L2197
- Add a `PostUnsuccessfulGracefulTakeoverProcesses` config entry and execute it if the graceful takeover was not successful, similar to other takeover/failover processes. This will allow users to add their own hooks to check the master status and update it if needed.
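
To make the first option more concrete, here is a minimal, self-contained sketch of the rollback pattern using a deferred function. It is not orchestrator code: `instance`, `setReadOnly`, `gracefulTakeover`, `errPromotionFailed` and the `promote` callback are all hypothetical names, and the real function has a different signature and many more steps.

```go
package main

import (
	"errors"
	"fmt"
)

// instance is a hypothetical stand-in for a MySQL server tracked by the tool.
type instance struct {
	name     string
	readOnly bool
}

// setReadOnly is a hypothetical helper that toggles the read-only flag.
// A real implementation would run a statement against the server and could fail.
func setReadOnly(i *instance, readOnly bool) error {
	i.readOnly = readOnly
	return nil
}

// errPromotionFailed simulates a failure in one of the later takeover steps.
var errPromotionFailed = errors.New("promotion failed")

// gracefulTakeover sets the master read-only, then runs the remaining steps
// (represented here by the promote callback). The deferred function restores
// the writable state on every error return that happens after the master
// became read-only, so individual return sites do not need their own cleanup.
func gracefulTakeover(master *instance, promote func() error) (err error) {
	if err = setReadOnly(master, true); err != nil {
		return err
	}
	defer func() {
		if err == nil {
			return // takeover succeeded: the old master stays read-only
		}
		if rbErr := setReadOnly(master, false); rbErr != nil {
			err = fmt.Errorf("%w; additionally failed to restore writable state: %v", err, rbErr)
		}
	}()

	if err = promote(); err != nil {
		return fmt.Errorf("graceful takeover aborted: %w", err)
	}
	return nil
}

func main() {
	m := &instance{name: "master-1"}
	err := gracefulTakeover(m, func() error { return errPromotionFailed })
	fmt.Println("error:", err)
	fmt.Println("master read-only after failed takeover:", m.readOnly)
}
```

The named return value lets the deferred function see whichever error is being returned, which is why a single `defer` can cover all later failure paths.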
Could you please suggest which of the two solutions (or both) would be better to implement, or propose another way to work around the issue?