gh-ost
gh-ost copied to clipboard
Ensure replication running after replication restart
A Pull Request should be associated with an Issue.
We wish to have discussions in Issues. A single issue may be targeted by multiple PRs. If you're offering a new feature or fixing anything, we'd like to know beforehand in Issues, and potentially we'll be able to point development in a particular direction.
Related issue:
Further notes in https://github.com/github/gh-ost/blob/master/.github/CONTRIBUTING.md Thank you! We are open to PRs, but please understand if for technical reasons we are unable to accept each and any PR
Description
This PR adds checks to function restartReplication
that ensures that replication has started before continuing. Before adding this check, there as a hard coded 500ms wait time and then the program assumed that the replication threads were started and running (added in https://github.com/github/gh-ost/pull/337).
We encountered situations in different environments that this wait time wasn't sufficient. As an experiment, we doubled this wait time and deployed it to our live environments to see if this resolves the issue. This did help solve the problem, so now we are coming back to find a better permanent fix.
example output from our logs:
2024-06-11 08:17:41 FATAL Replication on <replica-hostname>:3306 is broken: Slave_IO_Running: Connecting, Slave_SQL_Running: Yes. Please make sure replication runs before using gh-ost.
old description
This PR increases the `startSlavePostWaitMilliseconds` as we are seeing an error when running `gh-ost` in some cloud environments that the `Slave_IO_Running` is `Connecting` rather than `Yes` as expected.We found this old PR that described the issue we're having https://github.com/github/gh-ost/pull/337 - as a first step we are increasing by doubling the value. If this test is successful, then we'll look into making this something that could be configured.
In case this PR introduced Go code changes:
- [ ] contributed code is using same conventions as original code
- [ ]
script/cibuild
returns with no formatting errors, build errors or unit test errors.