vitess
Improve errant GTID detection in ERS to handle more cases.
Description
This PR reworks the errant GTID detection in ERS. As proposed in https://github.com/vitessio/vitess/issues/16724#issuecomment-2385332901, we now also use the reparent journal length as an extra data point for errant GTID detection. All the cases listed in #16274 have been added as unit tests in this PR, and they verify that the algorithm behaves as expected.
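To illustrate how a reparent journal length can serve as an extra data point, here is a minimal Go sketch. It is not the algorithm from this PR. It is a deliberately simplified, hypothetical rule. The tablets with the longest reparent journal, meaning those that observed the most reparents, form a reference set. Any tablet with a shorter journal that holds GTIDs missing from every reference tablet is reported together with those GTIDs. The names `tabletState` and `flagShortJournalErrantGTIDs`, as well as the tablet aliases, are invented for illustration. GTID sets are modeled as plain sets of `uuid:seqno` strings rather than MySQL interval sets.

```go
package main

import (
	"fmt"
	"sort"
)

// tabletState is a hypothetical, simplified view of what ERS gathers per tablet.
// GTIDs are individual "uuid:seqno" strings instead of MySQL interval sets.
type tabletState struct {
	gtids      map[string]bool
	journalLen int32 // number of rows in the tablet's reparent journal
}

// flagShortJournalErrantGTIDs is an illustrative simplification, not the PR's code.
// Tablets with the longest reparent journal form the reference set. A tablet with a
// shorter journal is reported along with any GTIDs that no reference tablet has.
func flagShortJournalErrantGTIDs(tablets map[string]tabletState) map[string][]string {
	var maxLen int32
	for _, t := range tablets {
		if t.journalLen > maxLen {
			maxLen = t.journalLen
		}
	}

	// Union of the GTIDs held by the tablets that observed the most reparents.
	reference := make(map[string]bool)
	for _, t := range tablets {
		if t.journalLen == maxLen {
			for g := range t.gtids {
				reference[g] = true
			}
		}
	}

	errant := make(map[string][]string)
	for name, t := range tablets {
		if t.journalLen == maxLen {
			continue // comparing max-length tablets with each other is out of scope here
		}
		for g := range t.gtids {
			if !reference[g] {
				errant[name] = append(errant[name], g)
			}
		}
		sort.Strings(errant[name]) // deterministic order for output and tests
	}
	return errant
}

// set builds a GTID set from individual GTID strings.
func set(gs ...string) map[string]bool {
	m := make(map[string]bool, len(gs))
	for _, g := range gs {
		m[g] = true
	}
	return m
}

func main() {
	tablets := map[string]tabletState{
		"zone1-101": {gtids: set("A:1", "A:2", "B:1"), journalLen: 2},
		"zone1-102": {gtids: set("A:1", "A:2", "B:1"), journalLen: 2},
		"zone1-103": {gtids: set("A:1", "C:1"), journalLen: 1}, // missed the last reparent
	}
	fmt.Println(flagShortJournalErrantGTIDs(tablets)) // map[zone1-103:[C:1]]
}
```

Only the shorter-journal case is covered. Deciding which of several tablets that share the maximum journal length carries errant GTIDs needs further rules that this sketch leaves out.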
Since `ReadReparentJournalInfo` is a new RPC, some users may upgrade Vitess across multiple versions at once. The RPC is added in v21 and does not exist in earlier releases, so vttablets running older versions will not implement it. We don't want ERS to stop working in this situation, so we keep the legacy errant GTID detection code for this scenario: if reading the reparent journal information fails on any tablet, ERS falls back to the legacy errant GTID detection.
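The fallback decision described above can be sketched in Go. The sketch assumes a hypothetical `readJournalLen` function type that stands in for calling `ReadReparentJournalInfo` on one tablet. It also assumes a sentinel `errUnimplemented` that stands in for the error an older vttablet returns. Neither is the real Vitess tablet manager client API. The point is the all-or-nothing rule: one failing tablet sends the whole ERS down the legacy path.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnimplemented is a hypothetical stand-in for the error an older vttablet
// returns for an RPC it does not implement.
var errUnimplemented = errors.New("unimplemented")

// readJournalLen is a hypothetical signature for calling ReadReparentJournalInfo
// on a single tablet and getting back its reparent journal length.
type readJournalLen func(tablet string) (int32, error)

// collectJournalLens calls the new RPC on every tablet. It returns ok=false if any
// call fails, which tells the caller to use the legacy errant GTID detection.
func collectJournalLens(tablets []string, read readJournalLen) (map[string]int32, bool) {
	lens := make(map[string]int32, len(tablets))
	for _, t := range tablets {
		n, err := read(t)
		if err != nil {
			// Partial data is not trusted: one failure switches ERS to the legacy path.
			return nil, false
		}
		lens[t] = n
	}
	return lens, true
}

func main() {
	// Hypothetical mixed-version shard: zone1-103 has not been upgraded yet.
	versions := map[string]int{"zone1-101": 21, "zone1-102": 21, "zone1-103": 20}
	read := func(t string) (int32, error) {
		if versions[t] < 21 {
			return 0, fmt.Errorf("tablet %s: %w", t, errUnimplemented)
		}
		return 3, nil
	}

	if lens, ok := collectJournalLens([]string{"zone1-101", "zone1-102", "zone1-103"}, read); ok {
		fmt.Println("using journal-aware detection:", lens)
	} else {
		fmt.Println("falling back to legacy errant GTID detection")
	}
}
```

Treating any single failure as a reason to fall back avoids mixing the two algorithms within one ERS run, at the cost of losing the extra data point until every tablet in the shard runs v21 or later.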
Related Issue(s)
- Fixes #16724
Checklist
- [x] "Backport to:" labels have been added if this change should be back-ported to release branches
- [x] If this change is to be back-ported to previous releases, a justification is included in the PR description
- [x] Tests were added or are not required
- [x] Did the new or modified tests pass consistently locally and on CI?
- [x] Documentation was added or is not required