Improve errant GTID detection in ERS to handle more cases.

Open GuptaManan100 opened this issue 4 months ago • 2 comments

Description

This PR reworks errant GTID detection in ERS. As proposed in https://github.com/vitessio/vitess/issues/16724#issuecomment-2385332901, we now also use the reparent journal length as an extra data point for errant GTID detection. All the cases listed in #16724 have been added as unit tests in this PR, and the algorithm's expected behavior has been verified against them.

Because ReadReparentJournalInfo is a new RPC, customers who upgrade Vitess multiple versions at a time may end up with vttablets that don't implement it (we are adding the RPC in v21; it is not available in earlier releases). Since we don't want ERS to stop working in this situation, we have to keep the legacy errant GTID detection code around. If reading the reparent journal information fails on any tablet, we fall back to that legacy code.
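
For illustration only, the fallback rule described above can be sketched roughly as follows. This is not the code in this PR: `journalReader`, `readJournalLengths`, `chooseDetection`, and `errUnimplemented` are hypothetical names, and the reader function is a stand-in for the per-tablet ReadReparentJournalInfo call. The sketch only shows the all-or-nothing decision: a single failed read on any tablet selects the legacy path.

```go
package main

import (
	"errors"
	"fmt"
)

// journalReader is a hypothetical stand-in for calling
// ReadReparentJournalInfo on one tablet. It returns the
// reparent journal length, or an error if the call fails.
type journalReader func(tablet string) (int, error)

// errUnimplemented simulates an older vttablet that lacks the RPC.
var errUnimplemented = errors.New("rpc not implemented")

// readJournalLengths collects the journal length of every tablet.
// It reports ok=false as soon as any single read fails.
func readJournalLengths(tablets []string, read journalReader) (map[string]int, bool) {
	lengths := make(map[string]int, len(tablets))
	for _, t := range tablets {
		n, err := read(t)
		if err != nil {
			return nil, false
		}
		lengths[t] = n
	}
	return lengths, true
}

// chooseDetection returns "journal-based" only when every tablet
// answered; otherwise it returns "legacy".
func chooseDetection(tablets []string, read journalReader) string {
	if _, ok := readJournalLengths(tablets, read); !ok {
		return "legacy"
	}
	return "journal-based"
}

func main() {
	tablets := []string{"zone1-100", "zone1-101", "zone1-102"}

	// Every tablet implements the RPC.
	allNew := func(string) (int, error) { return 3, nil }
	fmt.Println(chooseDetection(tablets, allNew))

	// One tablet runs an older release without the RPC.
	mixed := func(t string) (int, error) {
		if t == "zone1-101" {
			return 0, errUnimplemented
		}
		return 3, nil
	}
	fmt.Println(chooseDetection(tablets, mixed))
}
```

Treating any failure as a reason to fall back keeps the decision simple: the new detection never runs with journal lengths from only some of the tablets.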

Related Issue(s)

  • Fixes #16724

Checklist

  • [x] "Backport to:" labels have been added if this change should be back-ported to release branches
  • [x] If this change is to be back-ported to previous releases, a justification is included in the PR description
  • [x] Tests were added or are not required
  • [x] Did the new or modified tests pass consistently locally and on CI?
  • [x] Documentation was added or is not required

Deployment Notes

GuptaManan100, Oct 10 '24 15:10