yugabyte-db
[DocDB] GetLoadMoveCompletion can return inaccurate results in some cases after a master failover
Jira Link: DB-8400
Description
After a master failover, the new master has a table -> tablet map and a tablet -> tserver map in the sys catalog. The latter may be inaccurate because heartbeat processing from tservers may have been delayed on the old master.
When a tserver is blacklisted and GetLoadMoveCompletion is called, the master needs to be confident that it is aware of all tablets assigned to that tserver. Because the persisted tablet -> tserver map can be inaccurate, the master waits an extra 2 minutes (controlled by the flag blacklist_progress_initial_delay_secs) before responding 100% to a GetLoadMoveCompletion request.
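The time-based gate described above can be sketched roughly as follows. This is an illustrative model only, not the master's actual code: the function name `completion_percent`, its parameters, and the choice to return `None` while inside the delay window are all assumptions made for the example. Only the flag name `blacklist_progress_initial_delay_secs` comes from this issue.

```python
from typing import Optional

# Stand-in for the blacklist_progress_initial_delay_secs flag (2 minutes).
INITIAL_DELAY_SECS = 120.0


def completion_percent(
    secs_since_leader_elected: float,
    initial_replicas: int,
    remaining_replicas: int,
    initial_delay_secs: float = INITIAL_DELAY_SECS,
) -> Optional[float]:
    """Hypothetical model of the time-gated completion check.

    Returns the percentage of replicas moved off the blacklisted tserver,
    or None when the answer would be 100% but the new leader is still
    inside its initial delay window and cannot trust its view yet.
    """
    if initial_replicas == 0:
        percent = 100.0
    else:
        moved = initial_replicas - remaining_replicas
        percent = 100.0 * moved / initial_replicas

    # Withhold a 100% answer until the delay window has elapsed.
    if percent >= 100.0 and secs_since_leader_elected < initial_delay_secs:
        return None
    return percent
```

In this model, a leader elected 30 seconds ago that sees no remaining replicas does not report 100%, while the same view 130 seconds after election does.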
However, it is possible, though rare, that quorum changes occurred during the failover that the new master leader has not yet heard about. This can happen if a majority of a tablet's tserver quorum has not yet heartbeated to the new master leader within the 2-minute window.
To close this gap, one possible solution is for GetLoadMoveCompletion to stop relying on the 2-minute window alone and instead wait until it has heard heartbeats from a majority of the quorum for every tablet it knows about before replying 100% to this RPC.
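A minimal sketch of the proposed majority-based check, under stated assumptions: `TabletInfo`, `has_majority`, and `can_report_complete` are hypothetical names for illustration, and the real master tracks replicas and heartbeats differently. The idea is that 100% is reported only when every known tablet has been confirmed by a majority of its quorum since the failover and none of them still lists the blacklisted tserver.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class TabletInfo:
    # Tserver ids the master currently believes host this tablet.
    replicas: Set[str]
    # Replicas that have heartbeated to the new master leader.
    heard_from: Set[str] = field(default_factory=set)


def has_majority(tablet: TabletInfo) -> bool:
    """True once a strict majority of the tablet's quorum has reported in."""
    confirmed = tablet.heard_from & tablet.replicas
    return len(confirmed) > len(tablet.replicas) // 2


def can_report_complete(tablets: Dict[str, TabletInfo], blacklisted: str) -> bool:
    """Hypothetical gate for replying 100% to GetLoadMoveCompletion."""
    for tablet in tablets.values():
        # The view of this tablet is not yet trustworthy.
        if not has_majority(tablet):
            return False
        # A replica is still placed on the blacklisted tserver.
        if blacklisted in tablet.replicas:
            return False
    return True
```

A tablet with no known replicas is treated as unconfirmed here, which errs on the side of not reporting completion early.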
@lingamsandeep @rahuldesirazu @SrivastavaAnubhav @bmatican @charleswang234 @hari90
Issue Type
kind/bug
Warning: Please confirm that this issue does not contain any sensitive information
- [X] I confirm this issue does not contain any sensitive information.