
[DocDB] GetLoadMoveCompletion can return inaccurate results in some cases after a master failover

Open iSignal opened this issue 1 year ago • 0 comments

Jira Link: DB-8400

Description

After a master failover, the new master leader loads a table -> tablet map and a tablet -> tserver map from the sys catalog. The latter may be inaccurate: heartbeat processing from tservers may have been delayed on the old master, so recent replica changes may not have been persisted.
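To make the two maps concrete, here is a minimal Python sketch of the state a new master leader might work from. `SysCatalogView` and `tablets_on` are hypothetical names chosen for illustration, not YugabyteDB's actual C++ classes. The point is only that the master's answer about a tserver depends entirely on the persisted tablet -> tserver map.

```python
from dataclasses import dataclass, field


@dataclass
class SysCatalogView:
    """Hypothetical model of the maps a new master leader loads from the sys catalog."""

    # table id -> tablet ids
    table_to_tablets: dict = field(default_factory=dict)
    # tablet id -> tservers hosting a replica, as last persisted by the old leader.
    # This can lag reality if the old leader had not yet processed recent heartbeats.
    tablet_to_tservers: dict = field(default_factory=dict)

    def tablets_on(self, tserver: str) -> set:
        """Tablets the master *believes* have a replica on `tserver`."""
        return {
            tablet
            for tablet, tservers in self.tablet_to_tservers.items()
            if tserver in tservers
        }


view = SysCatalogView(
    table_to_tablets={"orders": ["t1", "t2"]},
    tablet_to_tservers={"t1": {"ts-1", "ts-2", "ts-3"}, "t2": {"ts-1", "ts-2", "ts-4"}},
)
# The persisted view only knows about t1 on ts-3; if t2 had recently moved onto
# ts-3, this view alone could not tell.
print(sorted(view.tablets_on("ts-3")))  # ['t1']
```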

When a tserver is blacklisted and GetLoadMoveCompletion is called, the master needs to be confident that it knows about every tablet replica assigned to that tserver. Because the persisted tablet -> tserver map can be inaccurate, the master waits an extra 2 minutes (controlled by the flag blacklist_progress_initial_delay_secs) before responding 100% to a GetLoadMoveCompletion request, giving tservers time to heartbeat and correct its view.
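The delay-based check can be sketched as a simple gate. This is a minimal Python illustration, not the master's actual code: `can_report_complete` is a hypothetical function, the constant merely mirrors the 2-minute default of blacklist_progress_initial_delay_secs, and measuring the delay from the moment the master became leader is an assumption.

```python
# Hypothetical stand-in for the flag blacklist_progress_initial_delay_secs (2 minutes).
BLACKLIST_PROGRESS_INITIAL_DELAY_SECS = 120.0


def can_report_complete(
    remaining_replicas: int,
    leader_since: float,
    now: float,
    delay_secs: float = BLACKLIST_PROGRESS_INITIAL_DELAY_SECS,
) -> bool:
    """Hypothetical gate for answering 100% to GetLoadMoveCompletion.

    remaining_replicas: replicas still on the blacklisted tserver according to
        the master's (possibly stale) tablet -> tserver map.
    leader_since, now: timestamps in seconds (assumed: delay counts from
        when this master became leader).
    """
    if remaining_replicas > 0:
        return False
    # The map says nothing is left, but only trust it once the grace period
    # has elapsed, giving tservers time to heartbeat to the new leader.
    return now - leader_since >= delay_secs


print(can_report_complete(0, leader_since=1000.0, now=1060.0))  # False: only 60 s elapsed
print(can_report_complete(0, leader_since=1000.0, now=1120.0))  # True
```

Note that the gate is purely time-based: it never checks which tservers have actually reported, which is the weakness the next paragraph describes.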

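However, it is possible, though rare, for a tablet's quorum to change during the failover without the new master leader having heard about it. This can happen if a majority of the tablet's replicas have not yet heartbeated to the new master leader within the 2-minute window. In that case the master's view of which tablets still have a replica on the blacklisted tserver can be incomplete, and it may respond 100% prematurely.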
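The following Python sketch simulates this gap under simplified, hypothetical assumptions: each tablet's quorum is a `RaftConfig` with an `index` (a made-up version number, higher means newer), and the master overlays any heartbeat-reported config that is newer than what it persisted. None of these names come from YugabyteDB; they only illustrate why a quiet quorum leaves the blacklisted tserver looking empty.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RaftConfig:
    index: int          # hypothetical config version; higher means newer
    peers: frozenset    # tservers in this quorum


def refresh_view(persisted, reports):
    """Overlay heartbeat-reported configs on the persisted map, keeping the newest.

    persisted: dict tablet -> RaftConfig loaded from the sys catalog.
    reports: list of (tablet, RaftConfig) pairs received in heartbeats so far.
    """
    view = dict(persisted)
    for tablet, cfg in reports:
        if tablet not in view or cfg.index > view[tablet].index:
            view[tablet] = cfg
    return view


def replicas_on(view, tserver):
    """Number of tablets whose known quorum includes `tserver`."""
    return sum(1 for cfg in view.values() if tserver in cfg.peers)


# Persisted before failover: t1 on ts-1, ts-2, ts-4.
persisted = {"t1": RaftConfig(5, frozenset({"ts-1", "ts-2", "ts-4"}))}
# Committed during failover, unknown to the new leader: ts-4 replaced by ts-3,
# where ts-3 is the blacklisted tserver.
newer = RaftConfig(6, frozenset({"ts-1", "ts-2", "ts-3"}))

# Within the window no member of the new quorum has heartbeated,
# so no report carries config 6 and ts-3 looks empty.
early = refresh_view(persisted, [])
print(replicas_on(early, "ts-3"))  # 0 -> would wrongly report 100%

# Once a member of the new quorum (e.g. ts-1) heartbeats, the replica appears.
later = refresh_view(persisted, [("t1", newer)])
print(replicas_on(later, "ts-3"))  # 1
```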

To close this gap, one possible solution is for GetLoadMoveCompletion not to rely solely on the 2-minute window, and instead to wait until it has heard heartbeats from a majority of the quorum of every tablet it knows about before responding 100% to this RPC.
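A minimal Python sketch of the proposed check follows. The function names are hypothetical, and it assumes (for simplicity) that once a tserver has sent its first full report to the new leader, that report covers all of its tablets. The intuition behind a majority is the usual Raft one: a committed config change is stored on a majority of peers, so hearing from a majority makes it very likely that some report carries the latest config. This is an illustration of the idea, not a proof or the eventual implementation; whether to keep the 2-minute delay as an additional condition is left open.

```python
def heard_from_majority(tablet_peers, heartbeated):
    """True if, for every known tablet, a strict majority of its known peers
    has heartbeated to this master since it became leader.

    tablet_peers: dict tablet -> set of tservers in the master's current view.
    heartbeated: set of tservers that have sent a full report to this leader.
    A tablet with no known peers never satisfies the check (conservative).
    """
    for peers in tablet_peers.values():
        if len(peers & heartbeated) <= len(peers) // 2:
            return False
    return True


def can_report_complete(remaining_replicas, tablet_peers, heartbeated):
    # Hypothetical proposed gate: nothing left on the blacklisted tserver AND
    # a majority of every known tablet's quorum has been heard from.
    return remaining_replicas == 0 and heard_from_majority(tablet_peers, heartbeated)


peers = {"t1": {"ts-1", "ts-2", "ts-4"}, "t2": {"ts-1", "ts-5", "ts-6"}}
print(heard_from_majority(peers, {"ts-4"}))                  # False
print(heard_from_majority(peers, {"ts-1", "ts-4"}))          # False: t2 has 1 of 3
print(heard_from_majority(peers, {"ts-1", "ts-4", "ts-5"}))  # True
```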

@lingamsandeep @rahuldesirazu @SrivastavaAnubhav @bmatican @charleswang234 @hari90

Issue Type

kind/bug

Warning: Please confirm that this issue does not contain any sensitive information

  • [X] I confirm this issue does not contain any sensitive information.

iSignal, Oct 21 '23 23:10