MDEV-17516: Replication lag issue using parallel replication

Open bnestere opened this issue 3 years ago • 1 comment

Note: the first commit is the regression test, and the second is the code fix.

Problem:

If parallel replication is enabled on a replica, Seconds_Behind_Master can spike when transactions are delayed or infrequent (see also MDEV-29639). This happens because a parallel slave updates last_master_timestamp at the end of an event rather than at the beginning, which gives a less confusing Seconds_Behind_Master value during periods of high concurrency. With delayed or infrequent transactions, however, Seconds_Behind_Master is calculated from the last transaction committed on the slave, which can produce very large values.
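To make the spike concrete, here is a rough arithmetic sketch. It assumes a simplified model in which Seconds_Behind_Master is the replica's current time minus last_master_timestamp, ignoring any clock-difference correction. The function name and all timestamps are hypothetical and chosen only for illustration.

```python
def seconds_behind(now, last_master_timestamp):
    """Simplified model: replica clock minus the last recorded master timestamp."""
    return max(0, now - last_master_timestamp)

# Hypothetical timeline, in seconds.
t1_master_ts = 1000   # T1 was committed on the replica long ago
t2_master_ts = 4600   # T2 arrives after an hour of idle time
now = 4601            # replica clock while T2 is still executing

# Timestamp updated only at the end of an event: T1's value is still in use.
stale = seconds_behind(now, t1_master_ts)

# Timestamp updated when T2 is read: the value reflects T2 itself.
fresh = seconds_behind(now, t2_master_ts)

print(stale, fresh)
```

In this model the replica reports roughly an hour of lag while executing T2, even though it is only about one second behind.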

Solution:

Add logic to check whether an event is the first transaction after the replica has been idle. If so, update last_master_timestamp when the event is read from the relay log.
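The idea can be sketched with a toy model. This is not the server code: the class, its fields, and its methods are hypothetical, and the lag formula is simplified to the replica's current time minus last_master_timestamp.

```python
class ParallelReplicaModel:
    """Toy model of the timestamp bookkeeping; all names are hypothetical."""

    def __init__(self):
        self.last_master_timestamp = 0
        self.in_flight = 0   # events read but not yet finished
        self.idle = True     # True when nothing is in flight

    def read_event(self, master_ts):
        # First transaction after an idle period: update the timestamp
        # at read time instead of waiting for the event to finish.
        if self.idle:
            self.last_master_timestamp = master_ts
            self.idle = False
        self.in_flight += 1

    def finish_event(self, master_ts):
        # Usual parallel-mode behaviour: update at the end of the event.
        self.last_master_timestamp = max(self.last_master_timestamp, master_ts)
        self.in_flight -= 1
        if self.in_flight == 0:
            self.idle = True

    def seconds_behind(self, now):
        return max(0, now - self.last_master_timestamp)


replica = ParallelReplicaModel()
replica.read_event(1000)
replica.finish_event(1000)      # replica is idle again
replica.read_event(4600)        # first event after the idle period
lag_during_t2 = replica.seconds_behind(4601)
print(lag_during_t2)
```

Without the idle check in `read_event`, the same sequence would report 3601 seconds while the second event is still executing.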

bnestere, Oct 21 '22 02:10