MDEV-17516: Replication lag issue using parallel replication
Note: the first commit is the regression, and the second is the code fix.
Problem:
If parallel replication is enabled on a replica, Seconds_Behind_Master can spike when transactions are delayed or infrequent (see also MDEV-29639). A parallel slave updates last_master_timestamp at the end of an event rather than at the beginning, which gives a less confusing Seconds_Behind_Master value during periods of high concurrency. With delayed or infrequent transactions, however, Seconds_Behind_Master is calculated from the last transaction committed on the slave, which can produce very large values.
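To make the spike concrete, here is a small arithmetic sketch. It assumes the lag is roughly the replica's current time minus last_master_timestamp (minus any clock difference); the function name and the timeline values are hypothetical and chosen only for illustration, not taken from the server code.

```python
def seconds_behind_master(now, last_master_timestamp, clock_diff=0):
    """Simplified lag estimate (an assumption, not the server's exact formula)."""
    return max(0, now - last_master_timestamp - clock_diff)


# Hypothetical timeline, in seconds on a shared clock.
last_commit_ts = 1_000  # previous transaction, committed before an idle hour
new_event_ts = 4_600    # next transaction, written on the primary an hour later
now = 4_601             # the replica starts applying it one second later

# End-of-event update: while the new event is still executing, the
# timestamp still points at the transaction from before the idle period.
stale_lag = seconds_behind_master(now, last_commit_ts)

# Update at read time: the timestamp already reflects the new event.
fresh_lag = seconds_behind_master(now, new_event_ts)

print(stale_lag, fresh_lag)  # 3601 1
```

The replica is only about one second behind, yet the stale timestamp reports the whole idle hour as lag.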
Solution:
Add logic to check whether an event is the first transaction after the replica has been idle. If so, update last_master_timestamp when the event is read from the relay log.
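The idea can be sketched with a toy model. This is not the patch itself: the class `ReplicaLagTracker`, its method names, and the use of an in-flight counter to decide that the replica is idle are all assumptions made for illustration; the actual change may detect idleness differently.

```python
from dataclasses import dataclass


@dataclass
class ReplicaLagTracker:
    """Toy model of a parallel replica's lag bookkeeping (hypothetical)."""

    last_master_timestamp: int = 0
    in_flight: int = 0  # events read from the relay log but not yet finished

    def on_event_read(self, event_ts: int) -> None:
        # Assumed idle check: nothing is in flight, so this is the first
        # transaction after an idle period. Refresh the timestamp right away.
        if self.in_flight == 0:
            self.last_master_timestamp = event_ts
        self.in_flight += 1

    def on_event_end(self, event_ts: int) -> None:
        # Usual parallel behaviour: advance the timestamp at the end of the event.
        self.in_flight -= 1
        self.last_master_timestamp = max(self.last_master_timestamp, event_ts)

    def seconds_behind(self, now: int) -> int:
        return max(0, now - self.last_master_timestamp)


tracker = ReplicaLagTracker(last_master_timestamp=1_000)  # idle since t=1000
tracker.on_event_read(4_600)                # first event after the idle hour
lag_while_applying = tracker.seconds_behind(4_601)

tracker.on_event_read(4_605)                # replica is busy: no early update
lag_while_busy = tracker.seconds_behind(4_606)

tracker.on_event_end(4_600)
tracker.on_event_end(4_605)
lag_after_commit = tracker.seconds_behind(4_606)
```

In this model the early update applies only to the first event after an idle period, so the end-of-event behaviour under high concurrency is unchanged.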