feat: add master lsn and journal_executed dcheck in replica via ping
resolves #2773
- add LSN to journal ping
- add periodic ping from master to replica in journal
- add dcheck in replica that journal_executed == lsn
- add version 4 in dfly version
- add separate counter for journal_executed that has proper semantics for pinging lsn
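
For illustration, the replica-side check from the list above could look roughly like the sketch below. All names here (`ReplicaFlowSketch`, `OnCommand`, `OnPingLsn`) are hypothetical and not the actual Dragonfly code. The sketch also assumes that only executed commands advance `journal_executed`, and that a ping carries the master's LSN without advancing the counter itself.

```cpp
#include <cstdint>

// Hypothetical replica-side state for one flow; not the real Dragonfly types.
struct ReplicaFlowSketch {
  // Counts journal entries executed on the replica.
  // Assumption: PING LSN entries do not advance this counter.
  uint64_t journal_executed = 0;

  // Called after a regular journal entry has been applied.
  void OnCommand() { ++journal_executed; }

  // Called when a PING LSN entry arrives. Returns true when the replica has
  // executed exactly as many entries as the master reported. Real code would
  // presumably wrap this comparison in a DCHECK instead of returning it.
  bool OnPingLsn(uint64_t master_lsn) const {
    return journal_executed == master_lsn;
  }
};
```

Returning a `bool` rather than asserting directly keeps the sketch easy to test; a debug-only check could wrap the same comparison.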
@adiholden This is just a prototype so don't review -- it needs polishing and some fixing/gluing. I opened this because I have a small question:
There are two options:
1. Ping at period `P` on a separate fiber.
2. Ping at period `P` when we Record an entry in the journal (this means that if the master is idle there will be no pings, even if period `P` has elapsed).
I opted for (2), although it's easy to switch to (1). The reason is that if the master is idle, the last recorded entry (or one of the last within 2 seconds) will send a `PING LSN` journal entry, and since the master won't progress anyway, the last lag will show how close the replica is. The downside is that we won't get continuous updates on our progress on the replica side. Also note that (1) needs `n=number_of_flows` fibers, whereas (2) needs none (it follows naturally from the flow of stable sync and journal recording).
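
To make (2) concrete, here is a minimal sketch of a record-driven ping. Everything in it is hypothetical (`JournalSketch`, `Entry`, `Op::kPingLsn`, the explicit `now` parameter) and only illustrates the idea. It assumes the ping carries the LSN of the last recorded entry and does not consume an LSN of its own.

```cpp
#include <chrono>
#include <cstdint>
#include <vector>

// Hypothetical entry kinds; not the actual journal opcodes.
enum class Op { kCommand, kPingLsn };

struct Entry {
  Op op;
  uint64_t lsn;
};

class JournalSketch {
 public:
  using Clock = std::chrono::steady_clock;

  explicit JournalSketch(std::chrono::milliseconds period) : period_(period) {}

  // Records a command entry. If at least `period_` has passed since the last
  // ping, a PING LSN entry is emitted right after it. No separate fiber is
  // involved, so an idle master emits no pings at all.
  void Record(Clock::time_point now) {
    out_.push_back({Op::kCommand, ++lsn_});
    if (now - last_ping_ >= period_) {
      // Assumption: the ping reports the current LSN without incrementing it.
      out_.push_back({Op::kPingLsn, lsn_});
      last_ping_ = now;
    }
  }

  const std::vector<Entry>& out() const { return out_; }

 private:
  std::chrono::milliseconds period_;
  Clock::time_point last_ping_{};  // Clock epoch, so the first Record pings.
  uint64_t lsn_ = 0;
  std::vector<Entry> out_;
};
```

Passing `now` explicitly keeps the sketch deterministic. With (1), each flow would instead need its own fiber that wakes up every period.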
Do you have any objections to (2)?
Option 2 sounds good
@adiholden replication tests should fail -- I am chasing two missing LSNs :stuck_out_tongue: In the meantime, you can leave comments :)
@kostasrim should I review?
@romange yes I think your comments are addressed. Let me know :)