feat: add master lsn and journal_executed dcheck in replica via ping

Open kostasrim opened this issue 1 year ago • 5 comments

resolves #2773

  • add lsn number to journal ping
  • add periodic ping from master to replica in journal
  • add dcheck in replica that journal_executed == lsn
  • add version 4 in dfly version
  • add separate counter for journal_executed that has proper semantics for pinging lsn
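The replica-side check in the list above could be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Op`, `Entry`, and `ReplicaFlow` are hypothetical names, and it assumes a ping carries the number of non-ping entries the master has recorded so far, so pings themselves do not advance the counter.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical journal entry kinds; not Dragonfly's actual types.
enum class Op { kCommand, kPing };

struct Entry {
  Op op;
  uint64_t lsn;  // Meaningful for kPing only: the master's LSN when the ping was written.
};

// Hypothetical per-flow replica state.
class ReplicaFlow {
 public:
  // Applies one journal entry. For a ping, returns whether the master's LSN
  // matches the local journal_executed counter (the condition the DCHECK
  // would assert). Assumption: pings do not advance the counter.
  bool Apply(const Entry& e) {
    if (e.op == Op::kPing) {
      return e.lsn == journal_executed_;
    }
    ++journal_executed_;
    return true;
  }

  uint64_t journal_executed() const { return journal_executed_; }

 private:
  uint64_t journal_executed_ = 0;
};
```

In a debug build the `false` return would correspond to a failed check, which is how a dropped or duplicated journal entry would surface.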

kostasrim avatar Mar 26 '24 21:03 kostasrim

@adiholden This is just a prototype so don't review -- it needs polishing and some fixing/gluing. I opened this because I have a small question:

There are two options:

  1. Ping at period P on a separate fiber.
  2. Ping at period P when we record an entry in the journal (so if the master is idle there will be no pings, even if the period interval has elapsed).

I opted for 2 (although it's easy to switch to 1). The reason is that if the master is idle, the last recorded entry (or one of the last within 2 seconds) will send a PING LSN journal entry, and since the master won't progress anyway, the last reported lag shows how close the replica is. The downside is that we won't get continuous updates on the replica's progress. Also note that (1) needs n = number_of_flows fibers, whereas (2) needs none (it piggybacks naturally on the flow of stable sync and journal recording).

Do you have any objections to (2)?
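For illustration, option 2 could look something like the sketch below. It is only a sketch under assumptions: `JournalSketch`, `Record`, and `RecordEntry` are hypothetical names rather than the journal API in this PR, the clock is passed in explicitly to keep the example deterministic, and the ping is assumed to carry the LSN of the entry that triggered it.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical record written to the replication stream.
struct Record {
  bool is_ping;
  uint64_t lsn;
};

// Hypothetical journal that piggybacks pings on regular entries (option 2).
class JournalSketch {
 public:
  JournalSketch(std::chrono::milliseconds period, Clock::time_point start)
      : period_(period), last_ping_(start) {}

  // Records one entry at time `now`. If at least `period_` has passed since
  // the last ping, a PING carrying the current LSN is appended right after it.
  // No entry means no ping, so an idle master stays silent.
  void RecordEntry(Clock::time_point now) {
    ++lsn_;
    out_.push_back({false, lsn_});
    if (now - last_ping_ >= period_) {
      out_.push_back({true, lsn_});
      last_ping_ = now;
    }
  }

  const std::vector<Record>& out() const { return out_; }

 private:
  std::chrono::milliseconds period_;
  Clock::time_point last_ping_;
  uint64_t lsn_ = 0;
  std::vector<Record> out_;
};
```

No extra fiber is involved here: the period check runs on whichever thread records the entry, which is the trade-off described above (no per-flow fibers, but no pings while the master is idle).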

kostasrim avatar Mar 26 '24 21:03 kostasrim

Option 2 sounds good

adiholden avatar Mar 27 '24 08:03 adiholden

@adiholden replication tests should fail -- I am chasing two missing LSNs :stuck_out_tongue: In the meantime, you can leave comments :)

kostasrim avatar Mar 27 '24 12:03 kostasrim

@kostasrim should I review?

romange avatar Mar 28 '24 17:03 romange

@romange yes I think your comments are addressed. Let me know :)

kostasrim avatar Mar 28 '24 19:03 kostasrim