electric icon indicating copy to clipboard operation
electric copied to clipboard

Send data out immediately on the write-path

Open balegas opened this issue 1 year ago • 7 comments

When a transactions comes in, the server persist changes to shape logs before sending them to subscribers waiting for new data. This is adding a fair amount of latency to the write path. In this issue we want to stream rows to clients as soon as we find a shape match and persist changes to shape logs in the background.

In order to do that, we're going to buffer row -> [shapeIds] and have a process to persist changes for those shape . We don't ack a transaction from Postgres until all shapes for a transaction have been updated. This allows to recover gracefully from crashes, as the server can continue from where it stopped and just skip shape logs that have alreaby been written.

The buffer has a fixed size (configurable?). Pending transactions are written to disk:

  • at fixed delay, say 20ms
  • when the buffer fills (ideally it would stop ingesting more transactions from logical repl)
  • When there are no pending transactions, but this feels more optimization and we can handle that later

It might happen that Postgres writes to Electric at a faster speed than Electric can handle shape logs. The developer would need to handle that situation by increasing the buffer size, or account for PG WAL size increase.

This task should be done after #1744 as it builds on the assumption that a single process would determine what shapes need to be written (we'd have to revise approach otherwise)

balegas avatar Sep 25 '24 16:09 balegas

Just an observation that a client may be able to re-connect after receiving a response within the buffer window. So we should serve new requests from memory where possible as well as currently blocked live requests.

thruflo avatar Sep 25 '24 16:09 thruflo

Two conflicting statements: "persist changes to shape logs in the background" vs "We don't ack a transaction from Postgres until all shapes for a transaction have been updated". In fact, you can ack a transaction as soon as it is persisted, and you can perform shape matching and propagation in a parallel background task that doesn't have to ack. In case of a crash, either the transaction was not persisted hence not ack'ed, and you get it trom the server; or it was and you get it from the persisted log. Again, it helps to have a single on-disk log common to all shapes.

marc-shapiro avatar Sep 25 '24 19:09 marc-shapiro

In case of a crash, either the transaction was not persisted hence not ack'ed, and you get it from the server

You can't get the operation from PG WAL once it's acked. We only ack on Pg to be able to recover from the point the server crashed.

Again, it helps to have a single on-disk log common to all shapes

yeah, we can come back to this. The reasoning is that we don't want to scan logs on reads because they are a lot more frequent than writes.

balegas avatar Sep 25 '24 22:09 balegas

Just an observation that a client may be able to re-connect after receiving a response within the buffer window. So we should serve new requests from memory where possible as well as currently blocked live requests.

Yeah, the issue description is not clear about that. In the original RFC I suggest doing that by scanning the common buffer, or holding a buffer for each shape. That needs to be clarified during implementation.

Well spotted. Thanks for raising it.

balegas avatar Sep 25 '24 22:09 balegas

This was completed? I thought we're still writing to disk before sending to client? Or is that just for the initial snapshot?

KyleAMathews avatar Nov 24 '24 23:11 KyleAMathews

@robacourt can you confirm?

balegas avatar Nov 25 '24 08:11 balegas

@robacourt how much time would this cut off latency

KyleAMathews avatar Dec 12 '24 02:12 KyleAMathews