dozer icon indicating copy to clipboard operation
dozer copied to clipboard

Add `replication_slot` configuration to `PostgresConfig`

Open chubei opened this issue 2 years ago • 8 comments

Currently we use a postgres replication slot with hardcoded name, and will try to create it if it doesn't already exist. We'd like the replication slot name to be configurable so database admins can create a replication slot and hand it to Dozer for use.

chubei avatar May 09 '23 07:05 chubei

How would we solve data consistency issue? We will not have correct snapshot data if replication slot is created outside snapshot read transaction.

karolisg avatar May 09 '23 07:05 karolisg

How would we solve data consistency issue?

I don't know how replication slot works. Do you mean the steps are:

  • Start a read transaction
  • Create a replication slot using the transaction
  • Read snapshot data using the slot
  • Listen to CDC events using the slot

These steps seem to imply: if Dozer is stopped before snapshotting is done, we have to create a different replication slot.

I've always believed: Dozer can reuse the same replication slot even it's stopped before snapshotting is done.

Which one is the actual case?

chubei avatar May 09 '23 07:05 chubei

How would we solve data consistency issue? We will not have correct snapshot data if replication slot is created outside snapshot read transaction.

That is a very good point.

One of the ways could be to provide a sample SQL script where we ask them to export into a file that could be used to load the initial snapshot.

Any other options?

v3g42 avatar May 09 '23 07:05 v3g42

How would we solve data consistency issue?

I don't know how replication slot works. Do you mean the steps are:

  • Start a read transaction
  • Create a replication slot using the transaction
  • Read snapshot data using the slot
  • Listen to CDC events using the slot

These steps seem to imply: if Dozer is stopped before snapshotting is done, we have to create a different replication slot.

I've always believed: Dozer can reuse the same replication slot even it's stopped before snapshotting is done.

Which one is the actual case?

If snapshotting is not done and dozer stops, we need to recreate replication slot and start snapshot read again.

karolisg avatar May 09 '23 08:05 karolisg

If snapshotting is not done and dozer stops, we need to recreate replication slot and start snapshot read again.

Thanks.

What happens when we create a replication slot but don't persist the snapshot data. The data is just lost?

Any other options?

We can also ask them to save the data in the same postgres instance and never modify the table again.

We can provide a sql script for doing that, and ask user to provide us with the replication slot name and table name.

chubei avatar May 09 '23 08:05 chubei

How would we solve data consistency issue? We will not have correct snapshot data if replication slot is created outside snapshot read transaction.

That is a very good point.

One of the ways could be to provide a sample SQL script where we ask them to export into a file that could be used to load the initial snapshot.

Any other options?

The best option is to do watermark-based snapshot reading (https://arxiv.org/pdf/2010.12597v1.pdf).

Even if user provides initial dataset, we cannot be sure that data is consistent.

karolisg avatar May 09 '23 08:05 karolisg

If snapshotting is not done and dozer stops, we need to recreate replication slot and start snapshot read again.

Thanks.

What happens when we create a replication slot but don't persist the snapshot data. The data is just lost?

Yes, data is lost.

Any other options?

We can also ask them to save the data in the same postgres instance and never modify the table again.

We can provide a sql script for doing that, and ask user to provide us with the replication slot name and table name.

Problem with this is that we try to read CDC from replication slot, but we cannot be sure that stream contains all messages. Messages can be missing if user used that slot and stopped dozer or if that slot is also used by some other applications.

Postgres stream has standby_status_update message, which is sent from dozer and contains the last read LSN. After this message is sent, the next time when we connect to stream we will not get messages before that LSN.

karolisg avatar May 09 '23 08:05 karolisg

We had a discutssion (@karolisg @mediuminvader @chubei). We believe:

  • Watermark-based snapshot reading cannot guarentee data consistency if replication slot is used by other processes during Dozer restart.
  • We should send standby_status_update with the LSN before the first record, or not sending at all if postgres can keep the replication slot alive.
  • We can let the user create replication slot and persist snapshot data, asking them to guarentee consistency.

chubei avatar May 09 '23 09:05 chubei