dozer
dozer copied to clipboard
Add `replication_slot` configuration to `PostgresConfig`
Currently we use a postgres replication slot with hardcoded name, and will try to create it if it doesn't already exist. We'd like the replication slot name to be configurable so database admins can create a replication slot and hand it to Dozer for use.
How would we solve data consistency issue? We will not have correct snapshot data if replication slot is created outside snapshot read transaction.
How would we solve data consistency issue?
I don't know how replication slot works. Do you mean the steps are:
- Start a read transaction
- Create a replication slot using the transaction
- Read snapshot data using the slot
- Listen to CDC events using the slot
These steps seem to imply: if Dozer is stopped before snapshotting is done, we have to create a different replication slot.
I've always believed: Dozer can reuse the same replication slot even it's stopped before snapshotting is done.
Which one is the actual case?
How would we solve data consistency issue? We will not have correct snapshot data if replication slot is created outside snapshot read transaction.
That is a very good point.
One of the ways could be to provide a sample SQL script where we ask them to export into a file that could be used to load the initial snapshot.
Any other options?
How would we solve data consistency issue?
I don't know how replication slot works. Do you mean the steps are:
- Start a read transaction
- Create a replication slot using the transaction
- Read snapshot data using the slot
- Listen to CDC events using the slot
These steps seem to imply: if Dozer is stopped before snapshotting is done, we have to create a different replication slot.
I've always believed: Dozer can reuse the same replication slot even it's stopped before snapshotting is done.
Which one is the actual case?
If snapshotting is not done and dozer stops, we need to recreate replication slot and start snapshot read again.
If snapshotting is not done and dozer stops, we need to recreate replication slot and start snapshot read again.
Thanks.
What happens when we create a replication slot but don't persist the snapshot data. The data is just lost?
Any other options?
We can also ask them to save the data in the same postgres instance and never modify the table again.
We can provide a sql script for doing that, and ask user to provide us with the replication slot name and table name.
How would we solve data consistency issue? We will not have correct snapshot data if replication slot is created outside snapshot read transaction.
That is a very good point.
One of the ways could be to provide a sample SQL script where we ask them to export into a file that could be used to load the initial snapshot.
Any other options?
The best option is to do watermark-based snapshot reading (https://arxiv.org/pdf/2010.12597v1.pdf).
Even if user provides initial dataset, we cannot be sure that data is consistent.
If snapshotting is not done and dozer stops, we need to recreate replication slot and start snapshot read again.
Thanks.
What happens when we create a replication slot but don't persist the snapshot data. The data is just lost?
Yes, data is lost.
Any other options?
We can also ask them to save the data in the same postgres instance and never modify the table again.
We can provide a sql script for doing that, and ask user to provide us with the replication slot name and table name.
Problem with this is that we try to read CDC from replication slot, but we cannot be sure that stream contains all messages. Messages can be missing if user used that slot and stopped dozer or if that slot is also used by some other applications.
Postgres stream has standby_status_update message, which is sent from dozer and contains the last read LSN. After this message is sent, the next time when we connect to stream we will not get messages before that LSN.
We had a discutssion (@karolisg @mediuminvader @chubei). We believe:
- Watermark-based snapshot reading cannot guarentee data consistency if replication slot is used by other processes during Dozer restart.
- We should send
standby_status_updatewith the LSN before the first record, or not sending at all if postgres can keep the replication slot alive. - We can let the user create replication slot and persist snapshot data, asking them to guarentee consistency.