materialize icon indicating copy to clipboard operation
materialize copied to clipboard

storage/sources/postgres: support `REPLICA IDENTIFY DEFAULT`

Open petrosagg opened this issue 2 years ago • 3 comments

We currently require that all tables participating in a publication have their REPLICA IDENTITY set to FULL. This is because we want to be able to ingest data with minimal in-memory state and rely on the upstream database to provide the previous data of a changed row in the case of retractions.

REPLICA IDENTITY FULL however increases the amount of data that transits the upstream's WAL and that might be operationally undesirable. Materialize already knows how to deal with key-based retractions through the UPSERT envelope so we should be able to support replicating postgres tables with REPLICA IDENTITY DEFAULT by passing them through an UPSERT operator, ideally automatically.

From the list of replicated tables we can determine which ones need an UPSERT state during purification so that we only render it when it's necessary.

petrosagg avatar Mar 06 '23 12:03 petrosagg

Looks like we can close #15874 in favour of this one.

morsapaes avatar Mar 06 '23 13:03 morsapaes

From the list of replicated tables we can determine which ones need an UPSERT state during purification so that we only render it when it's necessary.

A bit of danger here, though, in that we only fetch the metadata about those tables when the source is created, right? We should clearly specify the behavior in the docs. Running ALTER REPLICA IDENTITY in PostgreSQL after you've created the source in Materialize is not going to work, right?

benesch avatar Mar 06 '23 14:03 benesch

That's right, it needs to stay the same. Fortunately this is detectable so there isn't a correctness ranger here, only UX danger which we should make clear in the docs

petrosagg avatar Mar 06 '23 14:03 petrosagg