replicator CDC Resolver vs renamed tables

Open bobvawter opened this issue 1 year ago • 0 comments

In scenarios where the target schema does not define tables with the same names used in an incoming CDC feed (e.g. when a dispatch function in the userscript renames a table), the CDC resolver loop doesn't know to look for a staging table named after the original (incoming) table.

Scenario:

1. The incoming CDC feed will stage data to target_public_original_name.
2. A resolved timestamp is received.
3. The CDC resolver looks up staging tables, using tables defined in the target schema to bootstrap the process.
4. Since the target schema has renamed_table instead of original_name, the resolver loop is unaware of the staging table created in step 1.
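
The mismatch in the scenario above can be modeled in a few lines. This is only an illustrative sketch, not the resolver's actual code: the helper names are hypothetical, and the `database_schema_table` naming rule is an assumption inferred from the `target_public_original_name` example.

```python
def staging_table_name(database: str, schema: str, table: str) -> str:
    # Assumed naming rule, inferred from "target_public_original_name".
    return f"{database}_{schema}_{table}"


def resolver_candidates(database: str, schema: str, target_tables: list[str]) -> set[str]:
    # Hypothetical model of the bootstrap step: the resolver only considers
    # staging tables derived from tables that exist in the target schema.
    return {staging_table_name(database, schema, t) for t in target_tables}


# The feed stages data under the original (incoming) table name.
staged = {staging_table_name("target", "public", "original_name")}

# The target schema only defines the renamed table.
known = resolver_candidates("target", "public", ["renamed_table"])

# Staging tables that hold data but are never visited by the resolver.
missed = staged - known
print(sorted(missed))  # ['target_public_original_name']
```

In this model, adding original_name to the list of target tables makes `missed` empty.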

Note that staging tables are created on demand as changefeed requests arrive, and those requests may be received by a separate cdc-sink instance, so having the resolver iterate over the existing staging tables would be subject to a race condition.

A workaround that exists today is to create an empty table in the target database that uses the original name. This will ensure that the resolver is aware of the original_name staging table.

If this becomes a common issue, we could allow the resolver to be seeded with additional table names via a CLI flag, the userscript, or both.
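
A rough sketch of what such seeding could look like, under the assumption that the resolver takes the union of the target schema's tables and any operator-supplied names. The function and parameter names are hypothetical; no such flag or userscript option is described in this issue.

```python
from typing import Iterable


def tables_to_resolve(
    target_tables: Iterable[str],
    extra_tables: Iterable[str] = (),
) -> list[str]:
    # Hypothetical helper: merge the tables found in the target schema with
    # extra names supplied by the operator (e.g. from a flag or userscript).
    # dict.fromkeys removes duplicates while preserving first-seen order.
    return list(dict.fromkeys([*target_tables, *extra_tables]))


# The target schema knows only the renamed table; the operator seeds the
# original name so that its staging table is also considered.
print(tables_to_resolve(["renamed_table"], ["original_name"]))
# ['renamed_table', 'original_name']
```

Because the seed list is static configuration, this approach would not depend on listing the staging tables that happen to exist at the moment the resolver runs.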

Oct 05 '23 15:10 bobvawter
