replicator
CDC Resolver vs renamed tables
When the target schema does not define tables with the same names used in an incoming CDC feed (e.g. when a dispatch function in the userscript renames a table), the CDC resolver loop doesn't know to look for a staging table named after the original (incoming) table.
Scenario:
- The incoming CDC feed will stage data to `target_public_original_name`.
- A resolved timestamp is received.
- The CDC resolver looks up staging tables, using tables defined in the target schema to bootstrap the process.
- Since the target schema has `renamed_table` instead of `original_name`, the resolver loop is unaware of the staging table created above.
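
The scenario above can be modeled with a small sketch. This is an illustration only, not the project's implementation: the function names are hypothetical, and the `database_schema_table` naming pattern is an assumption inferred from the `target_public_original_name` example.

```python
# Illustrative model only; these names are hypothetical, not the project's API.

def staging_table_name(database: str, schema: str, table: str) -> str:
    # Assumed naming pattern, inferred from "target_public_original_name".
    return f"{database}_{schema}_{table}"


def tables_to_resolve(target_tables: list[str], existing_staging: set[str]) -> list[str]:
    # The resolver bootstraps from the tables defined in the target schema,
    # so it only considers staging tables derived from those names.
    candidates = [staging_table_name("target", "public", t) for t in target_tables]
    return [name for name in candidates if name in existing_staging]


# The feed staged data under the incoming (original) table name.
existing_staging = {staging_table_name("target", "public", "original_name")}

# The target schema only defines the renamed table.
found = tables_to_resolve(["renamed_table"], existing_staging)
print(found)  # [] : the staging table for original_name is never visited
```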
Note that staging tables are created on demand in response to incoming changefeed requests, which may be received by a separate cdc-sink instance. Having the resolver iterate over the existing staging tables instead would therefore be subject to a race condition.
A workaround that exists today is to create an empty table in the target database that uses the original name. This ensures that the resolver is aware of the `original_name` staging table.
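
To see why the workaround helps, here is a minimal sketch under the same assumptions as the issue's example: the resolver derives staging table names from the target schema's table list, and staging tables are named `database_schema_table`. The helper names are hypothetical.

```python
# Illustrative model only; helper names are hypothetical.

def staging_table_name(database: str, schema: str, table: str) -> str:
    # Assumed naming pattern, inferred from "target_public_original_name".
    return f"{database}_{schema}_{table}"


def resolver_candidates(target_tables: list[str]) -> set[str]:
    # Staging tables the resolver would look at, given the target schema.
    return {staging_table_name("target", "public", t) for t in target_tables}


# Before the workaround: only the renamed table exists in the target schema.
before = resolver_candidates(["renamed_table"])

# After the workaround: an empty original_name table has been created too.
after = resolver_candidates(["renamed_table", "original_name"])

print("target_public_original_name" in before)  # False
print("target_public_original_name" in after)   # True
```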
If this becomes a common issue, we could allow the resolver to be seeded with additional table names via a CLI flag, the userscript, or both.
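
One possible shape for the seeding idea is sketched below. Nothing here exists today: `seeded_tables` is a hypothetical parameter standing in for whatever a CLI flag or userscript setting would supply, and the staging naming pattern is again an assumption taken from the example above.

```python
# Sketch of the proposal only; `seeded_tables` is a hypothetical parameter.

def staging_table_name(database: str, schema: str, table: str) -> str:
    # Assumed naming pattern, inferred from "target_public_original_name".
    return f"{database}_{schema}_{table}"


def resolver_candidates(target_tables: list[str], seeded_tables: list[str]) -> list[str]:
    # Start from the target schema, then append any seeded names that the
    # schema does not already define, preserving order without duplicates.
    names = list(target_tables)
    for table in seeded_tables:
        if table not in names:
            names.append(table)
    return [staging_table_name("target", "public", t) for t in names]


candidates = resolver_candidates(["renamed_table"], seeded_tables=["original_name"])
print(candidates)
# ['target_public_renamed_table', 'target_public_original_name']
```

Seeding by name avoids iterating over staging tables that may not have been created yet, which is the race described above.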