
Proposal: Activity to fetch config from the DB

Open alexstoick opened this issue 2 months ago • 2 comments

This uses a new helper, internal.FetchConfigFromDB, to fetch a fully hydrated protos.FlowConnectionConfigs. Before passing this config to Temporal we strip the tableMappings array, which can push the payload over the 2MB limit.
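A minimal sketch of the strip-then-rehydrate idea, for illustration only. The `FlowConfig` and `TableMapping` types, the `ConfigStore` interface, and the `StripTableMappings` and `Hydrate` functions are hypothetical stand-ins, not the actual protos types or the real internal.FetchConfigFromDB signature:

```go
package main

import "fmt"

// TableMapping and FlowConfig are simplified, hypothetical stand-ins for the
// generated protos messages; the real types carry many more fields.
type TableMapping struct {
	Source      string
	Destination string
}

type FlowConfig struct {
	FlowJobName   string
	TableMappings []TableMapping
}

// StripTableMappings returns a shallow copy of cfg without its table mappings,
// so the value handed to Temporal stays small. The caller's cfg is not modified.
func StripTableMappings(cfg *FlowConfig) *FlowConfig {
	slim := *cfg
	slim.TableMappings = nil
	return &slim
}

// ConfigStore is a hypothetical abstraction over the catalog.
type ConfigStore interface {
	FetchConfig(flowJobName string) (*FlowConfig, error)
}

// MapStore is an in-memory ConfigStore used here in place of a database.
type MapStore map[string]*FlowConfig

func (m MapStore) FetchConfig(flowJobName string) (*FlowConfig, error) {
	cfg, ok := m[flowJobName]
	if !ok {
		return nil, fmt.Errorf("flow %q not found", flowJobName)
	}
	return cfg, nil
}

// Hydrate is what an activity would do: use the flow name carried by the
// slim config to load the full config, mappings included, from the store.
func Hydrate(store ConfigStore, slim *FlowConfig) (*FlowConfig, error) {
	return store.FetchConfig(slim.FlowJobName)
}
```

The point of the shape is that only the flow name has to travel through Temporal; everything large is looked up inside the activity.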

This contains a subset of the changes which were originally proposed in: https://github.com/PeerDB-io/peerdb/pull/3407

alexstoick, Oct 10 '25 16:10

Some ideas on how to tighten this up:

  • Remove the field from FlowConnectionConfigs (mark as reserved) to rip off the band-aid and have the type system work with us. For table additions, pass the mappings as a separate arg.
  • Add a table_mappings table to the catalog, write there on flow creation in handler.go, and add a migration after pause like in cdc_flow.go lines 375-390 from #2090, which pulled off a similar thing. Instead of fetching config from the DB, fetch table mappings from the DB.
  • Preserve replayability, which can be done by adding a version bigint field that increments within a flow, referencing that value from FlowConnectionConfigs, and making the mappings table append-only.
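To make the append-only, versioned idea concrete, here is a rough in-memory sketch. `MappingStore`, `Append`, and `Get` are hypothetical names; a real implementation would sit on top of the proposed table_mappings catalog table rather than a map:

```go
package main

import "fmt"

// TableMapping is a simplified, hypothetical stand-in for the protos type.
type TableMapping struct {
	Source      string
	Destination string
}

// MappingStore models an append-only mappings table: each write adds a new
// version for the flow, and existing versions are never modified.
type MappingStore struct {
	// versions[flow][i] holds the mappings stored as version i+1.
	versions map[string][][]TableMapping
}

func NewMappingStore() *MappingStore {
	return &MappingStore{versions: make(map[string][][]TableMapping)}
}

// Append stores a snapshot of mappings as the next version for the flow and
// returns that version. Versions start at 1 and increment per flow.
func (s *MappingStore) Append(flow string, mappings []TableMapping) int64 {
	snapshot := append([]TableMapping(nil), mappings...) // copy, so later caller edits don't leak in
	s.versions[flow] = append(s.versions[flow], snapshot)
	return int64(len(s.versions[flow]))
}

// Get returns the mappings stored under an exact version. Because old versions
// are never rewritten, a replay that carries the version sees the same data.
func (s *MappingStore) Get(flow string, version int64) ([]TableMapping, error) {
	all := s.versions[flow]
	if version < 1 || version > int64(len(all)) {
		return nil, fmt.Errorf("flow %q has no mappings version %d", flow, version)
	}
	return all[version-1], nil
}
```

The config passed through Temporal would then only need the version number, and a replayed workflow asking for version N gets the mappings as they were when N was written.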

Would be nice to consider how not to trade off too much of the observability that Temporal provides. Right now, for workflows with 1-10 tables, it's convenient to see the mappings right there in the execution. The PeerDB UI and raw catalog access are not easily available in the ClickPipes environment. Maybe log up to 100 tables (along with count and version) every time they're fetched?
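The capped log line could look something like the sketch below. `SummarizeMappings` and the message format are made up for illustration, and the limit is a parameter rather than a fixed 100:

```go
package main

import (
	"fmt"
	"strings"
)

// SummarizeMappings builds a single log line with the mappings version, the
// total table count, and at most limit table names. Hypothetical helper.
func SummarizeMappings(tables []string, version int64, limit int) string {
	if limit < 0 {
		limit = 0
	}
	shown := tables
	if len(shown) > limit {
		shown = shown[:limit]
	}
	msg := fmt.Sprintf("table mappings version=%d count=%d tables=[%s]",
		version, len(tables), strings.Join(shown, ", "))
	if omitted := len(tables) - len(shown); omitted > 0 {
		msg += fmt.Sprintf(" (+%d more)", omitted)
	}
	return msg
}
```

For example, `SummarizeMappings([]string{"a", "b", "c"}, 2, 2)` yields `table mappings version=2 count=3 tables=[a, b] (+1 more)`, so the count and version stay visible even when the list is truncated.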

ilidemi, Oct 11 '25 08:10

@ilidemi - Thanks for the feedback!

I hesitated to remove the field, as it does feel like a nuclear option, and I was worried about the migration path for running envs. I'll go ahead and implement your suggestion for the new table, taking a similar approach to #2090.

In my previous PR, #3407, I was able to fully remove the options argument being passed, which also reduced the size of the blob passed around. What are your thoughts on that? The options didn't serve any purpose once I was fetching the TableMappings and SrcTableIdNameMapping from the DB.


For adding/removing table mappings, I think we can do either:

  • a different activity
  • pass as an additional param

My only concern with changing the signature of this is again dealing with running systems and queued jobs - not sure how well Temporal behaves when you change the signature of the job!

alexstoick, Oct 13 '25 10:10