
Atomically drop table on full re-sync and when replication-method is FULL_TABLE

Open xtreding opened this issue 4 years ago • 0 comments

Problem

When Jitsu performs a full re-sync during source synchronization, it truncates the destination table before writing data. Because the source's discovered catalog may change between syncs, the destination schema (e.g. data types) may differ between source versions, and truncation keeps the stale schema. Jitsu should therefore drop the table instead, both on a full re-sync and when a stream has replication-method = FULL_TABLE.

Tables should be dropped atomically. The pipeline should be:

  • write data into a new table with the _new suffix
  • rename the old table by adding the _old suffix
  • rename the new table by removing the _new suffix
  • drop the old table (the one with the _old suffix)
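
The pipeline above can be sketched as follows. This is only an illustration against an in-memory SQLite database, not Jitsu's implementation: the helper names `table_exists` and `replace_table`, the `users` table, and its columns are all hypothetical. It also assumes the destination supports transactional DDL, which is true for SQLite but not for every warehouse.

```python
import sqlite3


def table_exists(conn, name):
    """Return True if a table called `name` exists (SQLite-specific lookup)."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    return row is not None


def replace_table(conn, table, columns_ddl, rows):
    """Rebuild `table` with a possibly different schema via _new/_old renames.

    Hypothetical helper. `conn` must be in autocommit mode
    (isolation_level=None) so that BEGIN/COMMIT below are the only
    transaction boundaries.
    """
    new, old = f"{table}_new", f"{table}_old"

    # Step 1: write the fresh data into <table>_new; readers still see <table>.
    conn.execute(f'DROP TABLE IF EXISTS "{new}"')
    conn.execute(f'CREATE TABLE "{new}" ({columns_ddl})')
    if rows:
        marks = ", ".join("?" for _ in rows[0])
        conn.executemany(f'INSERT INTO "{new}" VALUES ({marks})', rows)

    # Steps 2-4: swap the tables inside a single transaction.
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.execute(f'DROP TABLE IF EXISTS "{old}"')  # leftover from a failed run
        if table_exists(conn, table):  # absent on the very first sync
            conn.execute(f'ALTER TABLE "{table}" RENAME TO "{old}"')
        conn.execute(f'ALTER TABLE "{new}" RENAME TO "{table}"')
        conn.execute(f'DROP TABLE IF EXISTS "{old}"')
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


# Demo: the second sync changes a column type and adds a column.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE users (id INTEGER, age TEXT)")
conn.execute("INSERT INTO users VALUES (1, '30')")

replace_table(
    conn,
    "users",
    "id INTEGER, age INTEGER, email TEXT",
    [(1, 30, "a@example.com")],
)
result = conn.execute("SELECT * FROM users").fetchall()
columns = [row[1] for row in conn.execute("PRAGMA table_info(users)")]
```

Loading into the _new table happens outside the transaction, so the slow part of the sync does not hold a lock; only the renames and the final drop need to be atomic.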

Solution

Part 1:

  • [ ] In the /clear_cache endpoint, replace table truncation with table dropping when the full-resync flag is set or when a stream has replication-method = FULL_TABLE.

Part 2:

  • [ ] Remove the /clear_cache call from the UI's full-refresh button
  • [ ] Add a /tasks call to the UI (as in incremental sync), but with the POST JSON body parameter full_resync=true
  • [ ] Jitsu Server should save full_resync in the Task body in Redis and use this parameter to execute the pipeline above.
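
One way to picture how the flag could travel with the task is the sketch below. It is an assumption-laden illustration, not Jitsu's code: the `Task` class, its `source_id` and `collection` fields, and the function names are hypothetical; only `full_resync` and `FULL_TABLE` come from this issue. The JSON string stands in for the task body that would be stored in Redis.

```python
import json
from dataclasses import asdict, dataclass

FULL_TABLE = "FULL_TABLE"


@dataclass
class Task:
    """Hypothetical task body; only full_resync is taken from the issue."""
    source_id: str
    collection: str
    full_resync: bool = False


def task_from_request(body: dict) -> Task:
    """Build a task from a POST JSON body, defaulting full_resync to False."""
    return Task(
        source_id=body["source_id"],
        collection=body["collection"],
        full_resync=bool(body.get("full_resync", False)),
    )


def should_replace_table(task: Task, replication_method: str) -> bool:
    """Use the drop-and-rename pipeline on full re-sync or FULL_TABLE streams."""
    return task.full_resync or replication_method == FULL_TABLE


# Round trip: serialize as it might be stored, then load it for execution.
request_body = {"source_id": "src1", "collection": "users", "full_resync": True}
stored = json.dumps(asdict(task_from_request(request_body)))
loaded = Task(**json.loads(stored))
```

Keeping the flag in the persisted task body means the executor does not depend on any state left over from the original HTTP request.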

xtreding avatar Nov 17 '21 10:11 xtreding