jitsu
Atomically drop table on full re-sync and when replication-method is FULL_TABLE
Problem
When Jitsu performs a full re-sync during source synchronization, it truncates the destination table before writing data. However, the source's discovered catalog may change between syncs, and the destination schema (e.g. data types) may differ between source versions. Jitsu should therefore drop the table instead of truncating it, both on full re-sync and when a stream has replication-method = FULL_TABLE.
Tables should be dropped atomically. The pipeline should be:
- Jitsu should write data into a new table with the `_new` suffix
- rename the old table by adding the suffix `_old`
- rename the new table by removing the suffix `_new`
- drop the old table (with the suffix `_old`)
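The steps above can be sketched as follows. This is a minimal illustration, not Jitsu's implementation: it uses SQLite from the Python standard library as a stand-in destination, and the function name `swap_in_new_table` and its parameters are hypothetical. It assumes the old table already exists and that the destination supports DDL inside a transaction, which not every warehouse does.

```python
import sqlite3


def swap_in_new_table(conn, table, columns_ddl, rows):
    """Replace `table` with freshly loaded data via the _new/_old rename steps.

    `conn` must be opened with isolation_level=None so that BEGIN/COMMIT
    are controlled explicitly. All names here are illustrative assumptions.
    """
    new, old = f"{table}_new", f"{table}_old"
    cur = conn.cursor()

    # Step 1: write data into a new table with the _new suffix.
    # The new table may have a different schema than the old one.
    cur.execute(f'DROP TABLE IF EXISTS "{new}"')
    cur.execute(f'CREATE TABLE "{new}" ({columns_ddl})')
    placeholders = ", ".join("?" for _ in rows[0]) if rows else ""
    if rows:
        cur.executemany(f'INSERT INTO "{new}" VALUES ({placeholders})', rows)

    # Steps 2-4 run in one transaction, so readers see either the old
    # table or the new one, never a missing table.
    cur.execute("BEGIN")
    try:
        cur.execute(f'DROP TABLE IF EXISTS "{old}"')  # leftover from a failed run
        cur.execute(f'ALTER TABLE "{table}" RENAME TO "{old}"')
        cur.execute(f'ALTER TABLE "{new}" RENAME TO "{table}"')
        cur.execute(f'DROP TABLE "{old}"')
        cur.execute("COMMIT")
    except Exception:
        cur.execute("ROLLBACK")
        raise


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE users (id INTEGER)")
    conn.execute("INSERT INTO users VALUES (1)")
    # The re-synced data carries an extra column compared to the old schema.
    swap_in_new_table(conn, "users", "id INTEGER, name TEXT", [(1, "a"), (2, "b")])
    print(conn.execute("SELECT * FROM users ORDER BY id").fetchall())
```

If the rename sequence fails midway, the rollback leaves the original table in place, and the stale `_new` table is dropped at the start of the next run.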
Solution
Part 1:
- [ ] Replace table truncating with table dropping in the `/clear_cache` endpoint when the full-resync flag is set, and when a stream has replication-method = FULL_TABLE.
Part 2:
- [ ] Remove the `/clear_cache` call from the UI on the full-refresh button
- [ ] Add a call to `/tasks` (like in incremental sync) but with the POST JSON body parameter `full_resync=true` to the UI
- [ ] Jitsu Server should save `full_resync` into the Task body in Redis and use this parameter for executing the above pipeline.
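
A rough sketch of how the `full_resync` flag could travel with the task and drive the decision to drop the table. Only `full_resync` and FULL_TABLE come from this issue; the other field names (`source`, `collection`), the function names, and the dict standing in for Redis are assumptions made for illustration.

```python
import json


def build_task_body(source, collection, full_resync):
    """Serialize a task body the way it might be stored in Redis (hypothetical shape)."""
    return json.dumps(
        {"source": source, "collection": collection, "full_resync": full_resync}
    )


def should_drop_table(task_body_json, replication_method):
    """Drop (rather than truncate) on full re-sync or for FULL_TABLE streams."""
    task = json.loads(task_body_json)
    # Older task bodies without the flag are treated as incremental syncs.
    return bool(task.get("full_resync", False)) or replication_method == "FULL_TABLE"


if __name__ == "__main__":
    fake_redis = {}  # stand-in for the Redis task storage
    fake_redis["task:1"] = build_task_body("my_source", "users", True)
    print(should_drop_table(fake_redis["task:1"], "INCREMENTAL"))
```

Defaulting a missing flag to `False` keeps tasks that were saved before the change behaving as incremental syncs.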