Support state syncing to a separate destination

Open sh-rp opened this issue 1 year ago • 0 comments

Feature description

We want to support syncing state and schema info to a destination separate from the destinations the load data is sent to. This will be especially useful for the custom destination, which does not sync any state at this point.

Implementation Tasks

[ ] Add the possibility to supply a destination config that is used only for state syncing
[ ] Extend the "WithStateSync" interface with methods to set schema and state. Currently the schema is written during "update_stored_schema", but this should be triggered directly from load.py. The state is written via a load job (except on the filesystem destination) and should also be migrated to run directly from load.py. This will affect all destinations (for the sql destinations possibly only the base class).
[ ] Optionally extend the "WithStateSync" interface to query all states and schemas via a unified interface. This will make some of the tests easier to handle.
[ ] Make the drop command work for all destinations. This will require a method on the destination to drop or truncate tables.
[ ] Think about migration of the state load handling. This will only be an issue if the user upgrades dlt with an unfinished load package locally. Maybe it is enough to just warn and ask to downgrade to complete this one load. Not sure.
[ ] With the correct abstractions, we should get the filesystem to run all or most tests in test_job_client.
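
A rough sketch of what the extended interface from the tasks above could look like. This is not the existing dlt API: `StateSyncClient`, `StoredState`, `StoredSchema`, `InMemoryStateSync` and `resolve_state_sync` are hypothetical names made up for illustration, and the real "WithStateSync" signatures may differ. It only shows the shape of the idea: write and query methods on one interface, plus a fallback from a dedicated state destination to the data destination.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StoredState:
    # Hypothetical record, not the dlt state table layout
    pipeline_name: str
    load_id: str
    state: str  # serialized pipeline state


@dataclass
class StoredSchema:
    # Hypothetical record, not the dlt version table layout
    schema_name: str
    version_hash: str
    schema: str  # serialized schema


class StateSyncClient(ABC):
    """Hypothetical read/write variant of the "WithStateSync" idea."""

    @abstractmethod
    def store_state(self, state: StoredState) -> None: ...

    @abstractmethod
    def store_schema(self, schema: StoredSchema) -> None: ...

    @abstractmethod
    def get_states(self, pipeline_name: str) -> List[StoredState]: ...

    @abstractmethod
    def get_schemas(self, schema_name: str) -> List[StoredSchema]: ...


class InMemoryStateSync(StateSyncClient):
    """Toy implementation, e.g. for tests that need a unified query interface."""

    def __init__(self) -> None:
        self._states: List[StoredState] = []
        self._schemas: List[StoredSchema] = []

    def store_state(self, state: StoredState) -> None:
        self._states.append(state)

    def store_schema(self, schema: StoredSchema) -> None:
        self._schemas.append(schema)

    def get_states(self, pipeline_name: str) -> List[StoredState]:
        return [s for s in self._states if s.pipeline_name == pipeline_name]

    def get_schemas(self, schema_name: str) -> List[StoredSchema]:
        return [s for s in self._schemas if s.schema_name == schema_name]


def resolve_state_sync(
    data_client: object, state_client: Optional[StateSyncClient]
) -> Optional[StateSyncClient]:
    """Pick where the load step would write state and schema.

    Prefer the dedicated state destination; otherwise fall back to the data
    destination if it supports state sync; otherwise sync nothing.
    """
    if state_client is not None:
        return state_client
    if isinstance(data_client, StateSyncClient):
        return data_client
    return None
```

With something like this, the load step could call `store_schema` and `store_state` on whatever `resolve_state_sync` returns, so a custom destination that implements no state sync would still get its state stored when a separate state destination is configured.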

Other Ideas

[ ] Add dlt_load_id to version table for better lineage information.

Apr 17 '24 10:04 sh-rp