[WIP] allow to fork and split pipelines
Background
split pipeline - create a pipeline with a new name and move the indicated sources/resources/tables to it. Extracted and normalized files belonging to the resources' tables are moved; schemas and state are split and moved.
fork pipeline - as above, but create a copy (or hard link) of the files; schemas and state are split and copied.
To fully implement "bad data handling" (#780) we need to be able to split a pipeline on the bad-data resource/table and load it to a separate dataset or destination.
For reverse ETL (custom destinations) we want to send some resources twice (or more) to several destinations.
A full fork may also be used as a backup (#944).
the process
A pipeline may be split after the extract or normalize steps. We should not allow splitting/forking pipelines that have partially processed packages (a guard sketch follows the list):
- all packages in the working directory must be split/forked
- split/fork is not allowed while there are still packages in the load phase (in both the "original" and the "other" pipeline)
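A minimal sketch of that guard; `has_pending_load_packages` and `SplitPipelineError` are hypothetical names, not part of today's dlt API:

```python
# hypothetical sketch: refuse to split/fork while packages are mid-load
class SplitPipelineError(Exception):
    pass

def assert_can_split(original, other) -> None:
    # packages already handed to the load phase cannot be divided
    # consistently between two pipelines, so refuse in that case
    for pipeline in (original, other):
        if pipeline.has_pending_load_packages():  # hypothetical check
            raise SplitPipelineError(
                f"pipeline {pipeline.pipeline_name} still has packages in the load phase"
            )
```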
state and schema splitting
The "original" pipeline should keep the full state and schema to be able to restore it. Tables for which resources do not exist will not be created.
The "other" pipeline should receive only source state and state belonging to the split/forked resources.
Package state may be cloned (we need to make sure that the new refresh modes work correctly, though).
note 1: pipeline state mutates only during the extract step, so it is safe to override source/resource state on split/fork when state already exists.
note 2: schemas mutate during both the extract and normalize steps. If the pipeline was cloned after the extract step, we just initialize the schemas in the "other" pipeline. If it was cloned after the normalize step, we can overwrite them (source-wise, of course).
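A sketch of note 2's rule; the `cloned_after` flag and `clone_schemas` helper are illustrative, and schemas are treated here as a plain name-to-Schema mapping with a `clone()` method (an assumption about the Schema object):

```python
def clone_schemas(original_schemas, other_schemas, cloned_after: str, sources: set) -> None:
    # hypothetical sketch: how schemas move depends on the step after which
    # the pipeline was cloned
    for name in sources:
        if cloned_after == "extract":
            # schemas may still mutate during normalize: only seed missing ones
            other_schemas.setdefault(name, original_schemas[name].clone())
        elif cloned_after == "normalize":
            # schemas are settled for this run: overwrite source-wise
            other_schemas[name] = original_schemas[name].clone()
```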
user interface
Add split and fork methods on Pipeline. They should accept the "other" pipeline name and, optionally: destination/staging and dataset name. A usage sketch follows.
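Hypothetically, usage could look like the following; the `resources` keyword and the return values are assumptions on top of the proposal, and neither method exists yet:

```python
import dlt

@dlt.resource
def events():
    yield [{"id": 1, "type": "push"}, {"id": 2, "type": "watch"}]

pipeline = dlt.pipeline("github_events", destination="duckdb", dataset_name="events")
pipeline.extract(events)

# hypothetical: move the resource (packages, state, schema) into a new pipeline
other = pipeline.split(
    "github_events_other",        # "other" pipeline name
    resources=["events"],         # assumed way to indicate what moves
    destination="bigquery",       # optional, per the proposal
    dataset_name="events_other",  # optional, per the proposal
)

# hypothetical: same selection, but files/state/schema are copied (or hard-linked),
# e.g. for the backup use case (#944)
backup = pipeline.fork("github_events_backup")
```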