dlt use a dummy operator at the start of parallel pipelines

Description

Used a DummyOperator as the first task, in place of the first source, so that all sources run in parallel, including the first one.

Related Issues

Fixes #2196

Additional Context

The first source can take a long time to run, so parallelizing it along with the others can make pipelines faster.

Jan 08 '25 09:01 alucryd

Deploy Preview for dlt-hub-docs canceled.

Name	Link
Latest commit	7fc82a3fce1d19f9f2b8fda0edcfbb0095f661cd
Latest deploy log	https://app.netlify.com/sites/dlt-hub-docs/deploys/677e7bfdb52f8300086fd4d2

Jan 08 '25 09:01 netlify[bot]

@alucryd there's a reason to run the first task and then all the others in parallel: it creates the initial schema in the database and the standard dlt tables. All tasks share the same dataset.

If you still want to work on this PR, then let's add a new option to add_run, i.e. dummy_task_first, and if it is set to True, do what you do right now.

I do not want to change the existing behavior; too many deployments that may rely on it are in production.
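The two wirings discussed above can be sketched without Airflow as a plain mapping from each task to its upstream tasks. This is only an illustration under stated assumptions: `plan_parallel_tasks` and the `"start"` task name are hypothetical, and `dummy_task_first` is the option proposed in this thread, not an existing `add_run` parameter.

```python
from typing import Dict, List


def plan_parallel_tasks(
    sources: List[str], dummy_task_first: bool = False
) -> Dict[str, List[str]]:
    """Return a mapping of task name -> list of upstream task names.

    Hypothetical helper for illustration; not part of dlt or Airflow.
    """
    if not sources:
        return {}

    if dummy_task_first:
        # Proposed opt-in behavior: a no-op "start" task is the only root,
        # and every source (including the first) depends on it alone.
        upstream: Dict[str, List[str]] = {"start": []}
        for name in sources:
            upstream[name] = ["start"]
        return upstream

    # Default behavior as described in the thread: the first source runs
    # alone (creating the schema and dlt tables), then the rest fan out.
    first, rest = sources[0], sources[1:]
    upstream = {first: []}
    for name in rest:
        upstream[name] = [first]
    return upstream


default_plan = plan_parallel_tasks(["users", "orders", "events"])
dummy_plan = plan_parallel_tasks(["users", "orders", "events"], dummy_task_first=True)
```

With the default, `orders` and `events` wait for `users`; with `dummy_task_first=True`, all three depend only on the no-op root, so nothing guarantees that the schema and dlt tables exist before the parallel tasks start.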

I see, thanks for the heads up, I don't have the full picture yet but I'm getting there. I ran this change in production and didn't run into any issue with a completely new datasource so I wrongly assumed it would be harmless.

I assume it would be too much work to split the schema and table creations and only run that in the first task?

In any case I'll add the proposed option and default it to false so it doesn't impact anyone.

Jan 27 '25 17:01 alucryd

@alucryd yeah, we could think of some "preparatory" task, but IMO in that case it is better to just create a callback that receives a DAG from the Airflow helper and can modify it... We already have on_before_run; we could also add on_dag_created, where you get this tree of tasks.

but that's a separate ticket I'd say - if you'd like to try to add it

Jan 29 '25 17:01 rudolfix
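The callback idea above might look roughly like the following sketch. Everything here is an assumption made for illustration: `on_dag_created` is only proposed in this thread, and `build_graph`, `add_preparatory_task`, the `TaskGraph` type, and the `prepare_schema` task name are hypothetical stand-ins rather than dlt or Airflow APIs.

```python
from typing import Callable, Dict, List, Optional

# Task name -> list of upstream task names (a stand-in for the real task tree).
TaskGraph = Dict[str, List[str]]


def add_preparatory_task(graph: TaskGraph) -> TaskGraph:
    """Example callback: put a 'prepare_schema' task in front of every root task."""
    modified: TaskGraph = {}
    for name, upstream in graph.items():
        # Tasks without upstreams are roots; make them wait for the new task.
        modified[name] = list(upstream) if upstream else ["prepare_schema"]
    modified["prepare_schema"] = []
    return modified


def build_graph(
    sources: List[str],
    on_dag_created: Optional[Callable[[TaskGraph], TaskGraph]] = None,
) -> TaskGraph:
    """Build a fully parallel graph, then let the caller's callback modify it."""
    graph: TaskGraph = {name: [] for name in sources}
    if on_dag_created is not None:
        graph = on_dag_created(graph)
    return graph


graph = build_graph(["users", "orders"], on_dag_created=add_preparatory_task)
```

The point of this design is that the helper keeps its default wiring while callers who want a different layout change the task tree themselves.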

@alucryd do you plan to continue on this?

Feb 17 '25 12:02 rudolfix

Hi, @rudolfix @alucryd I would like to give this a shot, if it is alright :)

Mar 24 '25 20:03 prakharcode

@prakharcode I will close this PR due to inactivity. If you'd like to continue it or provide a new one, please re-open this or create a new PR. Thanks :)

May 19 '25 13:05 sh-rp
