
use a dummy operator at the start of parallel pipelines

Open alucryd opened this issue 1 year ago • 5 comments

Description

Used a DummyOperator instead of the first source to parallelize all sources, including the first one.

Related Issues

  • Fixes #2196

Additional Context

The first source can take a long time to run, so running it in parallel with the other sources can make pipelines faster.

alucryd avatar Jan 08 '25 09:01 alucryd

Deploy Preview for dlt-hub-docs canceled.

Latest commit: 7fc82a3fce1d19f9f2b8fda0edcfbb0095f661cd
Latest deploy log: https://app.netlify.com/sites/dlt-hub-docs/deploys/677e7bfdb52f8300086fd4d2

netlify[bot] avatar Jan 08 '25 09:01 netlify[bot]

@alucryd there's a reason to run the first task and then all the others in parallel: the first task creates the initial schema in the database and the standard dlt tables. All tasks share the same dataset.

If you still want to work on this PR, then let's add a new option to add_run, e.g. dummy_task_first, and if it is set to True, do what you do right now.

I do not want to change the existing behavior; too many deployments that may rely on it are in production.

I see, thanks for the heads up. I don't have the full picture yet, but I'm getting there. I ran this change in production and didn't run into any issues with a completely new data source, so I wrongly assumed it would be harmless.

I assume it would be too much work to split out the schema and table creation and run only that in the first task?

In any case, I'll add the proposed option and default it to False so it doesn't impact anyone.

alucryd avatar Jan 27 '25 17:01 alucryd
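The two orderings discussed above can be sketched with a plain dependency map. This is a minimal illustration only, not dlt's or Airflow's code: `build_dependencies` and the task name `start` are hypothetical, and only the option name `dummy_task_first` comes from the thread.

```python
from typing import Dict, List


def build_dependencies(
    sources: List[str], dummy_task_first: bool = False
) -> Dict[str, List[str]]:
    """Map each task name to the tasks it must wait for (hypothetical helper)."""
    if not sources:
        return {}

    if dummy_task_first:
        # Proposed opt-in behavior: a no-op "start" task runs first and
        # every source, including the first one, runs in parallel after it.
        deps: Dict[str, List[str]] = {"start": []}
        for name in sources:
            deps[name] = ["start"]
        return deps

    # Existing behavior described in the thread: the first source runs alone
    # (creating the schema and the standard dlt tables), and the remaining
    # sources run in parallel once it has finished.
    first, rest = sources[0], sources[1:]
    deps = {first: []}
    for name in rest:
        deps[name] = [first]
    return deps
```

With the default `dummy_task_first=False`, the map keeps the first source as the single root, which matches the requirement that existing deployments see no change.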

@alucryd yeah, we could think of some "preparatory" task, but IMO in that case it is better to just create a callback that receives the DAG from the Airflow helper and can modify it. We already have on_before_run; we could also add on_dag_created, where you get this tree of tasks.

But that's a separate ticket, I'd say, if you'd like to try adding it.

rudolfix avatar Jan 29 '25 17:01 rudolfix
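The callback idea above could look roughly like the following. This is a sketch under stated assumptions: `on_dag_created` is only proposed in the thread and does not exist as shown, and `make_task_graph`, `add_preparatory_task`, and the task name `prepare` are hypothetical stand-ins for the Airflow helper and a user callback.

```python
from typing import Callable, Dict, List, Optional

# Task name -> names of the tasks it waits for.
TaskGraph = Dict[str, List[str]]


def make_task_graph(
    sources: List[str],
    on_dag_created: Optional[Callable[[TaskGraph], None]] = None,
) -> TaskGraph:
    """Build the default graph, then let a callback modify it (hypothetical)."""
    graph: TaskGraph = {}
    if sources:
        first, rest = sources[0], sources[1:]
        graph[first] = []
        for name in rest:
            graph[name] = [first]
    if on_dag_created is not None:
        # The callback receives the whole tree of tasks and may rewire it.
        on_dag_created(graph)
    return graph


def add_preparatory_task(graph: TaskGraph) -> None:
    """Example callback: run a 'prepare' task first, then all sources in parallel."""
    for name in list(graph):
        graph[name] = ["prepare"]
    graph["prepare"] = []
```

Calling `make_task_graph(["a", "b"], on_dag_created=add_preparatory_task)` leaves both sources depending only on `prepare`, while omitting the callback keeps the default ordering.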

@alucryd do you plan to continue on this?

rudolfix avatar Feb 17 '25 12:02 rudolfix

Hi @rudolfix @alucryd, I would like to give this a shot, if that is alright :)

prakharcode avatar Mar 24 '25 20:03 prakharcode

@prakharcode I will close this PR due to inactivity. If you'd like to continue it or provide a new one, please re-open this or create a new PR. Thanks :)

sh-rp avatar May 19 '25 13:05 sh-rp