Datasets - trigger DAG run when any dataset is updated

Open westonplatter opened this issue 2 years ago • 5 comments

Description

On this doc page, Airflow explicitly says all datasets need to be updated before a DAG run is triggered.

When using datasets, in this first release (v2.4) waiting for all datasets in the list to be updated is the only option when multiple datasets are consumed by a DAG. A later release may introduce more fine-grained options allowing for greater flexibility.

I would like the schedule param logic to be configurable so that a DAG run is triggered when either all or any of the datasets are updated.

Use case/motivation

The trigger condition should be configurable so that a DAG run is triggered when either all or any of the datasets are updated.

To keep the API consistent, code changes would keep all as the default functionality.
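
To make the request concrete, here is a minimal pure-Python sketch of the two trigger conditions. It is not Airflow code: `should_trigger`, `required`, `updated`, and `mode` are hypothetical names used only to illustrate the semantics, with "all" kept as the default.

```python
from typing import Iterable, Literal


def should_trigger(
    required: Iterable[str],
    updated: Iterable[str],
    mode: Literal["all", "any"] = "all",
) -> bool:
    """Decide whether a DAG run should be created.

    required: dataset URIs the DAG consumes (hypothetical representation).
    updated:  dataset URIs updated since the DAG's last run.
    mode:     "all" mirrors the v2.4 behaviour; "any" is the requested option.
    """
    required_set = set(required)
    updated_set = set(updated)
    if not required_set:
        return False  # nothing to wait on, so nothing can trigger a run
    if mode == "all":
        # every consumed dataset must have been updated
        return required_set <= updated_set
    if mode == "any":
        # a single updated dataset is enough
        return bool(required_set & updated_set)
    raise ValueError(f"unknown mode: {mode!r}")


required = ["s3://bucket/a.csv", "s3://bucket/b.csv"]
updated = ["s3://bucket/a.csv"]
print(should_trigger(required, updated))              # False
print(should_trigger(required, updated, mode="any"))  # True
```

Keeping "all" as the default means existing DAGs would behave exactly as before unless they opt in to "any".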

Related issues

Tangential, but not directly related

Are you willing to submit a PR?

  • [x] Yes I am willing to submit a PR!

Code of Conduct

westonplatter avatar Sep 21 '23 21:09 westonplatter

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

boring-cyborg[bot] avatar Sep 21 '23 21:09 boring-cyborg[bot]

I think we need partial-aware scheduling here. Let's say a DAG depends on 3 datasets, and each dataset is updated every hour. We want one of them to be treated as a partial dataset: if it was last updated up to 6 hours ago, that is fine, so go ahead and trigger the DAG. Otherwise, call some callback.

It will be a great feature
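
One way to read this suggestion, sketched in plain Python rather than against any Airflow API: some datasets must have a new update, while the partial dataset only needs to have been updated within a freshness window. All names here (`is_fresh`, `ready_to_run`, `max_age`, the dataset keys) are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(last_updated: datetime, now: datetime, max_age: timedelta) -> bool:
    """True if the dataset was updated no longer than max_age before now."""
    return now - last_updated <= max_age


def ready_to_run(strict_updated, partial_last_updated, now,
                 max_age=timedelta(hours=6)):
    """Trigger only if every strict dataset has a new update and the
    partial dataset is still fresh enough."""
    return all(strict_updated.values()) and is_fresh(
        partial_last_updated, now, max_age
    )


now = datetime(2023, 9, 22, 12, 0, tzinfo=timezone.utc)
strict = {"dataset_a": True, "dataset_b": True}

print(ready_to_run(strict, now - timedelta(hours=5), now))  # True
print(ready_to_run(strict, now - timedelta(hours=7), now))  # False
```

The "otherwise some callback" part would hook in at the point where `ready_to_run` returns `False`.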

Amar1404 avatar Sep 22 '23 12:09 Amar1404

(in case it helps) Directly related:

  • closed PR: https://github.com/apache/airflow/pull/28333
  • discussion: https://github.com/apache/airflow/discussions/28253

yermalov-here avatar Sep 26 '23 12:09 yermalov-here

@sunank200 and I are working on this right now actually

dstandish avatar Jan 25 '24 15:01 dstandish

@dstandish has there been any movement on this issue? I will be following closely.

harveymarshall avatar Feb 21 '24 16:02 harveymarshall

Any updates here? Looking forward to this improvement!

gabrielrmn avatar Mar 01 '24 16:03 gabrielrmn

Yes, a number of PRs have been merged for this, but this is the one that shows you the syntax: https://github.com/apache/airflow/pull/37101

dstandish avatar Mar 01 '24 17:03 dstandish

This feature has been available since Airflow 2.9, see https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/datasets.html#logical-operators-for-datasets
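
For readers landing here, a minimal sketch of that syntax, assuming Airflow 2.9 or later is installed. The dataset URIs, DAG ids, and task bodies are placeholders invented for illustration; `|` means any of the datasets and `&` means all of them, as described on the linked documentation page.

```python
import pendulum
from airflow.datasets import Dataset
from airflow.decorators import dag, task

# Placeholder URIs, for illustration only.
orders = Dataset("s3://example-bucket/orders.csv")
customers = Dataset("s3://example-bucket/customers.csv")


@dag(
    dag_id="consume_any",
    schedule=(orders | customers),  # run when either dataset is updated
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    catchup=False,
)
def consume_any():
    @task
    def process():
        print("at least one dataset was updated")

    process()


@dag(
    dag_id="consume_all",
    schedule=(orders & customers),  # run only after both datasets are updated
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    catchup=False,
)
def consume_all():
    @task
    def process():
        print("both datasets were updated")

    process()


consume_any()
consume_all()
```

Passing a plain list, `schedule=[orders, customers]`, keeps the original wait-for-all behaviour quoted in the description.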

jscheffl avatar Jul 25 '24 21:07 jscheffl