Datasets - trigger DAG run when any dataset is updated
Description
On this doc page, Airflow explicitly says all datasets need to be updated before a DAG run is triggered.
When using datasets, in this first release (v2.4) waiting for all datasets in the list to be updated is the only option when multiple datasets are consumed by a DAG. A later release may introduce more fine-grained options allowing for greater flexibility.
I would like to see the schedule param logic configurable so that all or any dataset being updated triggers a DAG run.
Use case/motivation
The trigger condition should be configurable so that a DAG run is triggered when either all datasets or any one dataset is updated.
To keep the API consistent, code changes would keep all as the default functionality.
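To make the requested behavior concrete, here is a minimal sketch of the two trigger conditions in plain Python. It is not Airflow code: `should_trigger`, its `mode` argument, and the URIs are hypothetical names used only for illustration, with `"all"` kept as the default as proposed above.

```python
from typing import Iterable, Set


def should_trigger(consumed: Iterable[str], updated: Set[str], mode: str = "all") -> bool:
    """Decide whether a run should be queued (illustrative only, not Airflow's API).

    consumed: URIs of the datasets the DAG is scheduled on.
    updated:  URIs of the datasets updated since the DAG's last run.
    mode:     "all" (default, current behavior) or "any" (requested behavior).
    """
    consumed = list(consumed)
    if mode == "all":
        return all(uri in updated for uri in consumed)
    if mode == "any":
        return any(uri in updated for uri in consumed)
    raise ValueError(f"unknown mode: {mode!r}")


consumed = ["s3://bucket/a.csv", "s3://bucket/b.csv"]
updated = {"s3://bucket/a.csv"}  # only one of the two datasets has been updated

print(should_trigger(consumed, updated))         # False: "all" still waits for b.csv
print(should_trigger(consumed, updated, "any"))  # True: one update is enough
```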
Related issues
Tangential, but not directly related
Are you willing to submit a PR?
- [x] Yes I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise a PR to address this issue, please do so; there is no need to wait for approval.
I think we need partial-aware scheduling here. Let's say a DAG depends on 3 datasets and each dataset is updated every hour. We want one of them to be treated as a partial dataset: if it was last updated within, say, the past 6 hours, that is fine and the DAG should be triggered. Otherwise, some callback should run.
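One possible reading of the suggestion above is a freshness check on the "partial" dataset. The sketch below is plain Python, not an Airflow feature: `is_fresh_enough` and its parameters are hypothetical, and the 6-hour window is the figure from the comment.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_fresh_enough(
    last_update: Optional[datetime],
    now: datetime,
    max_age: timedelta = timedelta(hours=6),
) -> bool:
    """Return True if the dataset was last updated within max_age of now."""
    if last_update is None:
        # A dataset that has never been updated is never fresh.
        return False
    return now - last_update <= max_age


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(is_fresh_enough(now - timedelta(hours=5), now))  # True: trigger the DAG
print(is_fresh_enough(now - timedelta(hours=7), now))  # False: fall back to a callback
```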
It will be a great feature
(In case it helps) Directly related: closed PR https://github.com/apache/airflow/pull/28333 and discussion https://github.com/apache/airflow/discussions/28253
@sunank200 and I are working on this right now actually
@dstandish has there been any movement on this issue? I will be following closely.
Any updates here? Looking forward to this improvement!
Yes, a number of PRs have been merged for this, but this is the one that shows the syntax: https://github.com/apache/airflow/pull/37101
This feature is available since Airflow 2.9, see https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/datasets.html#logical-operators-for-datasets
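For readers landing here, a short sketch of what the logical-operator syntax looks like, assuming Airflow 2.9 or later (2.x series). The dataset URIs, DAG ids, and task id are placeholders, not taken from this issue; see the linked docs for the authoritative description.

```python
from datetime import datetime

from airflow.datasets import Dataset
from airflow.models.dag import DAG
from airflow.operators.empty import EmptyOperator

# Placeholder URIs, chosen only for illustration.
orders = Dataset("s3://example-bucket/orders.csv")
customers = Dataset("s3://example-bucket/customers.csv")

# Triggered when EITHER dataset is updated (the behavior requested in this issue).
with DAG(
    dag_id="consume_any_dataset",
    start_date=datetime(2024, 1, 1),
    schedule=(orders | customers),
    catchup=False,
):
    EmptyOperator(task_id="process")

# Triggered only after BOTH datasets have been updated,
# which matches the original list-based behavior.
with DAG(
    dag_id="consume_all_datasets",
    start_date=datetime(2024, 1, 1),
    schedule=(orders & customers),
    catchup=False,
):
    EmptyOperator(task_id="process")
```

The `|` and `&` expressions can be nested with parentheses to build more specific conditions.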