Introducing logical operators for dataset conditional logic
We've expanded dataset dependency handling in Airflow, building on PR apache/airflow#37016. The present PR lets you link Dataset instances with logical operators (| for OR and & for AND), which simplifies the expression of complex dependencies.
Key Enhancements:
- Logical operators | and & for combining Dataset objects.
- Enhanced URI validation within the Dataset class.
- New DatasetAny and DatasetAll classes for OR/AND conditions.
- DatasetsExpression class for building dataset condition trees.
- extract_datasets function to interpret these expression trees.
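To illustrate how | and & can build a condition tree, here is a minimal, self-contained sketch using Python operator overloading. It is not the code from this PR: ToyDataset, ToyAny, ToyAll, and Node are hypothetical stand-ins for Dataset, DatasetAny, DatasetAll, and their shared behaviour, and the real classes may differ in structure and validation.

```python
from __future__ import annotations

from dataclasses import dataclass


class Node:
    """Hypothetical base class that supplies the | and & operators."""

    def __or__(self, other: Node) -> ToyAny:
        # a | b builds an OR node over both operands
        return ToyAny(self, other)

    def __and__(self, other: Node) -> ToyAll:
        # a & b builds an AND node over both operands
        return ToyAll(self, other)


@dataclass(frozen=True)
class ToyDataset(Node):
    """Leaf node: stand-in for a dataset identified by its URI."""

    uri: str


class ToyAny(Node):
    """OR node: stand-in for an 'any of these' condition."""

    def __init__(self, *operands: Node) -> None:
        self.operands = operands

    def __repr__(self) -> str:
        return f"ToyAny({', '.join(map(repr, self.operands))})"


class ToyAll(Node):
    """AND node: stand-in for an 'all of these' condition."""

    def __init__(self, *operands: Node) -> None:
        self.operands = operands

    def __repr__(self) -> str:
        return f"ToyAll({', '.join(map(repr, self.operands))})"


d1 = ToyDataset("s3://bucket1/data1")
d2 = ToyDataset("s3://bucket2/data2")
d3 = ToyDataset("s3://bucket3/data3")

# The parentheses are explicit here, but & already binds tighter than | in Python.
expr = d1 | (d2 & d3)
print(expr)
```

Because the combined nodes inherit the same operators, expressions nest to any depth without extra syntax.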
Example:
from airflow.datasets import Dataset

dataset1 = Dataset(uri="s3://bucket1/data1")
dataset2 = Dataset(uri="s3://bucket2/data2")
dataset3 = Dataset(uri="s3://bucket3/data3")
dataset4 = Dataset(uri="s3://bucket4/data4")
dataset5 = Dataset(uri="s3://bucket5/data5")
expr = dataset1 | (dataset2 & dataset3)
# expr translates to DatasetAny(dataset1, DatasetAll(dataset2, dataset3))
expr1 = ((dataset1 & dataset2) | dataset3) & (dataset4 | dataset5)
# expr1 translates to DatasetAll(DatasetAny(DatasetAll(dataset1, dataset2), dataset3), DatasetAny(dataset4, dataset5))
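To make the OR/AND semantics of these trees concrete, the following sketch evaluates a condition against a set of updated dataset URIs. It is an illustration under stated assumptions, not the PR's implementation: the tuple encoding ("any" / "all" plus a list of children) and the function is_satisfied are hypothetical, chosen so the example runs without Airflow installed.

```python
from __future__ import annotations

# Hypothetical encoding: a leaf is a URI string; an inner node is
# ("any", [children]) for OR or ("all", [children]) for AND.
u1, u2, u3, u4, u5 = (f"s3://bucket{i}/data{i}" for i in range(1, 6))

# Mirrors: dataset1 | (dataset2 & dataset3)
expr = ("any", [u1, ("all", [u2, u3])])

# Mirrors: ((dataset1 & dataset2) | dataset3) & (dataset4 | dataset5)
expr1 = ("all", [("any", [("all", [u1, u2]), u3]), ("any", [u4, u5])])


def is_satisfied(cond, updated: set[str]) -> bool:
    """Return True if the condition tree holds for the updated URIs."""
    if isinstance(cond, str):
        # Leaf: satisfied when that dataset has been updated.
        return cond in updated
    op, children = cond
    results = (is_satisfied(child, updated) for child in children)
    if op == "any":
        return any(results)
    if op == "all":
        return all(results)
    raise ValueError(f"unknown operator: {op!r}")


print(is_satisfied(expr, {u2}))        # False: u2 alone does not satisfy the AND branch
print(is_satisfied(expr, {u2, u3}))    # True
print(is_satisfied(expr1, {u3, u4}))   # True
print(is_satisfied(expr1, {u1, u2}))   # False: neither dataset4 nor dataset5 updated
```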
This update offers a more intuitive way of expressing dataset dependencies in Airflow workflows.
Depends on the merge of PR apache/airflow#37016.
Dependency Checklist
- [ ] PR #37016 should be merged before this PR.