Introducing Logical Operators for dataset conditional logic

Open sunank200 opened this issue 1 year ago • 0 comments

This PR expands dataset dependency handling in Airflow, building on PR apache/airflow#37016. It introduces logical operators (| for OR and & for AND) for combining Dataset instances, so that complex dependencies can be written as a single expression.

Key Enhancements:

  • Logical operators | and & for combining Dataset objects.
  • Enhanced URI validation within the Dataset class.
  • New DatasetAny and DatasetAll classes for OR/AND conditions.
  • DatasetsExpression class for building dataset condition trees.
  • extract_datasets function to interpret these expression trees.
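
To make the list above concrete, here is a minimal, self-contained sketch of how | and & can build a condition tree through Python operator overloading. It is a stand-in written for illustration and is not the code in this PR: Combinable, Group, and leaf_uris are hypothetical names, the Dataset class here models only the URI, and DatasetsExpression and URI validation are not modeled at all.

```python
from __future__ import annotations

from dataclasses import dataclass


class Combinable:
    """Hypothetical mixin: gives every node the | and & operators."""

    def __or__(self, other: Combinable) -> DatasetAny:
        return DatasetAny(self, other)

    def __and__(self, other: Combinable) -> DatasetAll:
        return DatasetAll(self, other)


@dataclass(frozen=True)
class Dataset(Combinable):
    """Stand-in for airflow.datasets.Dataset; only the URI is modeled."""

    uri: str


class Group(Combinable):
    """Shared behavior for the two condition nodes (hypothetical base class)."""

    def __init__(self, *objects: Combinable) -> None:
        self.objects = objects

    def __eq__(self, other: object) -> bool:
        return type(self) is type(other) and self.objects == other.objects

    def __repr__(self) -> str:
        return f"{type(self).__name__}({', '.join(map(repr, self.objects))})"


class DatasetAny(Group):
    """OR node: satisfied when any child is satisfied."""


class DatasetAll(Group):
    """AND node: satisfied when every child is satisfied."""


def leaf_uris(node: Combinable) -> list[str]:
    """Walk a condition tree depth-first and return the dataset URIs in order."""
    if isinstance(node, Dataset):
        return [node.uri]
    uris: list[str] = []
    for child in node.objects:
        uris.extend(leaf_uris(child))
    return uris


a = Dataset("s3://bucket1/data1")
b = Dataset("s3://bucket2/data2")
c = Dataset("s3://bucket3/data3")

# Each operator call wraps its two operands in a new node.
tree = a | (b & c)
print(tree)
print(leaf_uris(tree))
```

In this sketch each operator call produces a new two-child node, so a chain such as a | b | c nests rather than flattens. Because Python gives & higher precedence than |, a | b & c builds the same tree as a | (b & c); the parentheses are still worth keeping for readability.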

Example:

from airflow.datasets import Dataset

dataset1 = Dataset(uri="s3://bucket1/data1")
dataset2 = Dataset(uri="s3://bucket2/data2")
dataset3 = Dataset(uri="s3://bucket3/data3")
dataset4 = Dataset(uri="s3://bucket4/data4")
dataset5 = Dataset(uri="s3://bucket5/data5")

expr = dataset1 | (dataset2 & dataset3)
# expr translates to DatasetAny(dataset1, DatasetAll(dataset2, dataset3))

expr1 = ((dataset1 & dataset2) | dataset3) & (dataset4 | dataset5)
# expr1 translates to DatasetAll(DatasetAny(DatasetAll(dataset1, dataset2), dataset3), DatasetAny(dataset4, dataset5))
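
To show what a tree like expr1 means in practice, the sketch below evaluates it against a set of dataset URIs that are considered updated. This is an illustration under stated assumptions, not this PR's implementation: the classes are simplified stand-ins for the ones named above, and the Node base class and the evaluate method taking a set of URIs are hypothetical.

```python
class Node:
    """Hypothetical base class providing the | and & operators."""

    def __or__(self, other):
        return DatasetAny(self, other)

    def __and__(self, other):
        return DatasetAll(self, other)


class Dataset(Node):
    """Stand-in leaf: satisfied when its URI is in the updated set."""

    def __init__(self, uri):
        self.uri = uri

    def evaluate(self, updated):
        return self.uri in updated


class DatasetAny(Node):
    """OR node: satisfied when at least one child is satisfied."""

    def __init__(self, *objects):
        self.objects = objects

    def evaluate(self, updated):
        return any(obj.evaluate(updated) for obj in self.objects)


class DatasetAll(Node):
    """AND node: satisfied only when every child is satisfied."""

    def __init__(self, *objects):
        self.objects = objects

    def evaluate(self, updated):
        return all(obj.evaluate(updated) for obj in self.objects)


d1, d2, d3, d4, d5 = (
    Dataset(f"s3://bucket{i}/data{i}") for i in range(1, 6)
)

# Same shape as expr1 in the example above.
condition = ((d1 & d2) | d3) & (d4 | d5)

# d1 and d2 satisfy the left branch, d4 satisfies the right branch.
ready = condition.evaluate({d1.uri, d2.uri, d4.uri})

# d3 satisfies the left branch, but neither d4 nor d5 has been updated.
not_ready = condition.evaluate({d3.uri})
```

Here ready is True and not_ready is False: the outer AND requires both the ((d1 & d2) | d3) branch and the (d4 | d5) branch to hold.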

With these operators, compound dataset dependencies in Airflow workflows can be written as ordinary Python expressions rather than as explicitly nested DatasetAny and DatasetAll calls.

Depends on the merge of PR apache/airflow#37016.

Dependency Checklist

  • [ ] PR #37016 should be merged before this PR.


sunank200, Jan 30 '24 16:01