
Add dataset event dataset dag run queue association

Open sunank200 opened this issue 1 year ago • 1 comments


^ Add meaningful description above. Read the Pull Request Guidelines for more information. In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed. In case of a new dependency, check compliance with the ASF 3rd Party License Policy. In case of backwards incompatible changes, please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

sunank200 avatar Feb 14 '24 17:02 sunank200

Notes from talking to Ankit off-thread:

  1. I think adding an association table shouldn’t affect triggering_dataset_events. SQLAlchemy loads relationships lazily by default (unless we configure eager loading, which we don’t), so the new relationship shouldn’t be loaded at all unless the user accesses it. They shouldn’t (it’s unsupported), but if they do, they incur an unavoidable performance penalty.
  2. Right now we pass all triggered events collected by DDRQ (DatasetDagRunQueue) during the prior trigger and the current trigger to the downstream timetable, and let it come up with an appropriate data interval for the downstream DAG run. The logic is pretty obvious for ALL (the default, and the current logic), but less so for ANY or anything more complicated. We might need a way for users to override that timetable function to generate a more appropriate data interval, but that can be handled later, when the need comes up.
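
To make point 2 concrete, here is a rough, standard-library-only sketch of what an overridable interval function could look like. It is not code from this PR and does not use Airflow's API: `Event`, `Interval`, `span_of_all_events`, `latest_event_only`, and `resolve_interval` are all hypothetical names. It assumes the default for ALL is to span the earliest start to the latest end of the upstream runs behind the collected events, and that a user-supplied function could replace that rule for ANY or more complicated conditions.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Interval:
    start: datetime
    end: datetime


@dataclass(frozen=True)
class Event:
    """Hypothetical stand-in for a dataset event; not Airflow's model."""

    dataset_uri: str
    # Data interval of the upstream run that produced the event, if known.
    source_interval: Optional[Interval]


# Signature a user-provided override would have to match (an assumption).
IntervalFn = Callable[[datetime, Sequence[Event]], Interval]


def span_of_all_events(logical_date: datetime, events: Sequence[Event]) -> Interval:
    """Assumed default: cover every upstream interval seen since the last trigger."""
    known = [e.source_interval for e in events if e.source_interval is not None]
    start = min((i.start for i in known), default=logical_date)
    end = max((i.end for i in known), default=logical_date)
    return Interval(start, end)


def latest_event_only(logical_date: datetime, events: Sequence[Event]) -> Interval:
    """Example override: use only the upstream interval that ends last."""
    known = [e.source_interval for e in events if e.source_interval is not None]
    if not known:
        return Interval(logical_date, logical_date)
    return max(known, key=lambda i: i.end)


def resolve_interval(
    logical_date: datetime,
    events: Sequence[Event],
    override: Optional[IntervalFn] = None,
) -> Interval:
    """Use the override when one is given, otherwise fall back to the default."""
    return (override or span_of_all_events)(logical_date, events)
```

With something along these lines, a DAG that triggers on ANY could pass `latest_event_only` (or its own function) instead of inheriting the span-everything rule, while existing DAGs keep the default behaviour.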

uranusjr avatar Feb 20 '24 06:02 uranusjr

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Apr 06 '24 00:04 github-actions[bot]