run_after not respected when last DagRun is scheduled but not executed

Open • awenest opened this issue 7 months ago • 1 comment

Description

When a custom Timetable sets run_after later than the data interval end (e.g. running at 04:00 UTC instead of midnight), Airflow's scheduler may emit the next interval prematurely if the previous run has not started yet. This leads to:

• Skipped intervals
• Misaligned scheduling

Steps to Reproduce

1. Set up a Timetable whose next_dagrun_info() returns:

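```python
from airflow.timetables.base import DagRunInfo, DataInterval
from pendulum import UTC, DateTime

# Returned from next_dagrun_info(); start and end are the interval bounds
DagRunInfo(
    data_interval=DataInterval(start, end),
    run_after=DateTime.combine(end.date(), schedule_at).replace(tzinfo=UTC),
)
```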

where schedule_at = Time(4, 0), i.e. 04:00 UTC.

2. On day X at 00:00 UTC, the scheduler calls next_dagrun_info(), sees no prior run, and emits interval A (day X to day X+1) with run_after = X+1 04:00 UTC.
3. Because A has not been triggered yet, last_automated_data_interval is still None. The scheduler calls next_dagrun_info() again before A's run_after (X+1 04:00 UTC), again sees "no prior run," and emits interval B (day X+1 to day X+2).
4. As a result, interval A is skipped and never runs.
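The steps above can be replayed with a small standard-library model. It is an illustration under one assumption, not Airflow code: when there is no prior run, the timetable anchors its first interval on today's midnight UTC, which is one simple way the reported re-emission could happen. The function name naive_first_interval is hypothetical.

```python
from datetime import datetime, time, timedelta, timezone

SCHEDULE_AT = time(4, 0)  # run_after is 04:00 UTC on the day the interval ends


def naive_first_interval(now: datetime) -> tuple[datetime, datetime, datetime]:
    """Hypothetical no-prior-run logic: start the interval at today's midnight.

    Returns (start, end, run_after) for a daily interval.
    """
    start = datetime.combine(now.date(), time(0, 0), tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    run_after = datetime.combine(end.date(), SCHEDULE_AT, tzinfo=timezone.utc)
    return start, end, run_after


day_x = datetime(2025, 6, 1, tzinfo=timezone.utc)

# Step 2: at X 00:00 no run exists yet, so interval A is emitted.
a = naive_first_interval(day_x)

# Step 3: at X+1 01:00, A's run_after (X+1 04:00) has not passed, so no run
# exists and there is still "no prior run"; the same logic now emits B.
b = naive_first_interval(day_x + timedelta(days=1, hours=1))

print("A:", a[0].date(), "->", a[1].date(), "run_after", a[2])
print("B:", b[0].date(), "->", b[1].date(), "run_after", b[2])
```

Interval B starts where A ends, and A's run_after lies after the second call, so in this model A is never run, matching step 4.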

Expected Behavior

Airflow should:

• Not emit a new interval before the prior interval's run_after has passed, even if the prior run hasn't executed yet

Or expose to the timetable logic:

• The timestamp of the last scheduled (but not necessarily executed) DagRun, so custom logic can guard against premature scheduling
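The first expectation amounts to a guard in the scheduler loop. The toy model below is hypothetical (ToyScheduler, Interval, next_interval and tick are illustrative names, not Airflow's scheduler API): it keeps the interval it has already emitted as pending and does not ask the timetable again until that interval's run_after has passed and the run has been recorded.

```python
from dataclasses import dataclass, field
from datetime import datetime, time, timedelta, timezone
from typing import Callable, NamedTuple, Optional


class Interval(NamedTuple):
    start: datetime
    end: datetime
    run_after: datetime


def next_interval(last: Optional[Interval], now: datetime) -> Interval:
    """Toy daily timetable: continue after `last`, else start at today's midnight UTC."""
    if last is not None:
        start = last.end
    else:
        start = datetime.combine(now.date(), time(0, 0), tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    # run_after is 04:00 UTC on the day the interval ends
    run_after = datetime.combine(end.date(), time(4, 0), tzinfo=timezone.utc)
    return Interval(start, end, run_after)


@dataclass
class ToyScheduler:
    timetable: Callable[[Optional[Interval], datetime], Interval]
    pending: Optional[Interval] = None  # emitted but not yet run
    runs: list[Interval] = field(default_factory=list)  # intervals actually run

    def tick(self, now: datetime) -> None:
        # Guard: never replace a pending interval before its run_after passes.
        if self.pending is not None and now < self.pending.run_after:
            return
        if self.pending is not None:
            self.runs.append(self.pending)  # run_after has passed: "run" it
        last = self.runs[-1] if self.runs else None
        self.pending = self.timetable(last, now)


x = datetime(2025, 6, 1, tzinfo=timezone.utc)  # day X, 00:00 UTC
sched = ToyScheduler(next_interval)
for now in (x, x + timedelta(hours=25), x + timedelta(hours=28)):
    sched.tick(now)  # X 00:00, X+1 01:00, X+1 04:00
```

The tick at X+1 01:00 is a no-op, so interval A stays pending and runs at 04:00; only then is the timetable asked for the interval that follows A.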

Impact

This bug causes custom daily schedules using non-midnight run times to lose a day of data and/or misalign downstream processes, requiring developers to implement brittle workarounds.

Proposed Fixes

1. Hold off calling next_dagrun_info() again until the last scheduled run's run_after has passed. Even if last_automated_data_interval is still None, recognize that an interval has already been scheduled.
2. Timetable interface improvement: expose an argument such as last_scheduled_run_after to next_dagrun_info(), so custom timetables can make informed decisions.
3. Documentation update: state clearly in the Timetable guide that if run_after differs from data_interval.end, timetable authors must guard against re-emitting intervals before the previous run_after has passed.
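For the documentation-update fix, one possible timetable-side guard (an assumption, not a documented Airflow recipe) is to make the no-prior-run branch return the same interval on every call until its run_after has passed. The sketch below does this by shifting now back by the run_after offset before taking the date. first_interval and RUN_AFTER_OFFSET are hypothetical names; in a real Timetable this logic would sit in the branch of next_dagrun_info() that handles last_automated_data_interval being None.

```python
from datetime import datetime, time, timedelta, timezone

RUN_AFTER_OFFSET = timedelta(hours=4)  # run_after = interval end + 04:00


def first_interval(now: datetime) -> tuple[datetime, datetime, datetime]:
    """Return (start, end, run_after) for the daily interval whose run_after is next.

    Shifting `now` back by the offset before taking the date means the same
    interval is returned on every call until its run_after has passed, so a
    repeated call with no prior run cannot skip ahead.
    """
    anchor = now.astimezone(timezone.utc) - RUN_AFTER_OFFSET
    start = datetime.combine(anchor.date(), time(0, 0), tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    return start, end, end + RUN_AFTER_OFFSET
```

With this anchoring, a call at X+1 01:00 and one at X+1 03:59 both return interval A (X to X+1). It cannot help if the recomputation happens after run_after but before the run has been created, so a scheduler-side hold-off like the first proposed fix would still be useful.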

References

• Timetable documentation: https://airflow.apache.org/docs/apache-airflow/stable/timetables.html

awenest • Jun 17 '25 13:06

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

boring-cyborg[bot] • Jun 17 '25 13:06