The grid view shows incorrect DAG runs
Apache Airflow version
2.4.0
What happened
Airflow 2.4.0's grid view has an issue: it stops showing the latest DAG runs (regardless of schedule, it shows old DAG runs instead).
- case 1: all 25 displayed runs are old DAG runs (screenshot omitted)
- case 2: old and new DAG runs are mixed (screenshot omitted)
What you think should happen instead
No response
How to reproduce
- Go to a DAG with >365 dag runs
- The slot that is supposed to show the latest DAG run in the "grid view" instead shows an old DAG run from ~1 year ago
Operating System
Linux Mint 20.3 Una
Versions of Apache Airflow Providers
No response
Deployment
Other Docker-based deployment
Deployment details
No response
Anything else
No response
Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
most likely due to https://github.com/apache/airflow/pull/25633 and https://github.com/apache/airflow/pull/25880 @uranusjr @bbovenzi any idea how to solve this? i'm not familiar with the "timetable" part :sweat_smile:
Hi there! Follow up from dev ML.
We faced exactly the same issue after upgrading from 2.3.4 to 2.4.0, for all our dags. All of them have quite a history, going back to 2016, so a lot of dag runs. Somehow the grid shows us the latest runs as 2021-11, while the other views (calendar, graph) show all of them, including the most recent ones.
The exact fix which solved this problem is:
```diff
--- airflow/www/views.py
+++ airflow/www/views.py
@@ -3453,8 +3460,9 @@ class Airflow(AirflowBaseView):
         if run_state:
             query = query.filter(DagRun.state == run_state)
-        ordering = (DagRun.__table__.columns[name].desc() for name in dag.timetable.run_ordering)
-        dag_runs = query.order_by(*ordering, DagRun.id.desc()).limit(num_runs).all()
+        dag_runs = query.order_by(DagRun.execution_date.desc()).limit(num_runs).all()
         dag_runs.reverse()
         encoded_runs = [wwwutils.encode_dag_run(dr) for dr in dag_runs]
```
It feels like there is a mismatch between the data and the logic that operates on it, but I'm not sure where or what it is.
@uranusjr Can you take a look at this ASAP please so we can get a fix for this out in 2.4.1?
@kxepal What is the timetable/schedule parameter for the DAG(s) with this problem?
@ashb we have good old timedelta schedule interval value everywhere. No timetables at all.
@kxepal doesn't the fix actually revert the core of https://github.com/apache/airflow/pull/25633? i also did that because i "accidentally" deployed airflow 2.4.0 to our production :shushing_face:
@zachliu That PR contains a lot of changes, but for us it was enough to revert just the changes shown in the diff. The rest of them cause no trouble.
@kxepal taking a wild guess here, maybe an if can solve it all: if timetables are used, do that; else, do this
@zachliu probably. We don't use timetables, so it was a clear fix for us.
@kxepal same here, i'm waiting for the distant future "calendar"-like scheduling system :wink:
Everything is converted to a timetable now under the hood (and had been since timetables were introduced) so it's not as simple as an if.
aww... wrong guess :crying_cat_face: it was based on the fact that some old dag runs don't have the data interval start/end attributes (see case 2 screenshots)
@ashb What information could be useful to debug this issue?
@kxepal Creating a minimal stand-alone reproduction case if you can ("here's a dag, run it/backfill it for X" sort of thing)
i tried that; running the same dag many times in a short period (backfilling) doesn't help, the runs are all ordered properly. i guess because they are all created under the timetable model uniformly
it seems the issue only occurs on dags that are older than the introduction of timetables, ~~which suggests that during the conversion, older dag runs lack attribute(s) for proper sorting~~ as i mentioned before, for older dag runs whose `data_interval_end` is NULL, the `order_by(*ordering, DagRun.id.desc())` doesn't work
it's basically

```python
ordering = (DagRun.__table__.columns[name].desc() for name in dag.timetable.run_ordering)
dag_runs = query.order_by(*ordering, DagRun.id.desc()).limit(num_runs).all()
```

which emits

```sql
SELECT dag_id,
       run_id,
       data_interval_end
FROM dag_run
WHERE dag_id = '<dag_name>'
ORDER BY data_interval_end DESC,
         execution_date DESC
LIMIT 25;
```

vs

```python
dag_runs = query.order_by(DagRun.execution_date.desc()).limit(num_runs).all()
```

which emits

```sql
SELECT dag_id,
       run_id,
       data_interval_end
FROM dag_run
WHERE dag_id = '<dag_name>'
ORDER BY execution_date DESC
LIMIT 25;
```
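to illustrate why the first query misbehaves: on PostgreSQL, `ORDER BY col DESC` places NULLs first by default (NULL sorts as if larger than any value), so pre-timetable runs with NULL `data_interval_end` fill up the LIMIT window ahead of recent runs. a rough sketch in plain python (made-up data, just mimicking that NULL behavior — actual results can differ per database):

```python
# Illustrative sketch (not Airflow code): mimic PostgreSQL's default
# NULLS FIRST behavior under ORDER BY ... DESC. Pre-timetable DAG runs
# have data_interval_end = NULL, so they crowd out recent runs inside
# the LIMIT window. All rows below are made up.
runs = [
    ("old_2016", "2016-01-01", None),        # NULL data_interval_end
    ("old_2017", "2017-01-01", None),        # NULL data_interval_end
    ("new_a", "2022-09-01", "2022-09-02"),
    ("new_b", "2022-09-02", "2022-09-03"),
]

def pg_desc_key(run):
    # In PostgreSQL, NULL sorts as if larger than any non-null value,
    # so under DESC it comes first; model that as (True, ...) > (False, ...).
    _, execution_date, data_interval_end = run
    return (data_interval_end is None, data_interval_end or "", execution_date)

ordered = sorted(runs, key=pg_desc_key, reverse=True)
top_two = [run_id for run_id, _, _ in ordered[:2]]  # the "LIMIT 2" window
print(top_two)  # ['old_2017', 'old_2016'] -- old runs win, new ones drop out
```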
> Creating a minimal stand-alone reproduction case if you can ("here's a dag, run it/backfill it for X" sort of thing)
Obviously, but it would be quite hard to provide. @zachliu seems to have walked that path with no luck. But it looks like the problem could be solved by a simple new migration script. What cases should it handle, though, and how should it fix them?
can the migration script simply fill all the NULL `data_interval_end` values with `execution_date`?
But why doesn't it already? It doesn't feel like a problem with the current logic - it works fine - but with something in the changes between the two versions. Actually, running new dagruns on 2.4 without the fix doesn't change anything.
Filling all data intervals with execution dates would be excruciatingly slow, which is why it's not done. Doing some fancy coalesce for ordering is probably possible.
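The coalesce idea can be sketched in plain SQL (in-memory SQLite, made-up rows — in SQLAlchemy terms it would be something like `order_by(func.coalesce(DagRun.data_interval_end, DagRun.execution_date).desc())`, though the actual fix may differ):

```python
import sqlite3

# Sketch of the coalesce-based ordering suggested above, using an
# in-memory SQLite table with made-up rows. Runs predating timetables
# have NULL data_interval_end; COALESCE falls back to execution_date,
# so they no longer dominate the sort over recent runs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dag_run (run_id TEXT, execution_date TEXT, data_interval_end TEXT)"
)
conn.executemany(
    "INSERT INTO dag_run VALUES (?, ?, ?)",
    [
        ("old_2016", "2016-01-01", None),
        ("old_2017", "2017-01-01", None),
        ("new_a", "2022-09-01", "2022-09-02"),
        ("new_b", "2022-09-02", "2022-09-03"),
    ],
)
rows = conn.execute(
    "SELECT run_id FROM dag_run "
    "ORDER BY COALESCE(data_interval_end, execution_date) DESC, execution_date DESC "
    "LIMIT 2"
).fetchall()
print([r[0] for r in rows])  # ['new_b', 'new_a'] -- the latest runs come first again
```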
I created a PR for this. Would be awesome if someone could test it!
Fixed by #26626