More natural sorting of DAG runs in the grid view

Open hterik opened this issue 2 years ago • 8 comments

Apache Airflow version

2.3.2

What happened

A DAG is scheduled to run once every hour. It was started manually at 12:44; let's call this run 1. At 13:00 the scheduled run started; let's call this run 2. Run 2 appears before run 1 in the grid view.

See attached screenshot image

What you think should happen instead

DAG runs in the grid view should appear in the order they were started.

How to reproduce

No response

Operating System

Debian GNU/Linux 11 (bullseye)

Versions of Apache Airflow Providers

apache-airflow==2.3.2
apache-airflow-client==2.1.0
apache-airflow-providers-celery==3.0.0
apache-airflow-providers-cncf-kubernetes==4.0.2
apache-airflow-providers-docker==3.0.0
apache-airflow-providers-ftp==2.1.2
apache-airflow-providers-http==2.1.2
apache-airflow-providers-imap==2.2.3
apache-airflow-providers-postgres==5.0.0
apache-airflow-providers-sqlite==2.1.3

Deployment

Other Docker-based deployment

Deployment details

No response

Anything else

No response

Are you willing to submit PR?

  • [ ] Yes I am willing to submit a PR!

Code of Conduct

hterik avatar Jul 15 '22 11:07 hterik

We currently order the grid by execution_date, but that doesn't always line up with start_date or data_interval_start.
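To make the mismatch concrete, here is a minimal sketch of the two runs from the report, sorted by each candidate field. `Run` is a simplified stand-in, not Airflow's `DagRun` model, and the calendar date is arbitrary. The manual run's data interval (11:00 to 12:00) is an assumption about how a manual trigger is aligned to the last complete hourly interval; the other values follow from the report.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Run:
    """Simplified stand-in for a DAG run (hypothetical, not Airflow's DagRun)."""
    name: str
    execution_date: datetime
    data_interval_start: datetime
    start_date: datetime


day = (2022, 7, 15)  # arbitrary date, for illustration only

# Run 1: triggered manually at 12:44, so execution_date is the trigger time.
# Assumption: its data interval is aligned to the last complete hour (11:00-12:00).
manual = Run(
    "run 1 (manual)",
    execution_date=datetime(*day, 12, 44),
    data_interval_start=datetime(*day, 11, 0),
    start_date=datetime(*day, 12, 44),
)

# Run 2: scheduled run that starts at 13:00 and covers 12:00-13:00,
# so execution_date is the start of that interval.
scheduled = Run(
    "run 2 (scheduled)",
    execution_date=datetime(*day, 12, 0),
    data_interval_start=datetime(*day, 12, 0),
    start_date=datetime(*day, 13, 0),
)

runs = [manual, scheduled]


def order_by(field: str) -> list:
    """Return run names sorted ascending by the given attribute."""
    return [r.name for r in sorted(runs, key=lambda r: getattr(r, field))]


by_execution_date = order_by("execution_date")
by_data_interval_start = order_by("data_interval_start")
by_start_date = order_by("start_date")
```

Under these assumptions, only the `execution_date` ordering puts run 2 ahead of run 1, which matches the behaviour described in the report.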

@uranusjr Should we still use execution_date or should we change that to data_interval_start?

bbovenzi avatar Jul 15 '22 15:07 bbovenzi

data_interval_start is probably better than execution_date, but from how I read the issue, that would still be wrong and we should order by start_date instead (!)

This really depends on how we think the runs should be ordered, and I can easily see different answers from different people (or even from the same person under different circumstances), so I guess data_interval_start is as good as any.

uranusjr avatar Jul 17 '22 20:07 uranusjr

The "Run" in the details section of the grid view defaults to data_interval_start and then uses execution_date as a backup. So I guess we should use that for the ordering here and even the tooltip. But, happy to hear a competing suggestion.
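A minimal sketch of that fallback as a sort key, written in Python for illustration only (the grid itself is rendered by the web UI). `run_sort_key` is a hypothetical helper, not an existing Airflow function.

```python
from datetime import datetime
from typing import Optional


def run_sort_key(
    data_interval_start: Optional[datetime], execution_date: datetime
) -> datetime:
    """Hypothetical helper: prefer data_interval_start, fall back to execution_date."""
    if data_interval_start is not None:
        return data_interval_start
    return execution_date


# A run with a data interval sorts by the interval start ...
with_interval = run_sort_key(datetime(2022, 7, 15, 11, 0), datetime(2022, 7, 15, 12, 44))
# ... and a run without one sorts by execution_date.
without_interval = run_sort_key(None, datetime(2022, 7, 15, 12, 0))
```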

bbovenzi avatar Jul 18 '22 10:07 bbovenzi

Can we solve both by allowing users to choose the order layout? However, I think start_date makes the most sense in this view, as it's simple to understand. If we use execution_date/data_interval and mix scheduled with manual runs, it may be more complicated to understand.

eladkal avatar Jul 29 '22 05:07 eladkal

I think the data interval would be OK for mixed scheduled and manual runs since AIP-39 took extra care to align them across run types. execution_date would definitely be confusing (because it has different semantics between those types) but is probably OK as a backup. I’m cautious about using start_date since when the run starts doesn’t necessarily contain any logic, but perhaps data_interval_end could be better than _start? Not sure.

uranusjr avatar Jul 29 '22 06:07 uranusjr

I think either data_interval_start or data_interval_end is good.

Traditionally end would have been better, but with the new "PlainCronTimetable" we are finally getting into the realm of "normal cron behaviour" (which opens up Airflow to a number of different "logical" cases), and with it start seems much more natural.

Somehow I have a feeling that we should be able to choose (per DAG) whether start or end is used. Maybe that is the right approach? We could even attempt to detect which one is "default" for each dag based on timetable.

(Though as usual more options means more complexity).

potiuk avatar Jul 29 '22 10:07 potiuk

> choose (per DAG) whether start or end is used

This is likely too much burden on the user; I imagine most users don’t actually have a good grasp on how the runs should be ordered and would be hard-pressed to choose a “correct” value.

But I wonder if making the logic per-timetable makes sense, i.e. for some timetables the start makes more sense, and for some the end, so a hook on the timetable class to return a “logical” date (pardon the name) to use in sorting algorithms seems like a reasonable approach.
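A rough sketch of what such a hook could look like, assuming toy classes rather than Airflow's real timetable API. `BaseTimetable`, `EndOrientedTimetable`, `Interval`, and the `sort_date` method are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Interval:
    """Toy data interval (hypothetical, not Airflow's DataInterval)."""
    start: datetime
    end: datetime


class BaseTimetable:
    """Toy base class; NOT airflow.timetables.base.Timetable."""

    def sort_date(self, interval: Interval) -> datetime:
        # Hypothetical hook: by default, order runs by interval start.
        return interval.start


class EndOrientedTimetable(BaseTimetable):
    """Toy timetable for which the interval end is the more natural key."""

    def sort_date(self, interval: Interval) -> datetime:
        return interval.end


def sort_runs(timetable: BaseTimetable, intervals: List[Interval]) -> List[Interval]:
    """Sort intervals using whichever date the timetable considers logical."""
    return sorted(intervals, key=timetable.sort_date)


# Two intervals whose order differs depending on the key used.
long_run = Interval(datetime(2022, 8, 1, 10, 0), datetime(2022, 8, 1, 14, 0))
short_run = Interval(datetime(2022, 8, 1, 11, 0), datetime(2022, 8, 1, 12, 0))

by_start = sort_runs(BaseTimetable(), [short_run, long_run])
by_end = sort_runs(EndOrientedTimetable(), [long_run, short_run])
```

With a hook like this, the UI would only need to ask the timetable for a sort date instead of hard-coding one field for every DAG.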

uranusjr avatar Aug 01 '22 06:08 uranusjr

> But I wonder if making the logic per-timetable makes sense, i.e. for some timetables the start makes more sense, and for some the end, so a hook on the timetable class to return a “logical” date (pardon the name) to use in sorting algorithms seems like a reasonable approach.

Yep. That would make perfect sense.

potiuk avatar Aug 01 '22 18:08 potiuk

Still need the UI implementation

uranusjr avatar Aug 18 '22 06:08 uranusjr