Add robust dry run capability for backfill
Body
Child of parent issue https://github.com/apache/airflow/issues/43970
As a user, I want to be able to dry run the backfill creation process from the UI. E.g. I click "create backfill" and give it a range; then, in the UI, I want to be able to see the runs that will be created if I click "submit".
In order to do this, we'll have to refactor the backfill creation process a bit. Right now, we just submit a range, and the backfill endpoint will just create the backfill object and all of the runs.
One problem with implementing dry run: suppose we return "these runs will be created; proceed?". What if the scheduler schedules, or a user clears or deletes, a run in the range before the user confirms? Then we would not end up doing exactly what we said we were going to do.
So what we need to do is implement in the API the ability to get some representation of the entirety of the backfill -- the object and its runs -- which the user could then submit back to another endpoint that would just receive this payload and attempt to create it. In this second endpoint, which is essentially "take the payload and create", we would first lock the dag and then attempt to insert all the rows. If we find a conflict, we should abandon the whole attempt and tell the user: sorry, something changed, we got a conflict, please try again. A 409 Conflict API response seems appropriate here.
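To make the proposed flow concrete, here is a minimal in-memory sketch, not Airflow code. Everything in it is hypothetical and for illustration only: `DagState` stands in for the dag's rows in the database, a `threading.Lock` stands in for locking the dag, and `BackfillConflict` is an error that an API layer could translate into a 409 response.

```python
import threading
from dataclasses import dataclass, field


class BackfillConflict(Exception):
    """Hypothetical error; an API layer could map it to HTTP 409 Conflict."""

    status_code = 409


@dataclass
class DagState:
    # Stand-in for the dag's run rows: logical date -> run state.
    runs: dict = field(default_factory=dict)
    # Stand-in for "lock the dag" (a real implementation would lock in the DB).
    lock: threading.Lock = field(default_factory=threading.Lock)


def preview_backfill(dag, logical_dates):
    """Stage 1 (dry run): describe the runs that would be created."""
    return {"runs_to_create": [d for d in logical_dates if d not in dag.runs]}


def create_from_payload(dag, payload):
    """Stage 2: take the payload and create it, all or nothing."""
    with dag.lock:
        # Check every row before inserting anything, so a conflict
        # abandons the whole attempt without partial writes.
        conflicts = [d for d in payload["runs_to_create"] if d in dag.runs]
        if conflicts:
            raise BackfillConflict(f"runs changed since preview: {conflicts}")
        for d in payload["runs_to_create"]:
            dag.runs[d] = "queued"
    return len(payload["runs_to_create"])
```

If a run appears in the range between the preview and the submit, `create_from_payload` raises instead of creating a different set of runs than the one shown to the user; the caller would then request a fresh preview and try again.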
cc @phanikumv @jedcunningham @bbovenzi @pierrejeambrun
Committer
- [X] I acknowledge that I am a maintainer/committer of the Apache Airflow project.
This makes sense to me. I think it's very important for users to know exactly what they're about to change.
We can make sure the UI specifically handles the 409 response in the create backfill flow.
Assigning to @prabhusneha
After further discussions with @dstandish, we decided to adopt an approach aligned with the current CLI dry run functionality, rather than implementing a two-stage process. Specifically, when a user requests a dry run of the backfill, the response will include only the DAG runs that will actually be created. Any DAG runs that would not be created based on the specified `reprocess_behavior` will be excluded from the dry run response.
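As a rough illustration of that filtering (a sketch, not the actual implementation), the function below returns only the logical dates for which a run would be created. The function name, the plain-string states, and the exact meaning of each `reprocess_behavior` value are assumptions here: `"none"` is taken to skip any date that already has a run, `"failed"` to also reprocess failed runs, and `"completed"` to also reprocess failed and successful runs.

```python
def dry_run_backfill(logical_dates, existing_states, reprocess_behavior="none"):
    """Return the logical dates a backfill would create runs for.

    existing_states maps logical date -> state of the run that already
    exists for that date; dates with no existing run are absent.
    """
    # Assumed semantics: which existing-run states are eligible for reprocessing.
    reprocessable = {
        "none": set(),
        "failed": {"failed"},
        "completed": {"failed", "success"},
    }[reprocess_behavior]

    to_create = []
    for date in logical_dates:
        state = existing_states.get(date)
        # Create a run if none exists yet, or if the existing run's state
        # is reprocessable under the chosen behavior.
        if state is None or state in reprocessable:
            to_create.append(date)
    return to_create
```

For example, with existing runs in states failed, success, and running, a dry run with `"failed"` would list the dates with no run plus the failed one, and leave the other two out of the response.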