airflow icon indicating copy to clipboard operation
airflow copied to clipboard

Deferrable ExternalTaskSensor cannot wait for an entire DAG

Open ecodina opened this issue 2 years ago • 8 comments

Apache Airflow version

2.7.1

What happened

When running an ExternalTaskSensor with external_task_id=None and in deferrable mode, the trigger doesn't wait for the entire DAG since it needs a task_id. However, the typing suggests this should be possible: https://github.com/apache/airflow/blob/main/airflow/triggers/external_task.py#L63

I believe this is the query that doesn't behave as expected from the typing: https://github.com/apache/airflow/blob/main/airflow/triggers/external_task.py#L135

Moreover, after examining the code, it seems the trigger would also fail if the ExternalTaskSensor was provided with an external_task_ids (multiple tasks).

What you think should happen instead

No response

How to reproduce

  1. Create dag1 which contains a simple task (e.g. sleep for 60 seconds)
  2. Create dag2 which contains an ExternalTaskSensor checking for dag1 (external_task_id=None) and runs in deferrable mode
  3. Run them and see how dag2 times out because it can't find any task.

Operating System

Ubuntu 22.04.3 LTS

Versions of Apache Airflow Providers

No response

Deployment

Virtualenv installation

Deployment details

No response

Anything else

Similarly to #34204 I'd love to submit a PR, but have no available time to properly do so or the knowledge to make sure any of my solutions actually work in all cases.

Are you willing to submit PR?

  • [ ] Yes I am willing to submit a PR!

Code of Conduct

ecodina avatar Sep 08 '23 10:09 ecodina

I'm interested in taking this issue

samc1213 avatar Sep 13 '23 23:09 samc1213

I think this file causes the issue? I see dag2 in status failed - is this a valid way to reproduce the issue @ecodina ? I'm new to airflow so thanks for your help. How do you know it's timing out instead of just failing?

samc1213 avatar Sep 14 '23 15:09 samc1213

Yes, this file reproduces the issue. You can try lowering the sleep time (e.g. 10 seconds). You'll then see that the ExternalTask keeps logging Task is still running, sleeping for 2.0 seconds... after the sleeping task has long finished. It fails after the trigger's 60 seconds hardcoded timeout.

You can also print the result of TaskStateTrigger.count_tasks and check it's always 0.

I opened 3 issues regarding the deferrable ExternalTaskSensor and as @yermalov-here commented in #34204 the problem is that the logic of this deferrable trigger is different from the "normal" one. Looking in retrospective, maybe it'd have been better to open 1 issue with the 3 "bugs" since a total refactor may be needed!

ecodina avatar Sep 14 '23 17:09 ecodina

Should we consider consolidating the issues? The conversation in #34204 is much more thorough and the fix should address this problem too

samc1213 avatar Sep 14 '23 17:09 samc1213

Sorry, OOO until October and I just saw your message. It's probably a good idea to consolidate all the issues! I think they all share the same root-cause.

ecodina avatar Sep 16 '23 14:09 ecodina

Hi there! I just hit into this too - would it be possible to add a note that this isn't yet supported in the meantime? It seems like the Triggerer code for this operator was never set up for a full DAG, for multiple task instance IDs, nor for a full task group, which the non-deferrable sensor is able to support. In my case, we switched a bunch of our sensors over to deferrable=True and several ExternalTaskSensors that expected to sense a whole DAG ended up failing for (initially) unclear reasons, first saying they were waiting for a task to finish, then failing saying that the DAG doesn't exist, both of which seem incorrect in this case.

I don't think I'm familiar enough with the Airflow code to fix this all myself, but I could submit an MR for the meantime that this isn't yet functional, so other people are aware of it!

Throne3d avatar Nov 24 '23 04:11 Throne3d

Hi there! I just hit into this too - would it be possible to add a note that this isn't yet supported in the meantime?

Sure - it's marked as "good-first-issue" so anyone (including yourself) can add a note in our docs. It's super-easy. Find the doc you want to add it it in, click "Suggest a change on this page" and PR will be opene where you (or whoever does it) can add appropriate comment - and maintainers will comment on it and help you (or anyone else) to merge it. No big familiarity is needed and you are likely one of the best people in the world to know where to add it and how to formulate it so that people like you would make a good use of it.

Can we count on your help there @Throne3d ?

potiuk avatar Nov 25 '23 22:11 potiuk

I've opened up https://github.com/apache/airflow/pull/35854 to add a note here on the sensor page

Throne3d avatar Nov 25 '23 23:11 Throne3d