airflow
airflow copied to clipboard
Deferrable ExternalTaskSensor cannot wait for an entire DAG
Apache Airflow version
2.7.1
What happened
When running an ExternalTaskSensor with external_task_id=None and in deferrable mode, the trigger doesn't wait for the entire DAG since it needs a task_id. However, the typing suggests this should be possible: https://github.com/apache/airflow/blob/main/airflow/triggers/external_task.py#L63
I believe this is the query that doesn't behave as expected from the typing: https://github.com/apache/airflow/blob/main/airflow/triggers/external_task.py#L135
Moreover, after examining the code, it seems the trigger would also fail if the ExternalTaskSensor was provided with an external_task_ids (multiple tasks).
What you think should happen instead
No response
How to reproduce
- Create
dag1which contains a simple task (e.g. sleep for 60 seconds) - Create
dag2which contains an ExternalTaskSensor checking fordag1(external_task_id=None) and runs in deferrable mode - Run them and see how
dag2times out because it can't find any task.
Operating System
Ubuntu 22.04.3 LTS
Versions of Apache Airflow Providers
No response
Deployment
Virtualenv installation
Deployment details
No response
Anything else
Similarly to #34204 I'd love to submit a PR, but have no available time to properly do so or the knowledge to make sure any of my solutions actually work in all cases.
Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
I'm interested in taking this issue
I think this file causes the issue? I see dag2 in status failed - is this a valid way to reproduce the issue @ecodina ? I'm new to airflow so thanks for your help. How do you know it's timing out instead of just failing?
Yes, this file reproduces the issue. You can try lowering the sleep time (e.g. 10 seconds). You'll then see that the ExternalTask keeps logging Task is still running, sleeping for 2.0 seconds... after the sleeping task has long finished. It fails after the trigger's 60 seconds hardcoded timeout.
You can also print the result of TaskStateTrigger.count_tasks and check it's always 0.
I opened 3 issues regarding the deferrable ExternalTaskSensor and as @yermalov-here commented in #34204 the problem is that the logic of this deferrable trigger is different from the "normal" one. Looking in retrospective, maybe it'd have been better to open 1 issue with the 3 "bugs" since a total refactor may be needed!
Should we consider consolidating the issues? The conversation in #34204 is much more thorough and the fix should address this problem too
Sorry, OOO until October and I just saw your message. It's probably a good idea to consolidate all the issues! I think they all share the same root-cause.
Hi there! I just hit into this too - would it be possible to add a note that this isn't yet supported in the meantime? It seems like the Triggerer code for this operator was never set up for a full DAG, for multiple task instance IDs, nor for a full task group, which the non-deferrable sensor is able to support. In my case, we switched a bunch of our sensors over to deferrable=True and several ExternalTaskSensors that expected to sense a whole DAG ended up failing for (initially) unclear reasons, first saying they were waiting for a task to finish, then failing saying that the DAG doesn't exist, both of which seem incorrect in this case.
I don't think I'm familiar enough with the Airflow code to fix this all myself, but I could submit an MR for the meantime that this isn't yet functional, so other people are aware of it!
Hi there! I just hit into this too - would it be possible to add a note that this isn't yet supported in the meantime?
Sure - it's marked as "good-first-issue" so anyone (including yourself) can add a note in our docs. It's super-easy. Find the doc you want to add it it in, click "Suggest a change on this page" and PR will be opene where you (or whoever does it) can add appropriate comment - and maintainers will comment on it and help you (or anyone else) to merge it. No big familiarity is needed and you are likely one of the best people in the world to know where to add it and how to formulate it so that people like you would make a good use of it.
Can we count on your help there @Throne3d ?
I've opened up https://github.com/apache/airflow/pull/35854 to add a note here on the sensor page