Audit task failed-deps for later investigation
Body
Currently, Airflow offers failed-deps to investigate why a task isn't being scheduled. This is a very helpful tool; however, it works only in real time, against the current entries in the metadb. Investigating past anomalies isn't supported.
Sometimes scheduling problems are "solved" on their own. A pool may be overcrowded or a concurrency limit may have been reached, but eventually the load drops and the tasks are scheduled. By the time you notice the delay and want to investigate why it happened, your options are limited, because there could be many reasons.
The needed solution:
We should investigate the option to audit the failed-deps information or alternatively offer an easy way to export this information in real time to an external audit storage for later investigation.
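To make the export idea concrete, here is a minimal sketch that snapshots the output of the existing `airflow tasks failed-deps` CLI command into a JSON Lines file. The function names, the record fields, and the choice of a local file as the "audit storage" are assumptions made for illustration, not an existing Airflow feature.

```python
import json
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def run_failed_deps(dag_id, task_id, run_id):
    """Return whatever `airflow tasks failed-deps` prints for one task instance."""
    result = subprocess.run(
        ["airflow", "tasks", "failed-deps", dag_id, task_id, run_id],
        capture_output=True,
        text=True,
        check=False,  # keep the output even if the command exits non-zero
    )
    return result.stdout


def audit_failed_deps(dag_id, task_id, run_id, audit_file, runner=run_failed_deps):
    """Append one timestamped failed-deps snapshot to a JSON Lines file.

    `audit_file` and the record layout are hypothetical; a real setup might
    write to object storage or a log pipeline instead.
    """
    record = {
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "dag_id": dag_id,
        "task_id": task_id,
        "run_id": run_id,
        "failed_deps_output": runner(dag_id, task_id, run_id),
    }
    with Path(audit_file).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record
```

The raw CLI text is stored unparsed on purpose, so the record does not depend on any particular output format.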
Committer
- [X] I acknowledge that I am a maintainer/committer of the Apache Airflow project.
I recently worked on this, and the information is now available in the UI and the API for tasks in the scheduled or None state. Perhaps the API could be used for export and also enriched with additional checks that provide useful information.
Ref : https://github.com/apache/airflow/pull/38449
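As a rough sketch of what an API-based export could look like, the snippet below fetches the dependency information for one task instance and flattens it into rows that are easy to store. The endpoint path and the response shape (`{"dependencies": [{"name": ..., "reason": ...}]}`) are assumptions based on my reading of the referenced PR; please check them against the API reference of your Airflow version. The helper names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote


def dependencies_url(base_url, dag_id, run_id, task_id):
    """Build the URL for one task instance's dependency report.

    ASSUMPTION: the path below is not confirmed by this thread; verify it
    against the REST API reference before relying on it.
    """
    return (
        f"{base_url.rstrip('/')}/api/v1"
        f"/dags/{quote(dag_id, safe='')}"
        f"/dagRuns/{quote(run_id, safe='')}"
        f"/taskInstances/{quote(task_id, safe='')}/dependencies"
    )


def fetch_dependencies(url, headers=None):
    """GET the URL and decode the JSON body (auth headers are up to the caller)."""
    request = urllib.request.Request(url, headers=headers or {})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def to_rows(payload, dag_id, run_id, task_id):
    """Flatten the payload into one row per unmet dependency.

    ASSUMPTION: the payload has a "dependencies" list whose items carry
    "name" and "reason" keys.
    """
    return [
        {
            "dag_id": dag_id,
            "run_id": run_id,
            "task_id": task_id,
            "dep_name": dep.get("name"),
            "reason": dep.get("reason"),
        }
        for dep in payload.get("dependencies", [])
    ]
```

Keeping `to_rows` separate from the HTTP call means the flattening can be adjusted if the real response turns out to differ.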
The UI part exposes failed-deps as is. It doesn't have a mechanism to export or store the information.
There is also the question of the export interval.
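One way to think about the interval question: poll at a fixed interval but store a snapshot only when it differs from the previous one, so a short interval does not flood the audit storage. The sketch below shows that idea in isolation. `fetch` and `store` are hypothetical callables (for example, an API call and a file writer), and the bounded `max_polls` loop is a simplification.

```python
import time


def poll_failed_deps(fetch, store, interval_seconds, max_polls, sleep=time.sleep):
    """Call fetch() every interval_seconds; store a snapshot only when it changes.

    Returns the number of snapshots that were stored.
    """
    previous = None
    stored = 0
    for poll in range(max_polls):
        snapshot = fetch()
        if snapshot != previous:
            # Only changes are written, so repeated identical states cost nothing.
            store(snapshot)
            stored += 1
            previous = snapshot
        if poll < max_polls - 1:
            sleep(interval_seconds)
    return stored
```

The trade-off remains that any state that appears and disappears between two polls is never seen, so the interval still bounds what can be investigated later.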