airflow
Duplicate entries in API response when TaskInstanceHistory and TaskInstance have same maximum try number
Apache Airflow version
main (development)
If "Other Airflow 2 version" selected, which one?
No response
What happened?
While trying out a task with a high number of retries, I noticed that the API sometimes returns duplicate entries for task tries, although the duplication eventually resolves by itself. I traced it to the following query, where the TaskInstanceHistory and TaskInstance entries are combined. There can be a case where the maximum try_number among the TaskInstanceHistory entries equals the TaskInstance's try_number, which leads to a duplicate entry for the latest try.
https://github.com/apache/airflow/blob/79db243d03cc4406290597ad400ab0f514975c79/airflow/api_connexion/endpoints/task_instance_endpoint.py#L863-L872
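To illustrate the overlap, here is a minimal sketch in plain Python. It does not use Airflow's actual models or query; the dict rows, the `source` field, and the `combine_tries` helper are hypothetical stand-ins for a query that simply combines history rows with the current task instance row.

```python
def combine_tries(history_rows, current_ti):
    """Naively concatenate history rows with the current task instance row.

    Hypothetical stand-in for a query that combines both tables without
    checking whether the latest try already exists in the history.
    """
    return sorted(history_rows + [current_ti], key=lambda row: row["try_number"])


# Assumed situation: the history already holds try 2, and the
# TaskInstance row is still at try_number 2 as well.
history = [
    {"try_number": 1, "state": "failed", "source": "TaskInstanceHistory"},
    {"try_number": 2, "state": "failed", "source": "TaskInstanceHistory"},
]
current = {"try_number": 2, "state": "failed", "source": "TaskInstance"}

combined = combine_tries(history, current)
try_numbers = [row["try_number"] for row in combined]
print(try_numbers)  # try_number 2 appears twice
```

Once the TaskInstance moves on to the next try_number, the overlap disappears, which would match the duplication resolving by itself.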
What you think should happen instead?
No response
How to reproduce
- Set up a DAG with a high number of retries.
- Notice that API calls occasionally return duplicate entries for the last try number.
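A small helper along these lines could make the duplication easier to spot while polling the API. This is only a sketch: it assumes the tries have already been fetched and parsed into a list of dicts that each carry a `try_number` key, and the `duplicated_try_numbers` name is made up for illustration.

```python
from collections import Counter


def duplicated_try_numbers(tries):
    """Return the try numbers that occur more than once, in ascending order."""
    counts = Counter(entry["try_number"] for entry in tries)
    return sorted(number for number, count in counts.items() if count > 1)


# Assumed response payload, reduced to the one field that matters here.
sample_tries = [
    {"try_number": 1},
    {"try_number": 2},
    {"try_number": 2},
]
print(duplicated_try_numbers(sample_tries))
```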
Operating System
Ubuntu
Versions of Apache Airflow Providers
No response
Deployment
Virtualenv installation
Deployment details
No response
Anything else?
No response
Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
cc: @ephraimbuddy @bbovenzi
Ahh, this makes sense. I think when try_number is the same, we should only send the TI entry and ignore the TIH entry.
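A rough sketch of that idea in plain Python, not the actual endpoint code: the rows are hypothetical dicts, and `merge_tries` is an illustrative name. When both sources report the same try_number, the TI row replaces the TIH row.

```python
def merge_tries(history_rows, current_ti):
    """Merge history rows with the current TI row, preferring the TI row
    whenever both share the same try_number."""
    by_try = {row["try_number"]: row for row in history_rows}
    # The TI entry overwrites any TIH entry with the same try_number.
    by_try[current_ti["try_number"]] = current_ti
    return [by_try[number] for number in sorted(by_try)]


tih_rows = [
    {"try_number": 1, "source": "TIH"},
    {"try_number": 2, "source": "TIH"},
]
ti_row = {"try_number": 2, "source": "TI"}

merged = merge_tries(tih_rows, ti_row)
print([(row["try_number"], row["source"]) for row in merged])
```

In the real endpoint the same effect could presumably be achieved in the query itself, by excluding TIH rows whose try_number matches the TI; that part is an assumption and not verified against the codebase.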
Both 2.10.2 and 2.10.3 have this issue, and you don't need a high number of retries. As long as you have retries != 0, you'll see duplicated entries.
2.10.2: https://github.com/apache/airflow/blob/35087d7d10714130cc3e9e9730e34b07fc56938d/airflow/api_connexion/endpoints/task_instance_endpoint.py#L833-L842
2.10.3: https://github.com/apache/airflow/blob/c99887ec11ce3e1a43f2794fcf36d27555140f00/airflow/api_connexion/endpoints/task_instance_endpoint.py#L834-L843
Yeah I saw that one too during the Man's Hackathon.