Queries stuck in FINISHING state
Expected behavior
Query results should be successfully received by the client.
Actual behavior
In our Trino cluster, we are facing an issue where some queries remain stuck in the FINISHING state for an extended period before eventually failing with the error message: "Query was abandoned by the client, as it may have exited or stopped checking for query results."
After conducting some investigation, it appears that this issue predominantly occurs when querying Trino using the Python client. Here's a breakdown of the observed flow:
- In the main module, we execute the `TrinoQuery.execute` function with our query.
- This function initiates a POST request to the Trino coordinator.
- Subsequently, it sends a GET request to the `nextUri` to retrieve the initial batch of query results.
- As the results start arriving, the query state transitions to FINISHING.
- The execution of the `execute` function ends.
- Following this, the `cursor.fetchall()` function in the main module iterates over the `nextUri`s, yielding each received row to the client. However, after a certain duration of fetching query results, the query fails with the "query abandoned" error (as mentioned above).
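The flow above can be sketched as a small loop. This is a simplified illustration, not the client's actual code: `iter_rows`, `fake_post`, `fake_get`, and the `uri/N` values are hypothetical names, and the HTTP transport is replaced by in-memory callables so the example runs without a cluster. It assumes each response is a JSON object that may carry `data`, `nextUri`, and `error` keys.

```python
from typing import Callable, Iterator


def iter_rows(sql: str,
              post: Callable[[str], dict],
              get: Callable[[str], dict]) -> Iterator[list]:
    """Yield rows by following nextUri links one page at a time."""
    page = post(sql)  # stands in for the initial POST to the coordinator
    while True:
        if "error" in page:
            raise RuntimeError(page["error"].get("message", "query failed"))
        for row in page.get("data", []):
            # Nothing polls the coordinator while the caller handles this row.
            yield row
        next_uri = page.get("nextUri")
        if next_uri is None:
            return  # no further link: the result set is complete
        page = get(next_uri)  # stands in for GET <nextUri>


# Hypothetical canned responses in place of a real coordinator.
pages = {
    "uri/1": {"stats": {"state": "RUNNING"}, "data": [[1], [2]], "nextUri": "uri/2"},
    "uri/2": {"stats": {"state": "FINISHING"}, "data": [[3]], "nextUri": "uri/3"},
    "uri/3": {"stats": {"state": "FINISHED"}},
}


def fake_post(sql: str) -> dict:
    return {"stats": {"state": "QUEUED"}, "nextUri": "uri/1"}


def fake_get(uri: str) -> dict:
    return pages[uri]


rows = list(iter_rows("SELECT 1", fake_post, fake_get))
```

The point of the sketch is that the next GET is only issued after every row of the current page has been consumed, so slow downstream processing widens the gap between requests to the coordinator.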
Any assistance on resolving this significant issue would be greatly appreciated.
Thank you!!
Steps To Reproduce
- Is it advisable to incorporate heartbeats to the coordinator while fetching results?
- Would it be feasible to fetch multiple `nextUri`s in parallel? I'm uncertain about this possibility due to the need to access `nextUri`s as a linked list.
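On the second question: because each `nextUri` is only known from the previous response, the pages cannot be requested in parallel. One thing that might be tried instead is decoupling fetching from row processing, so the chain keeps being polled while the application works on earlier rows. The following is only a sketch under that assumption: `prefetch_rows` is a hypothetical helper, not part of the Trino Python client, the transport is an injected callable, and it is unverified whether this avoids the "abandoned" error. Note that an unbounded buffer trades memory for polling regularity.

```python
import queue
import threading
from typing import Callable, Iterator

_DONE = object()  # sentinel marking the end of the result set


def prefetch_rows(first_page: dict,
                  get: Callable[[str], dict],
                  max_pages: int = 0) -> Iterator[list]:
    """Follow nextUri links in a background thread and yield buffered rows."""
    buf: queue.Queue = queue.Queue(maxsize=max_pages)  # 0 means unbounded

    def worker() -> None:
        page = first_page
        try:
            while True:
                buf.put(page.get("data", []))
                next_uri = page.get("nextUri")
                if next_uri is None:
                    break
                page = get(next_uri)  # still strictly sequential
            buf.put(_DONE)
        except Exception as exc:  # hand transport errors to the consumer
            buf.put(exc)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = buf.get()
        if item is _DONE:
            return
        if isinstance(item, Exception):
            raise item
        yield from item


# Hypothetical canned responses in place of a real coordinator.
demo_pages = {
    "uri/2": {"data": [[2], [3]], "nextUri": "uri/3"},
    "uri/3": {"data": [[4]]},
}
first = {"data": [[1]], "nextUri": "uri/2"}
prefetched = list(prefetch_rows(first, demo_pages.__getitem__))
```

With a bounded `max_pages`, the worker blocks once the buffer is full, so a slow consumer would again delay requests to the coordinator.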
Log output
No response
Operating System
Windows
Trino Python client version
0.326.0
Trino Server version
439
Python version
3.9.3
Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
@hashhar Is there any progress on it? We also face the same issue
This is hard to reproduce, and it is unclear whether the cause in your case is the same as in our reproduction.
So we plan to add additional debug logging; then, when someone is able to reproduce this issue, we can look at the logs to figure out what is going wrong.
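Until that lands, anyone who can reproduce the problem may be able to capture more detail with Python's standard `logging` module. This is a minimal sketch: the logger names `trino` and `trino.client` are assumptions about how the client names its loggers, and `urllib3` is included on the assumption that the HTTP layer logs connection activity under that name. It is probably safest to run this after importing the client, in case the library sets its own log level at import time.

```python
import logging

# Timestamps make it possible to line up client activity with server logs.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)

# Assumed logger names; adjust if the client uses different ones.
for name in ("trino", "trino.client", "urllib3"):
    logging.getLogger(name).setLevel(logging.DEBUG)
```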
Probably here - https://github.com/trinodb/trino-python-client/blob/a87566794d9a9eefdd481a95f001ce2e37e20531/trino/client.py#L846C1-L846C65
Might be related or not - https://github.com/trinodb/trino/issues/22989#issuecomment-2493319045
There are debug logs from the client there, plus matching server logs.
Hello!
Any news or workarounds?
No, we've been unable to reproduce this and figure out the reason.