flowclient hangs when running a complex query
When attempting to run a labelled_spatial_aggregate query consisting of 5140 query specs (counting the overall query and all nested sub-query specs), flowclient waits indefinitely for a response from FlowAPI. A keyboard interrupt (after ~23 hours) reveals that httpcore is still waiting for a response.
The flowmachine logs show that the server took 24 minutes to reply to the run message from FlowAPI - presumably either due to the large number of sub-query specs that needed to be deserialised, or the dependency graph construction required before storing the query.
So I think there are two issues to be resolved here:
- FlowMachine should not take a long time to respond to a run request. Further investigation is required to determine which part of the process is slow, and then that process should be pushed into the background so that the server can send a timely response to the user.
- If FlowMachine does take a long time to respond, flowclient should still receive that response. In this instance the server took 24 minutes to respond, but flowclient was still waiting for the response 23 hours later, so evidently the response did not get through to the client. Further investigation is required to establish whether the response went missing client-side (in which case the issue may be related to httpcore), or whether FlowAPI did not send a response (the flowmachine logs reveal that flowmachine sent a reply to flowapi, but I don't know whether flowapi passed that response back to the client).
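While investigating the second point, it may help to put a deadline around the blocking client call so that a lost response shows up after minutes rather than hours. The following is only a minimal sketch: `call_with_deadline` and `slow_request` are hypothetical names, not part of flowclient, and the worker thread is abandoned rather than cancelled, so this is a diagnostic aid and not a fix.

```python
import concurrent.futures
import time


def call_with_deadline(func, timeout_s, *args, **kwargs):
    """Run func(*args, **kwargs) in a worker thread.

    Returns the result if it arrives within timeout_s seconds,
    otherwise raises the built-in TimeoutError.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, *args, **kwargs)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # Normalise to the built-in exception (they differ before Python 3.11).
        raise TimeoutError(f"no response after {timeout_s} s") from None
    finally:
        # Do not wait for the worker: a hung call would block here forever.
        executor.shutdown(wait=False)


def slow_request():
    """Hypothetical stand-in for the client call that never returns."""
    time.sleep(0.5)
    return "response"


if __name__ == "__main__":
    try:
        call_with_deadline(slow_request, 0.05)
    except TimeoutError as exc:
        print(f"gave up: {exc}")
```

In practice the stand-in would be replaced by whichever flowclient call is hanging, with a deadline somewhat longer than the 24 minutes the server took to reply.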
On a separate but related point: although the labelled_spatial_aggregate included 5140 query specs, there were only 488 distinct query specs. We may be able to mitigate this issue by restructuring the query schemas to reduce the amount of duplication required in query specs.
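For anyone wanting to measure the duplication in their own query, the total and distinct counts can be obtained with something like the sketch below. It assumes that a query spec is a nested dict/list structure in which every spec is a dict carrying a `query_kind` key; that assumption, and the helper names `iter_specs` and `count_specs`, are illustrative and not taken from FlowKit.

```python
import json


def iter_specs(node):
    """Yield every dict in the nested structure that looks like a query spec."""
    if isinstance(node, dict):
        if "query_kind" in node:  # assumed marker of a query spec
            yield node
        for value in node.values():
            yield from iter_specs(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_specs(item)


def count_specs(spec):
    """Return (total, distinct) counts of specs, including the top-level one."""
    total = 0
    distinct = set()
    for sub_spec in iter_specs(spec):
        total += 1
        # Canonical JSON makes structurally equal specs compare equal.
        distinct.add(json.dumps(sub_spec, sort_keys=True))
    return total, len(distinct)


if __name__ == "__main__":
    # Toy example: the same sub-query appears twice under one parent.
    example = {
        "query_kind": "parent",
        "left": {"query_kind": "child", "date": "2016-01-01"},
        "right": {"query_kind": "child", "date": "2016-01-01"},
    }
    print(count_specs(example))
```

A large gap between the two counts, as with 5140 versus 488 here, indicates how much a schema that lets sub-queries be referenced instead of repeated could shrink the request.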
Additional context
- FlowKit version: 1.17.0
- flowclient version: 1.17.0
- flowclient running in Jupyterlab
- I tried both the synchronous and asynchronous clients - the same bug occurred with both.
I'd be interested in whether #3128 helps with the first of these issues (FlowMachine taking a long time to respond to the run request).