databricks-sql-python icon indicating copy to clipboard operation
databricks-sql-python copied to clipboard

[PECO-1263] Implement a .returned_as_direct_result property for AsyncExecution status

Open susodapop opened this issue 1 year ago • 3 comments

Description

This PR implements an additional boolean flag on AsyncExecution that indicates whether this execution returned directResults. When this happens, results cannot be fetched from the server again. Because when a directResults query returns its operation is closed immediately.

I determined this is necessary while documenting the new execute_async() behaviour. For context, when we send a TExecuteStatementReq to the thrift server, there are two possible return conditions:

  1. If the query completes within five seconds, the resulting TExecuteStatementResp will actually include the results!
  2. If the query takes longer than five seconds to complete, the resulting TExecuteStatementResp will only include the query_id for the query, which we can use to poll for the results until they are ready.

In case 1, the result is sent within the initial TExecuteStatementResp and they cannot be fetched a second time. What this means for users is that if they are going to call .serialize() to persist a query_id and secret for use by another thread, they need a way to verify that the results will actually be available to another thread.

This new .returned_as_direct_result property makes that easier.

One unfortunate implication of this behaviour is that users cannot use one thread to kickoff all their queries and a separate thread to fetch their results. Because not all AsyncExecution objects can actually be picked up separately.

susodapop avatar Jan 22 '24 20:01 susodapop

If the query completes within five seconds, the resulting TExecuteStatementResp will actually include the results! If the query takes longer than five seconds to complete, the resulting TExecuteStatementResp will only include the query_id for the query, which we can use to poll for the results until they are ready.

Oh man, why? This is really unfortunate behavior.

benc-db avatar Jan 22 '24 20:01 benc-db

is really unfortunate behavior.

will this cause any issues?

jadewang-db avatar Jan 22 '24 22:01 jadewang-db

is really unfortunate behavior.

will this cause any issues?

Jade, this is the issue:

In case 1, the result is sent within the initial TExecuteStatementResp and they cannot be fetched a second time. What this means for users is that if they are going to call .serialize() to persist a query_id and secret for use by another thread, they need a way to verify that the results will actually be available to another thread.

benc-db avatar Jan 22 '24 22:01 benc-db