Slowness in query submission and response.
While running the databricks query, the query submission and response is taking a lot of time to return back although actual runtime in the query history is within a second. But, the python gets data back after 3-5 minutes.
I'm running a sequence of 5 queries which opens and closes session after each one and it's ultimately delaying up to 15 minutes which in actuality shall complete in 5-10 seconds. The workspace is AWS private linked one.
the query submission and response is taking a lot of time
opens and closes session after each one
Could you share your code? Is the delay occurring at the connect or execute stage?
def execute_databricks_query(query):
"""
Function to execute a query inside Databricks and returns back the result.
Expects following environmental variables to be defined first,
DATABRICKS_SERVER_HOSTNAME, DATABRICKS_HTTP_PATH, DATABRICKS_TOKEN
"""
with sql.connect(server_hostname = os.getenv("DATABRICKS_SERVER_HOSTNAME"),
http_path = os.getenv("DATABRICKS_HTTP_PATH"),
access_token = os.getenv("DATABRICKS_TOKEN")) as connection:
with connection.cursor() as cursor:
cursor.execute(query)
# cursor.execute("SELECT * FROM lineage_catalog_qa.admin.meta_control_table")
result = cursor.fetchall()
return result
I'm using the above function.
The slowness is happening at connect stage, the execution is fast and then the response back is taking time again.
Note: Our environment is a AWS Privatelinked one and network flow is within VPC to Databricks. This issue is from past 1-2 weeks.
What version of Python are you using? I opened #422 the other day regarding slow connection and found it could be solved by using a different Python version.
Either way, if you run the function in debug mode, have you traced to call stack to see where the delay is occurring?
I was trying with python 3.7 and later moved to python3.8 as I see python3.8 supports the latest sql-connector version 3.3. Moving from python3.7 to python3.8 reduced my query time to half.
And, I see first 2 queries out of 5 mostly identical are running quick and then the delay starts.
I have tried the debug too, will try to share the debug logs here, if it can helps.
22:06:12 DEBUG :: retry parameter: _retry_delay_min given_or_default 1.0
22:06:12 DEBUG :: retry parameter: _retry_delay_max given_or_default 60.0
22:06:12 DEBUG :: retry parameter: _retry_stop_after_attempts_count given_or_default 30
22:06:12 DEBUG :: retry parameter: _retry_stop_after_attempts_duration given_or_default 900.0
22:06:12 DEBUG :: retry parameter: _retry_delay_default given_or_default 5.0
22:06:12 DEBUG :: Sending request: OpenSession(<REDACTED>)
22:06:12 DEBUG :: Starting new HTTPS connection (1): my-org-redacted-dev.cloud.databricks.com:443
22:06:12 DEBUG :: https://my-org-redacted-dev.cloud.databricks.com:443 "POST /sql/1.0/warehouses/aebf2c8fc913084f HTTP/1.1" 200 None
22:06:12 DEBUG :: Received response: TOpenSessionResp(<REDACTED>)
22:06:12 INFO :: Successfully opened session 01ef55a4-52bc-1323-8ab1-f0b3564ad9e9
22:06:12 DEBUG :: Sending request: ExecuteStatement(<REDACTED>)
22:06:13 DEBUG :: https://my-org-redacted-dev.cloud.databricks.com:443 "POST /sql/1.0/warehouses/aebf2c8fc913084f HTTP/1.1" 200 None
22:06:13 DEBUG :: Received response: TExecuteStatementResp(<REDACTED>)
22:06:13 INFO :: Closing session 01ef55a4-52bc-1323-8ab1-f0b3564ad9e9
22:06:13 DEBUG :: Sending request: CloseSession(<REDACTED>)
22:06:14 DEBUG :: https://my-org-redacted-dev.cloud.databricks.com:443 "POST /sql/1.0/warehouses/aebf2c8fc913084f HTTP/1.1" 200 None
22:06:14 DEBUG :: Received response: TCloseSessionResp(<REDACTED>)
22:06:14 DEBUG :: retry parameter: _retry_delay_min given_or_default 1.0
22:06:14 DEBUG :: retry parameter: _retry_delay_max given_or_default 60.0
22:06:14 DEBUG :: retry parameter: _retry_stop_after_attempts_count given_or_default 30
22:06:14 DEBUG :: retry parameter: _retry_stop_after_attempts_duration given_or_default 900.0
22:06:14 DEBUG :: retry parameter: _retry_delay_default given_or_default 5.0
22:06:14 DEBUG :: Sending request: OpenSession(<REDACTED>)
22:06:14 DEBUG :: Starting new HTTPS connection (1): my-org-redacted-dev.cloud.databricks.com:443
22:08:35 DEBUG :: https://my-org-redacted-dev.cloud.databricks.com:443 "POST /sql/1.0/warehouses/aebf2c8fc913084f HTTP/1.1" 200 None
22:08:35 DEBUG :: Received response: TOpenSessionResp(<REDACTED>)
22:08:35 INFO :: Successfully opened session 01ef55a4-a14e-15ab-b620-038a35e8489e
22:08:35 DEBUG :: Sending request: ExecuteStatement(<REDACTED>)
22:08:35 DEBUG :: https://my-org-redacted-dev.cloud.databricks.com:443 "POST /sql/1.0/warehouses/aebf2c8fc913084f HTTP/1.1" 200 None
22:08:35 DEBUG :: Received response: TExecuteStatementResp(<REDACTED>)
22:08:35 INFO :: Closing session 01ef55a4-a14e-15ab-b620-038a35e8489e
22:08:35 DEBUG :: Sending request: CloseSession(<REDACTED>)
22:08:35 DEBUG :: https://my-org-redacted-dev.cloud.databricks.com:443 "POST /sql/1.0/warehouses/aebf2c8fc913084f HTTP/1.1" 200 None
22:08:35 DEBUG :: Received response: TCloseSessionResp(<REDACTED>)
22:08:35 DEBUG :: retry parameter: _retry_delay_min given_or_default 1.0
22:08:35 DEBUG :: retry parameter: _retry_delay_max given_or_default 60.0
22:08:35 DEBUG :: retry parameter: _retry_stop_after_attempts_count given_or_default 30
22:08:35 DEBUG :: retry parameter: _retry_stop_after_attempts_duration given_or_default 900.0
22:08:35 DEBUG :: retry parameter: _retry_delay_default given_or_default 5.0
22:08:35 DEBUG :: Sending request: OpenSession(<REDACTED>)
22:08:35 DEBUG :: Starting new HTTPS connection (1): my-org-redacted-dev.cloud.databricks.com:443
22:08:35 DEBUG :: https://my-org-redacted-dev.cloud.databricks.com:443 "POST /sql/1.0/warehouses/aebf2c8fc913084f HTTP/1.1" 200 None
22:08:35 DEBUG :: Received response: TOpenSessionResp(<REDACTED>)
22:08:35 INFO :: Successfully opened session 01ef55a4-a1b8-15ca-84e0-0776f5bfcdc7
22:08:35 DEBUG :: Sending request: ExecuteStatement(<REDACTED>)
22:08:35 DEBUG :: https://my-org-redacted-dev.cloud.databricks.com:443 "POST /sql/1.0/warehouses/aebf2c8fc913084f HTTP/1.1" 200 None
22:08:35 DEBUG :: Received response: TExecuteStatementResp(<REDACTED>)
22:08:35 INFO :: Closing session 01ef55a4-a1b8-15ca-84e0-0776f5bfcdc7
22:08:35 DEBUG :: Sending request: CloseSession(<REDACTED>)
22:08:35 DEBUG :: https://my-org-redacted-dev.cloud.databricks.com:443 "POST /sql/1.0/warehouses/aebf2c8fc913084f HTTP/1.1" 200 None
22:08:35 DEBUG :: Received response: TCloseSessionResp(<REDACTED>)
22:08:35 DEBUG :: retry parameter: _retry_delay_min given_or_default 1.0
22:08:35 DEBUG :: retry parameter: _retry_delay_max given_or_default 60.0
22:08:35 DEBUG :: retry parameter: _retry_stop_after_attempts_count given_or_default 30
22:08:35 DEBUG :: retry parameter: _retry_stop_after_attempts_duration given_or_default 900.0
22:08:35 DEBUG :: retry parameter: _retry_delay_default given_or_default 5.0
22:08:35 DEBUG :: Sending request: OpenSession(<REDACTED>)
22:08:35 DEBUG :: Starting new HTTPS connection (1): my-org-redacted-dev.cloud.databricks.com:443
22:10:41 DEBUG :: https://my-org-redacted-dev.cloud.databricks.com:443 "POST /sql/1.0/warehouses/aebf2c8fc913084f HTTP/1.1" 200 None
22:10:41 DEBUG :: Received response: TOpenSessionResp(<REDACTED>)
22:10:41 INFO :: Successfully opened session 01ef55a4-ef7f-1a9b-b4d9-0ca1310959f7
22:10:41 DEBUG :: Sending request: ExecuteStatement(<REDACTED>)
22:10:41 DEBUG :: https://my-org-redacted-dev.cloud.databricks.com:443 "POST /sql/1.0/warehouses/aebf2c8fc913084f HTTP/1.1" 200 None
22:10:41 DEBUG :: Received response: TExecuteStatementResp(<REDACTED>)
22:10:41 INFO :: Closing session 01ef55a4-ef7f-1a9b-b4d9-0ca1310959f7
22:10:41 DEBUG :: Sending request: CloseSession(<REDACTED>)
22:10:41 DEBUG :: https://my-org-redacted-dev.cloud.databricks.com:443 "POST /sql/1.0/warehouses/aebf2c8fc913084f HTTP/1.1" 200 None
22:10:41 DEBUG :: Received response: TCloseSessionResp(<REDACTED>)
22:10:41 DEBUG :: retry parameter: _retry_delay_min given_or_default 1.0
22:10:41 DEBUG :: retry parameter: _retry_delay_max given_or_default 60.0
22:10:41 DEBUG :: retry parameter: _retry_stop_after_attempts_count given_or_default 30
22:10:41 DEBUG :: retry parameter: _retry_stop_after_attempts_duration given_or_default 900.0
22:10:41 DEBUG :: retry parameter: _retry_delay_default given_or_default 5.0
22:10:41 DEBUG :: Sending request: OpenSession(<REDACTED>)
22:10:41 DEBUG :: Starting new HTTPS connection (1): my-org-redacted-dev.cloud.databricks.com:443
22:10:41 DEBUG :: https://my-org-redacted-dev.cloud.databricks.com:443 "POST /sql/1.0/warehouses/aebf2c8fc913084f HTTP/1.1" 200 None
22:10:41 DEBUG :: Received response: TOpenSessionResp(<REDACTED>)
22:10:41 INFO :: Successfully opened session 01ef55a4-f07c-1e49-a0ad-ace3dacf8680
22:10:41 DEBUG :: Sending request: ExecuteStatement(<REDACTED>)
22:10:41 DEBUG :: https://my-org-redacted-dev.cloud.databricks.com:443 "POST /sql/1.0/warehouses/aebf2c8fc913084f HTTP/1.1" 200 None
22:10:41 DEBUG :: Received response: TExecuteStatementResp(<REDACTED>)
22:10:41 INFO :: Closing session 01ef55a4-f07c-1e49-a0ad-ace3dacf8680
22:10:41 DEBUG :: Sending request: CloseSession(<REDACTED>)
22:10:41 DEBUG :: https://my-org-redacted-dev.cloud.databricks.com:443 "POST /sql/1.0/warehouses/aebf2c8fc913084f HTTP/1.1" 200 None
22:10:41 DEBUG :: Received response: TCloseSessionResp(<REDACTED>)
22:10:41 DEBUG :: retry parameter: _retry_delay_min given_or_default 1.0
22:10:41 DEBUG :: retry parameter: _retry_delay_max given_or_default 60.0
22:10:41 DEBUG :: retry parameter: _retry_stop_after_attempts_count given_or_default 30
22:10:41 DEBUG :: retry parameter: _retry_stop_after_attempts_duration given_or_default 900.0
22:10:41 DEBUG :: retry parameter: _retry_delay_default given_or_default 5.0
22:10:41 DEBUG :: Sending request: OpenSession(<REDACTED>)
22:10:41 DEBUG :: Starting new HTTPS connection (1): my-org-redacted-dev.cloud.databricks.com:443
22:13:02 DEBUG :: https://my-org-redacted-dev.cloud.databricks.com:443 "POST /sql/1.0/warehouses/aebf2c8fc913084f HTTP/1.1" 200 None
22:13:02 DEBUG :: Received response: TOpenSessionResp(<REDACTED>)
22:13:02 INFO :: Successfully opened session 01ef55a5-3ed9-12ed-8b0b-f8c0a3f00c82
22:13:02 DEBUG :: Sending request: ExecuteStatement(<REDACTED>)
22:13:02 DEBUG :: https://my-org-redacted-dev.cloud.databricks.com:443 "POST /sql/1.0/warehouses/aebf2c8fc913084f HTTP/1.1" 200 None
22:13:02 DEBUG :: Received response: TExecuteStatementResp(<REDACTED>)
22:13:02 INFO :: Closing session 01ef55a5-3ed9-12ed-8b0b-f8c0a3f00c82
22:13:02 DEBUG :: Sending request: CloseSession(<REDACTED>)
22:13:02 DEBUG :: https://my-org-redacted-dev.cloud.databricks.com:443 "POST /sql/1.0/warehouses/aebf2c8fc913084f HTTP/1.1" 200 None
@jaames-bentley -> here is the debug version of it. I see a few are quick and other calls has a delay of at least 2-3 minutes.
I would also give a try with latest version of python 3.10+ to verify the same.
@shahrukh-shaik thank you for the log. It's actually interesting. I noticed "Starting new HTTPS connection (1): my-org-redacted-dev.cloud.databricks.com:443" lines. They actually come from the urllib3. First time, when OpenSession command is issued by sql.connect(), the response is almost instant. Then, I think, the same connection is re-used for other Thrift commands. But every subsequent attempt takes tens of seconds. I'm wondering if that's something with urllib3, or the way we use it? 🤔
@shahrukh-shaik thank you for the log. It's actually interesting. I noticed "Starting new HTTPS connection (1): my-org-redacted-dev.cloud.databricks.com:443" lines. They actually come from the
urllib3. First time, whenOpenSessioncommand is issued bysql.connect(), the response is almost instant. Then, I think, the same connection is re-used for other Thrift commands. But every subsequent attempt takes tens of seconds. I'm wondering if that's something with urllib3, or the way we use it? 🤔
Not sure, if you can see the above function for query execution, I'm using with clause which creates a new session every time. Is this something you are talking about?
PS: i have tried python3.10 with both urllib3<2 >=2 versions and haven't seen any difference.
Is this something you are talking about?
Yes, exactly
@kravets-levko
dbx_connection=sql.connect(server_hostname = os.getenv("DATABRICKS_SERVER_HOSTNAME"),
http_path = os.getenv("DATABRICKS_HTTP_PATH"),
access_token = os.getenv("DATABRICKS_TOKEN"))
def execute_databricks_query(query,connection=dbx_connection):
"""
Function to execute a query inside Databricks and returns back the result.
Expects following environmental variables to be defined first,
DATABRICKS_SERVER_HOSTNAME, DATABRICKS_HTTP_PATH, DATABRICKS_TOKEN
"""
# commenting above code as there is a issue with slowness https://github.com/databricks/databricks-sql-python/issues/421
# with sql.connect(server_hostname = os.getenv("DATABRICKS_SERVER_HOSTNAME"),
# http_path = os.getenv("DATABRICKS_HTTP_PATH"),
# access_token = os.getenv("DATABRICKS_TOKEN")) as connection:
with dbx_connection.cursor() as cursor:
cursor.execute(query)
# cursor.execute("SELECT * FROM lineage_catalog_qa.admin.meta_control_table")
result = cursor.fetchall()
return result
I have made the code change to open the connection only once and closing this at the end of the script using
if dbx_connection.open:
dbx_connection.close()
Although I tried without closing it as I see in the source code, we have __del__ method, but it was throwing some error as below
DEBUG :: Closing unclosed connection for session 01ef5634-50a9-118c-bebe-4c63fd8cc823
--- Logging error ---
Traceback (most recent call last):
File "/opt/python/3.8.18/lib/python3.8/logging/handlers.py", line 69, in emit
if self.shouldRollover(record):
File "/opt/python/3.8.18/lib/python3.8/logging/handlers.py", line 183, in shouldRollover
self.stream = self._open()
File "/opt/python/3.8.18/lib/python3.8/logging/__init__.py", line 1176, in _open
return open(self.baseFilename, self.mode, encoding=self.encoding)
NameError: name 'open' is not defined
Call stack:
File "/var/lib/jenkins/.local/lib/python3.8/site-packages/databricks/sql/client.py", line 283, in __del__
logger.debug(
Message: 'Closing unclosed connection for session 01ef5634-50a9-118c-bebe-4c63fd8cc823'
Arguments: ()
INFO :: Closing session 01ef5634-50a9-118c-bebe-4c63fd8cc823
--- Logging error ---
Traceback (most recent call last):
File "/opt/python/3.8.18/lib/python3.8/logging/handlers.py", line 69, in emit
if self.shouldRollover(record):
File "/opt/python/3.8.18/lib/python3.8/logging/handlers.py", line 183, in shouldRollover
self.stream = self._open()
File "/opt/python/3.8.18/lib/python3.8/logging/__init__.py", line 1176, in _open
return open(self.baseFilename, self.mode, encoding=self.encoding)
NameError: name 'open' is not defined
Call stack:
File "/var/lib/jenkins/.local/lib/python3.8/site-packages/databricks/sql/client.py", line 288, in __del__
self._close(close_cursors=False)
File "/var/lib/jenkins/.local/lib/python3.8/site-packages/databricks/sql/client.py", line 354, in _close
logger.info(f"Closing session {self.get_session_id_hex()}")
Message: 'Closing session 01ef5634-50a9-118c-bebe-4c63fd8cc823'
Arguments: ()
DEBUG :: Sending request: CloseSession(<REDACTED>)
--- Logging error ---
Traceback (most recent call last):
File "/opt/python/3.8.18/lib/python3.8/logging/handlers.py", line 69, in emit
if self.shouldRollover(record):
File "/opt/python/3.8.18/lib/python3.8/logging/handlers.py", line 183, in shouldRollover
self.stream = self._open()
File "/opt/python/3.8.18/lib/python3.8/logging/__init__.py", line 1176, in _open
return open(self.baseFilename, self.mode, encoding=self.encoding)
NameError: name 'open' is not defined
Call stack:
File "/var/lib/jenkins/.local/lib/python3.8/site-packages/databricks/sql/client.py", line 288, in __del__
self._close(close_cursors=False)
File "/var/lib/jenkins/.local/lib/python3.8/site-packages/databricks/sql/client.py", line 359, in _close
self.thrift_backend.close_session(self._session_handle)
File "/var/lib/jenkins/.local/lib/python3.8/site-packages/databricks/sql/thrift_backend.py", line 591, in close_session
self.make_request(self._client.CloseSession, req)
File "/var/lib/jenkins/.local/lib/python3.8/site-packages/databricks/sql/thrift_backend.py", line 497, in make_request
response_or_error_info = attempt_request(attempt)
File "/var/lib/jenkins/.local/lib/python3.8/site-packages/databricks/sql/thrift_backend.py", line 398, in attempt_request
logger.debug("Sending request: {}(<REDACTED>)".format(this_method_name))
Message: 'Sending request: CloseSession(<REDACTED>)'
Arguments: ()
INFO :: Error during request to server: {"method": "CloseSession", "session-id": "b'\\x01\\xefV4P\\xa9\\x11\\x8c\\xbe\\xbeLc\\xfd\\x8c\\xc8#'", "query-id": null, "http-code": 200, "error-message": "", "original-exception": "sys.meta_path is None, Python is likely shutting down", "no-retry-reason": "non-retryable error", "bounded-retry-delay": null, "attempt": "1/30", "elapsed-seconds": "0.0009160041809082031/900.0"}
--- Logging error ---
Traceback (most recent call last):
File "/opt/python/3.8.18/lib/python3.8/logging/handlers.py", line 69, in emit
if self.shouldRollover(record):
File "/opt/python/3.8.18/lib/python3.8/logging/handlers.py", line 183, in shouldRollover
self.stream = self._open()
File "/opt/python/3.8.18/lib/python3.8/logging/__init__.py", line 1176, in _open
return open(self.baseFilename, self.mode, encoding=self.encoding)
NameError: name 'open' is not defined
Call stack:
File "/var/lib/jenkins/.local/lib/python3.8/site-packages/databricks/sql/client.py", line 288, in __del__
self._close(close_cursors=False)
File "/var/lib/jenkins/.local/lib/python3.8/site-packages/databricks/sql/client.py", line 359, in _close
self.thrift_backend.close_session(self._session_handle)
File "/var/lib/jenkins/.local/lib/python3.8/site-packages/databricks/sql/thrift_backend.py", line 591, in close_session
self.make_request(self._client.CloseSession, req)
File "/var/lib/jenkins/.local/lib/python3.8/site-packages/databricks/sql/thrift_backend.py", line 509, in make_request
self._handle_request_error(error_info, attempt, elapsed)
File "/var/lib/jenkins/.local/lib/python3.8/site-packages/databricks/sql/thrift_backend.py", line 337, in _handle_request_error
logger.info(network_request_error.message_with_context())
Message: 'Error during request to server: {"method": "CloseSession", "session-id": "b\'\\\\x01\\\\xefV4P\\\\xa9\\\\x11\\\\x8c\\\\xbe\\\\xbeLc\\\\xfd\\\\x8c\\\\xc8#\'", "query-id": null, "http-code": 200, "error-message": "", "original-exception": "sys.meta_path is None, Python is likely shutting down", "no-retry-reason": "non-retryable error", "bounded-retry-delay": null, "attempt": "1/30", "elapsed-seconds": "0.0009160041809082031/900.0"}'
Arguments: ()
https://stackoverflow.com/questions/64679139/nameerror-name-open-is-not-defined-when-trying-to-log-to-files