pyapacheatlas
`search_entities` API might be throttled
Describe the bug
When there is a large number of Purview entities (we currently have more than 20K), the `search_entities` API can fail with this error:
[1:54 PM] Xiaoyong Zhu
Traceback (most recent call last):
File "/opt/miniconda3/lib/python3.9/site-packages/urllib3/connectionpool.py", line 700, in urlopen
self._prepare_proxy(conn)
File "/opt/miniconda3/lib/python3.9/site-packages/urllib3/connectionpool.py", line 994, in _prepare_proxy
conn.connect()
File "/opt/miniconda3/lib/python3.9/site-packages/urllib3/connection.py", line 414, in connect
self.sock = ssl_wrap_socket(
File "/opt/miniconda3/lib/python3.9/site-packages/urllib3/util/ssl_.py", line 449, in ssl_wrap_socket
ssl_sock = _ssl_wrap_socket_impl(
File "/opt/miniconda3/lib/python3.9/site-packages/urllib3/util/ssl_.py", line 493, in _ssl_wrap_socket_impl
return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
File "/opt/miniconda3/lib/python3.9/ssl.py", line 500, in wrap_socket
return self.sslsocket_class._create(
File "/opt/miniconda3/lib/python3.9/ssl.py", line 1040, in _create
self.do_handshake()
File "/opt/miniconda3/lib/python3.9/ssl.py", line 1309, in do_handshake
self._sslobj.do_handshake()
ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:1129)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/miniconda3/lib/python3.9/site-packages/requests/adapters.py", line 489, in send
resp = conn.urlopen(
File "/opt/miniconda3/lib/python3.9/site-packages/urllib3/connectionpool.py", line 785, in urlopen
retries = retries.increment(
File "/opt/miniconda3/lib/python3.9/site-packages/urllib3/util/retry.py", line 592, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='feathrazuretest3-purview1.purview.azure.com', port=443): Max retries exceeded with url: /catalog/api/search/query?api-version=2021-05-01-preview (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1129)')))
It seems that, in the code, `search_entities` keeps calling the query API in a loop, and it looks like Purview enforces some throttling in the backend, which results in this error.
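If the failures do turn out to be transient (whether from throttling or from a dropped connection), one general mitigation is to retry with exponential backoff. The following is a minimal sketch under that assumption and is not part of pyapacheatlas: `call_with_backoff` and all of its parameters are hypothetical names chosen for illustration. It retries on the built-in `ConnectionError` by default; with `requests` you would likely pass `retry_on=(requests.exceptions.ConnectionError,)` instead, since `requests.exceptions.SSLError` is a subclass of it.

```python
import random
import time


def call_with_backoff(func, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      retry_on=(ConnectionError,), sleep=time.sleep):
    """Call func(), retrying on the given exceptions with exponential backoff.

    Hypothetical helper for illustration; not a pyapacheatlas API.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles on each attempt and is capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # A little random jitter avoids retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 10))
```

Note that wrapping a call that exhausts the whole generator would restart the search from the first page on every retry, so this fits best around a single request or a single page.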
We have been facing the same issue. We are dealing with a similar number of entities, i.e. more than 20K.
Could you please supply your method parameters? I believe the discovery endpoint has a limit of 10k per page, so it's normal to expect paging in the API.
These are my parameters: `("*", search_filter=filter_setup, limit=1000, starting_offset=0)`. I noticed that, regardless of the parameters given, the generator object returns all the values, so I have been converting the generator directly to a list and getting all 18K entities at once. Here is the error message I received:
Traceback (most recent call last):
File "C:\Users\u724909\PycharmProjects\pythonProject\venv\lib\site-packages\urllib3\connectionpool.py", line 703, in urlopen
httplib_response = self._make_request(
File "C:\Users\u724909\PycharmProjects\pythonProject\venv\lib\site-packages\urllib3\connectionpool.py", line 386, in _make_request
self._validate_conn(conn)
File "C:\Users\u724909\PycharmProjects\pythonProject\venv\lib\site-packages\urllib3\connectionpool.py", line 1040, in _validate_conn
conn.connect()
File "C:\Users\u724909\PycharmProjects\pythonProject\venv\lib\site-packages\urllib3\connection.py", line 414, in connect
self.sock = ssl_wrap_socket(
File "C:\Users\u724909\PycharmProjects\pythonProject\venv\lib\site-packages\urllib3\util\ssl_.py", line 449, in ssl_wrap_socket
ssl_sock = _ssl_wrap_socket_impl(
File "C:\Users\u724909\PycharmProjects\pythonProject\venv\lib\site-packages\urllib3\util\ssl_.py", line 493, in _ssl_wrap_socket_impl
return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3568.0_x64__qbz5n2kfra8p0\lib\ssl.py", line 501, in wrap_socket
return self.sslsocket_class._create(
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3568.0_x64__qbz5n2kfra8p0\lib\ssl.py", line 1041, in _create
self.do_handshake()
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3568.0_x64__qbz5n2kfra8p0\lib\ssl.py", line 1310, in do_handshake
self._sslobj.do_handshake()
ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:1129)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\u724909\PycharmProjects\pythonProject\venv\lib\site-packages\requests\adapters.py", line 489, in send
resp = conn.urlopen(
File "C:\Users\u724909\PycharmProjects\pythonProject\venv\lib\site-packages\urllib3\connectionpool.py", line 785, in urlopen
retries = retries.increment(
File "C:\Users\u724909\PycharmProjects\pythonProject\venv\lib\site-packages\urllib3\util\retry.py", line 592, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='datapurviewprod.purview.azure.com', port=443): Max retries exceeded with url: /catalog/api/search/query?api-version=2021-05-01-preview (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1129)')))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\u724909\PycharmProjects\Data-Activity-Monitoring\upload_dictionary.py", line 437, in
`.discovery.search_entities` returns a generator that paginates. The limit you are specifying is the HTTP page size, not the number of results returned by exhausting the iterator.
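To make the distinction between page size and result count concrete, here is a small self-contained sketch. It does not call Purview or pyapacheatlas: `paged_search`, `fake_fetch`, and `CATALOG` are hypothetical stand-ins that only mimic the shape of a paginating generator. It shows that exhausting the generator returns everything regardless of `limit`, and that `itertools.islice` can cap the number of results (and therefore the number of page requests).

```python
from itertools import islice


def paged_search(fetch_page, limit=1000, starting_offset=0):
    """Yield results one at a time, requesting `limit` items per page."""
    offset = starting_offset
    while True:
        page = fetch_page(offset, limit)
        if not page:
            return  # an empty page means there is nothing left
        yield from page
        offset += len(page)


# Hypothetical in-memory "catalog" standing in for the remote service.
CATALOG = [f"entity-{i}" for i in range(2500)]
requests_made = []  # records the offset of every simulated page request


def fake_fetch(offset, limit):
    requests_made.append(offset)
    return CATALOG[offset:offset + limit]


# Exhausting the generator returns every entity; `limit` only sets page size.
everything = list(paged_search(fake_fetch, limit=1000))

# islice stops early, so only the pages needed for 1500 results are requested.
requests_made.clear()
first_1500 = list(islice(paged_search(fake_fetch, limit=1000), 1500))
```

The same `islice` pattern should apply to any generator, so wrapping the real search call this way is a reasonable way to take only the first N results instead of converting the whole generator to a list.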
I was unable to replicate your error, even after making more than 20K individual requests to an endpoint. Are you going through a proxy? A quick Google search for your error brought up a related issue with the az CLI. Perhaps look into your proxy config and the versions of your dependencies.
https://github.com/Azure/azure-cli/issues/19456
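For anyone checking their proxy config and dependency versions, a short script like the one below can gather the relevant details in one place. This is only a sketch: `environment_report` is a hypothetical helper name, and the default package list is an assumption about what is relevant here. It uses only the standard library (`importlib.metadata`, `ssl`, `urllib.request.getproxies`).

```python
import ssl
import sys
import urllib.request
from importlib import metadata


def environment_report(packages=("requests", "urllib3", "pyapacheatlas")):
    """Collect interpreter, OpenSSL, package versions and proxy settings."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": sys.version.split()[0],
        "openssl": ssl.OPENSSL_VERSION,
        "packages": versions,
        # Proxy settings as seen by the standard library (environment
        # variables, plus system settings on some platforms).
        "proxies": urllib.request.getproxies(),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Proxy URLs can embed credentials, so check the output before pasting it into a public issue.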