pyapacheatlas

`search_entities` API might be throttled

Open xiaoyongzhu opened this issue 2 years ago • 4 comments

Describe the bug: When there is a large number of Purview entities (we currently have more than 20K), the search_entities API can fail with this error:

[1:54 PM] Xiaoyong Zhu

Traceback (most recent call last):
File "/opt/miniconda3/lib/python3.9/site-packages/urllib3/connectionpool.py", line 700, in urlopen
self._prepare_proxy(conn)
File "/opt/miniconda3/lib/python3.9/site-packages/urllib3/connectionpool.py", line 994, in _prepare_proxy
conn.connect()
File "/opt/miniconda3/lib/python3.9/site-packages/urllib3/connection.py", line 414, in connect
self.sock = ssl_wrap_socket(
File "/opt/miniconda3/lib/python3.9/site-packages/urllib3/util/ssl_.py", line 449, in ssl_wrap_socket
ssl_sock = _ssl_wrap_socket_impl(
File "/opt/miniconda3/lib/python3.9/site-packages/urllib3/util/ssl_.py", line 493, in _ssl_wrap_socket_impl
return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
File "/opt/miniconda3/lib/python3.9/ssl.py", line 500, in wrap_socket
return self.sslsocket_class._create(
File "/opt/miniconda3/lib/python3.9/ssl.py", line 1040, in _create
self.do_handshake()
File "/opt/miniconda3/lib/python3.9/ssl.py", line 1309, in do_handshake
self._sslobj.do_handshake()
ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:1129)


During handling of the above exception, another exception occurred:


Traceback (most recent call last):
File "/opt/miniconda3/lib/python3.9/site-packages/requests/adapters.py", line 489, in send
resp = conn.urlopen(
File "/opt/miniconda3/lib/python3.9/site-packages/urllib3/connectionpool.py", line 785, in urlopen
retries = retries.increment(
File "/opt/miniconda3/lib/python3.9/site-packages/urllib3/util/retry.py", line 592, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='feathrazuretest3-purview1.purview.azure.com', port=443): Max retries exceeded with url: /catalog/api/search/query?api-version=2021-05-01-preview (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1129)')))

It seems that the search_entities API keeps calling the query API in a loop, and Purview appears to enforce some throttling in the backend, which results in this error.
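If the failures really are transient (throttling or a dropped connection), one possible workaround is to resume the search from the last successful offset after a pause. The following is only a minimal sketch under stated assumptions: `search_with_resume` is a hypothetical helper, not part of pyapacheatlas, and it assumes the search function accepts a `starting_offset` keyword that skips that many results.

```python
import time
from typing import Any, Callable, Iterator


def search_with_resume(
    search_fn: Callable[..., Iterator[Any]],
    *args: Any,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    retry_on: tuple = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
    **kwargs: Any,
) -> Iterator[Any]:
    """Yield results from search_fn, resuming after transient failures.

    Assumption: search_fn accepts a `starting_offset` keyword that skips
    that many results, so a restarted search continues where it stopped.
    """
    offset = kwargs.pop("starting_offset", 0)
    failures = 0
    while True:
        try:
            for item in search_fn(*args, starting_offset=offset, **kwargs):
                yield item
                offset += 1   # one more result delivered to the caller
                failures = 0  # progress was made, so reset the failure count
            return
        except retry_on:
            failures += 1
            if failures >= max_attempts:
                raise
            # Exponential backoff: base_delay, then 2x, 4x, ...
            sleep(base_delay * 2 ** (failures - 1))
```

With a client object (here hypothetically named `client`), this might be called as `search_with_resume(client.discovery.search_entities, "*", limit=1000, retry_on=(requests.exceptions.SSLError,))`. Narrowing `retry_on` to the exception seen in the traceback avoids retrying on unrelated errors.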

xiaoyongzhu, Jun 11 '22 22:06

We have been facing the same issue. We are dealing with a similar number of entities, i.e. more than 20K.

amiket23, Jun 23 '22 09:06

Could you please supply your method parameters? I believe the discovery endpoint has a limit of 10K per page, so it's normal to expect paging in the API.

microcassidy, Jun 24 '22 02:06

These are the parameters I pass: "("*", search_filter=filter_setup, limit=1000, starting_offset=0)". I noticed that, irrespective of the parameters given, the generator object actually returns all the values, so I have been converting the generator object directly to a list and getting all 18K entities at once. Here is the error message I received:

Traceback (most recent call last):
File "C:\Users\u724909\PycharmProjects\pythonProject\venv\lib\site-packages\urllib3\connectionpool.py", line 703, in urlopen
httplib_response = self._make_request(
File "C:\Users\u724909\PycharmProjects\pythonProject\venv\lib\site-packages\urllib3\connectionpool.py", line 386, in _make_request
self._validate_conn(conn)
File "C:\Users\u724909\PycharmProjects\pythonProject\venv\lib\site-packages\urllib3\connectionpool.py", line 1040, in _validate_conn
conn.connect()
File "C:\Users\u724909\PycharmProjects\pythonProject\venv\lib\site-packages\urllib3\connection.py", line 414, in connect
self.sock = ssl_wrap_socket(
File "C:\Users\u724909\PycharmProjects\pythonProject\venv\lib\site-packages\urllib3\util\ssl_.py", line 449, in ssl_wrap_socket
ssl_sock = _ssl_wrap_socket_impl(
File "C:\Users\u724909\PycharmProjects\pythonProject\venv\lib\site-packages\urllib3\util\ssl_.py", line 493, in _ssl_wrap_socket_impl
return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3568.0_x64__qbz5n2kfra8p0\lib\ssl.py", line 501, in wrap_socket
return self.sslsocket_class._create(
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3568.0_x64__qbz5n2kfra8p0\lib\ssl.py", line 1041, in _create
self.do_handshake()
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3568.0_x64__qbz5n2kfra8p0\lib\ssl.py", line 1310, in do_handshake
self._sslobj.do_handshake()
ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:1129)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\u724909\PycharmProjects\pythonProject\venv\lib\site-packages\requests\adapters.py", line 489, in send
resp = conn.urlopen(
File "C:\Users\u724909\PycharmProjects\pythonProject\venv\lib\site-packages\urllib3\connectionpool.py", line 785, in urlopen
retries = retries.increment(
File "C:\Users\u724909\PycharmProjects\pythonProject\venv\lib\site-packages\urllib3\util\retry.py", line 592, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='datapurviewprod.purview.azure.com', port=443): Max retries exceeded with url: /catalog/api/search/query?api-version=2021-05-01-preview (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1129)')))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\u724909\PycharmProjects\Data-Activity-Monitoring\upload_dictionary.py", line 437, in <module>
upload_entities(inputfile_path, environment_type, subfolder)
File "C:\Users\u724909\PycharmProjects\Data-Activity-Monitoring\upload_dictionary.py", line 304, in upload_entities
field_typename_df = build_df(field_typename)
File "C:\Users\u724909\PycharmProjects\Data-Activity-Monitoring\upload_dictionary.py", line 61, in build_df
result = list(search)
File "C:\Users\u724909\PycharmProjects\pythonProject\venv\lib\site-packages\pyapacheatlas\core\discovery\purview.py", line 233, in _search_generator
results = self.query(
File "C:\Users\u724909\PycharmProjects\pythonProject\venv\lib\site-packages\pyapacheatlas\core\discovery\purview.py", line 163, in query
postResult = requests.post(
File "C:\Users\u724909\PycharmProjects\pythonProject\venv\lib\site-packages\requests\api.py", line 115, in post
return request("post", url, data=data, json=json, **kwargs)
File "C:\Users\u724909\PycharmProjects\pythonProject\venv\lib\site-packages\requests\api.py", line 59, in request
return session.request(method=method, url=url, **kwargs)
File "C:\Users\u724909\PycharmProjects\pythonProject\venv\lib\site-packages\requests\sessions.py", line 587, in request
resp = self.send(prep, **send_kwargs)
File "C:\Users\u724909\PycharmProjects\pythonProject\venv\lib\site-packages\requests\sessions.py", line 701, in send
r = adapter.send(request, **kwargs)
File "C:\Users\u724909\PycharmProjects\pythonProject\venv\lib\site-packages\requests\adapters.py", line 563, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='datapurviewprod.purview.azure.com', port=443): Max retries exceeded with url: /catalog/api/search/query?api-version=2021-05-01-preview (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1129)')))

amiket23, Jul 05 '22 13:07

.discovery.search_entities returns a generator that paginates. The limit you are specifying is the HTTP page size, not the number of results returned by exhausting the iterator.
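To illustrate the difference between page size and total results, here is a small sketch. `fake_search_entities` is a stand-in written for this example, not the real pyapacheatlas function; it only mimics the behaviour described above, where `limit` controls how many results each request fetches. Capping the total is done on the consumer side, for example with `itertools.islice`.

```python
from itertools import islice


def fake_search_entities(query, limit=50, starting_offset=0, total=18_000, page_log=None):
    """Stand-in for a paginating search generator (not the real API).

    `limit` is only the page size: the generator keeps requesting pages
    until every matching result has been yielded.
    """
    offset = starting_offset
    while offset < total:
        if page_log is not None:
            page_log.append(offset)  # record one simulated HTTP request
        yield from range(offset, min(offset + limit, total))
        offset += limit


# Take only the first 100 results; the generator is never asked for more,
# so only the pages needed to produce them are requested.
pages = []
first_100 = list(islice(fake_search_entities("*", limit=50, page_log=pages), 100))
print(len(first_100), "results from", len(pages), "simulated requests")
```

Calling `list(...)` on the generator directly, by contrast, walks through every page until the results run out, which for 18K entities with a page size of 1000 means at least 18 requests.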

I was unable to replicate your error, even after making more than 20K individual requests to an endpoint. Are you going through a proxy? A quick Google search for your error brought up a related issue with the az CLI. Perhaps look into your proxy configuration and the versions of the dependencies.

preview (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1129)')))

https://github.com/Azure/azure-cli/issues/19456
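As a starting point for checking the proxy configuration and dependency versions, a short script like the one below could be run in the affected environment. It is only a sketch: `environment_report` is a hypothetical helper, and the package list is an assumption based on the libraries that appear in the tracebacks.

```python
import ssl
import sys
import urllib.request
from importlib import metadata


def environment_report(packages=("requests", "urllib3", "pyapacheatlas")):
    """Collect interpreter, OpenSSL, proxy, and package version details."""
    report = {
        "python": sys.version.split()[0],
        "openssl": ssl.OPENSSL_VERSION,
        # Proxy settings as Python's standard library detects them
        # (environment variables, plus system settings on some platforms).
        "proxies": urllib.request.getproxies(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Comparing this output between a machine that fails and one that does not may help narrow down whether a proxy or a library version is involved.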

microcassidy, Jul 06 '22 22:07