
Failed to connect to Elasticsearch managed by AWS behind a VPN

Open ibnubay opened this issue 4 years ago • 17 comments

Failed to connect to Elasticsearch managed by AWS behind a VPN:

client = DaskElasticClient(host='es.amazonaws.com', port=9200, scheme="https")
my_index = "my_index"
df = client.read(index=my_index)

ibnubay avatar Jan 02 '21 14:01 ibnubay

Could you provide me with some more information e.g some log messages or exception messages?

avlahop avatar Jan 02 '21 15:01 avlahop

Traceback (most recent call last):
  File "", line 1, in
  File "/Users/dmg/opt/miniconda3/envs/es/lib/python3.6/site-packages/dask_elk/client.py", line 110, in read
    node_registry.get_nodes_from_elastic(elk_client)
  File "/Users/dmg/opt/miniconda3/envs/es/lib/python3.6/site-packages/dask_elk/elk_entities/node.py", line 43, in get_nodes_from_elastic
    resp = node_client.info()
  File "/Users/dmg/opt/miniconda3/envs/es/lib/python3.6/site-packages/elasticsearch/client/utils.py", line 152, in _wrapped
    return func(*args, params=params, headers=headers, **kwargs)
  File "/Users/dmg/opt/miniconda3/envs/es/lib/python3.6/site-packages/elasticsearch/client/nodes.py", line 65, in info
    "GET", _make_path("_nodes", node_id, metric), params=params, headers=headers
  File "/Users/dmg/opt/miniconda3/envs/es/lib/python3.6/site-packages/elasticsearch/transport.py", line 390, in perform_request
    raise e
  File "/Users/dmg/opt/miniconda3/envs/es/lib/python3.6/site-packages/elasticsearch/transport.py", line 365, in perform_request
    timeout=timeout,
  File "/Users/dmg/opt/miniconda3/envs/es/lib/python3.6/site-packages/elasticsearch/connection/http_urllib3.py", line 258, in perform_request
    raise ConnectionError("N/A", str(e), e)
elasticsearch.exceptions.ConnectionError: ConnectionError((<urllib3.connection.HTTPSConnection object at 0x7fb394c20198>, 'Connection to es.amazonaws.com timed out. (connect timeout=10)')) caused by: ConnectTimeoutError((<urllib3.connection.HTTPSConnection object at 0x7fb394c20198>, 'Connection to es.amazonaws.com timed out. (connect timeout=10)'))

Then I changed the port to 443, the one I use with the elasticsearch package:

client = DaskElasticClient(host=['es.amazonaws.com'], scheme="https", port=443)

I get this error:

Traceback (most recent call last):
  File "", line 1, in
  File "/Users/dmg/opt/miniconda3/envs/es/lib/python3.6/site-packages/dask_elk/client.py", line 110, in read
    node_registry.get_nodes_from_elastic(elk_client)
  File "/Users/dmg/opt/miniconda3/envs/es/lib/python3.6/site-packages/dask_elk/elk_entities/node.py", line 45, in get_nodes_from_elastic
    publish_address = node_info['http']['publish_address']
KeyError: 'http'

ibnubay avatar Jan 02 '21 15:01 ibnubay

It works fine when I connect using the elasticsearch package:

from elasticsearch import Elasticsearch
es_client = Elasticsearch(['https://es.amazonaws.com'])
es_client
<Elasticsearch([{'host': 'es.amazonaws.com', 'port': 443, 'use_ssl': True}])>
es_client.indices.exists(my_index)
True

ibnubay avatar Jan 02 '21 15:01 ibnubay

Could you try creating the client with wan_only=True? The client tries to connect to each node in order to fetch data in parallel. For that, the data nodes need to be accessible from outside the ELK cluster, which isn't always the case with managed services such as Amazon's. See here for more info
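A minimal sketch of that idea, not dask-elk's actual code: `pick_hosts` and its arguments are hypothetical names, and the node IDs and addresses are made-up sample values. It only illustrates the choice between contacting each data node directly and sending everything through the one public endpoint.

```python
from typing import Dict


def pick_hosts(
    shard_to_node: Dict[int, str],
    node_addresses: Dict[str, str],
    entry_host: str,
    wan_only: bool,
) -> Dict[int, str]:
    """Return the host each shard would be read from (illustrative only)."""
    hosts = {}
    for shard_id, node_id in shard_to_node.items():
        if wan_only:
            # Data nodes are not reachable from outside the cluster,
            # so every request goes through the single public endpoint.
            hosts[shard_id] = entry_host
        else:
            # Data nodes are reachable: talk to the node holding the shard.
            hosts[shard_id] = node_addresses[node_id]
    return hosts


# Made-up sample data: two shards living on two data nodes.
shards = {0: "node-a", 1: "node-b"}
addresses = {"node-a": "10.0.0.1:9200", "node-b": "10.0.0.2:9200"}

direct = pick_hosts(shards, addresses, "es.amazonaws.com:443", wan_only=False)
via_wan = pick_hosts(shards, addresses, "es.amazonaws.com:443", wan_only=True)
```

With `wan_only=False` each shard maps to its own node address; with `wan_only=True` both shards map to the public endpoint, so reads can still be split per shard but never need a private node address.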

avlahop avatar Jan 02 '21 16:01 avlahop

Still the same error message as before:

client1 = DaskElasticClient(host=['es.amazonaws.com'], scheme="https", port=443, wan_only=True)
df = client1.read(index=my_index)

Traceback (most recent call last):
  File "", line 1, in
  File "/Users/dmg/opt/miniconda3/envs/es/lib/python3.6/site-packages/dask_elk/client.py", line 110, in read
    node_registry.get_nodes_from_elastic(elk_client)
  File "/Users/dmg/opt/miniconda3/envs/es/lib/python3.6/site-packages/dask_elk/elk_entities/node.py", line 45, in get_nodes_from_elastic
    publish_address = node_info['http']['publish_address']
KeyError: 'http'

I think the wan_only option makes no difference at this early point in the read function, whether I set it to True or False (CMIIW): https://github.com/avlahop/dask-elk/blob/8a958a8a20e44c487f2a1c9b66a0710603b0295e/dask_elk/elk_entities/node.py#L36

It is only used here: https://github.com/avlahop/dask-elk/blob/8a958a8a20e44c487f2a1c9b66a0710603b0295e/dask_elk/client.py#L130

Note: I think my Elasticsearch cluster version is 7.1.1 (in case that helps).

ibnubay avatar Jan 02 '21 16:01 ibnubay

Hello @ibnbay99, I opened a PR (#27). Can you try it with your setup and report back whether it fixes your issue?

avlahop avatar Jan 02 '21 16:01 avlahop

Thanks for the fast fix, but I think this leads to another error :pray:

df = client1.read(index=my_index)

Traceback (most recent call last):
  File "/Users/dmg/opt/miniconda3/envs/es/lib/python3.6/site-packages/dask_elk/elk_entities/index.py", line 127, in __get_mappings
    mapping = mappings["mappings"][doc_type]["properties"]
KeyError: None

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "", line 1, in
  File "/Users/dmg/opt/miniconda3/envs/es/lib/python3.6/site-packages/dask_elk/client.py", line 128, in read
    elk_client, index=index, doc_type=doc_type
  File "/Users/dmg/opt/miniconda3/envs/es/lib/python3.6/site-packages/dask_elk/elk_entities/index.py", line 108, in get_indices_from_elasticsearch
    backward_compatibility=es_version >= self.__MINIMAL_6_x_SUPPORTED_VERSION,
  File "/Users/dmg/opt/miniconda3/envs/es/lib/python3.6/site-packages/dask_elk/elk_entities/index.py", line 134, in __get_mappings
    raise IndexNotFoundException
dask_elk.elk_entities.index.IndexNotFoundException

When I pass doc_type:

df = client1.read(index=my_index, doc_type='_doc')

Traceback (most recent call last):
  File "", line 1, in
  File "/Users/dmg/opt/miniconda3/envs/es/lib/python3.6/site-packages/dask_elk/client.py", line 128, in read
    elk_client, index=index, doc_type=doc_type
  File "/Users/dmg/opt/miniconda3/envs/es/lib/python3.6/site-packages/dask_elk/elk_entities/index.py", line 111, in get_indices_from_elasticsearch
    self.__get_shards_with_nodes(elk_client, index=index)
  File "/Users/dmg/opt/miniconda3/envs/es/lib/python3.6/site-packages/dask_elk/elk_entities/index.py", line 171, in __get_shards_with_nodes
    node = self.__nodes_registry.get_node_by_id(node_id)
  File "/Users/dmg/opt/miniconda3/envs/es/lib/python3.6/site-packages/dask_elk/elk_entities/node.py", line 67, in get_node_by_id
    return self.__nodes[node_id]
KeyError: 'z-OoRtyRQ6iQ3gKCNA'

ibnubay avatar Jan 02 '21 17:01 ibnubay

What is the version of your Elasticsearch?

avlahop avatar Jan 02 '21 17:01 avlahop

My AWS Elasticsearch cluster version is 7.1.1.

ibnubay avatar Jan 02 '21 18:01 ibnubay

Hi @avlahop, I tried to continue your work and I think it works.

I moved the wan_only option from here: https://github.com/avlahop/dask-elk/blob/50841e63ac96d899eb36c9e25ede1061105e6c7f/dask_elk/client.py#L123

# Get nodes info first
node_registry = NodeRegistry()
node_registry.get_nodes_from_elastic(elk_client, self.wan_only)

Into here: https://github.com/avlahop/dask-elk/blob/8a958a8a20e44c487f2a1c9b66a0710603b0295e/dask_elk/elk_entities/node.py#L45

So that it looks like this:

def get_nodes_from_elastic(self, elk_client, wan_only):
    ...
    publish_address = node_info["http"]["publish_address"] if not wan_only else None

I then created a simple test app and it works, with one adjustment to the host when initializing DaskElasticClient:

  • host must not be a list of strings when the wan_only option is True

[image]
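To make the change above easier to follow, here is a rough, self-contained sketch. It is not the real dask-elk class: `NodeRegistrySketch` and `load` are hypothetical names, and the response dict is a made-up sample shaped like a `_nodes` reply in which the `http` section is missing, which is what the `KeyError: 'http'` above suggests the managed cluster returns.

```python
from typing import Any, Dict, Optional


class NodeRegistrySketch:
    """Hypothetical stand-in for a node registry; not dask-elk's class."""

    def __init__(self) -> None:
        self._nodes: Dict[str, Optional[str]] = {}

    def load(self, nodes_response: Dict[str, Any], wan_only: bool) -> None:
        for node_id, node_info in nodes_response["nodes"].items():
            if wan_only:
                # Never touch node_info["http"]: it may be absent, and the
                # private address would not be reachable anyway.
                address = None
            else:
                address = node_info["http"]["publish_address"]
            # The node ID is registered in both modes.
            self._nodes[node_id] = address

    def get_node_by_id(self, node_id: str) -> Optional[str]:
        return self._nodes[node_id]


# Made-up sample: a node entry without an "http" section.
managed_response = {"nodes": {"abc123": {"name": "data-1", "roles": ["data"]}}}

registry = NodeRegistrySketch()
registry.load(managed_response, wan_only=True)
address = registry.get_node_by_id("abc123")  # None in WAN-only mode
```

In this sketch every node ID is still registered when `wan_only` is True, just without an address, so a later lookup by node ID (as in the shard-to-node step in the traceback above) would not fail with a `KeyError` on the ID.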

ibnubay avatar Jan 02 '21 18:01 ibnubay

I hope this PR doesn't break the other case, where the wan_only option is False :crossed_fingers:

ibnubay avatar Jan 02 '21 18:01 ibnubay

The problem is that DaskElasticClient still tries to get the shards as they are distributed across the different data nodes. Do you want to open a new PR that merges into the upstream master?

avlahop avatar Jan 02 '21 18:01 avlahop

For a faster merge, I think we can keep using your branch, and you can apply the 3-line change (see the comparison with my fork): https://github.com/avlahop/dask-elk/compare/bug/26/no_node_lookup_when_wan_only...ibnbay99:bug/26/no_node_lookup_when_wan_only

What do you think, @avlahop?

ibnubay avatar Jan 02 '21 18:01 ibnubay

I opened #28. Would you like to assign it to yourself? I don't know if you have the rights, but you can try.

avlahop avatar Jan 02 '21 18:01 avlahop

@ibnbay99 have you tried #28 from your own branch? Is everything working?

avlahop avatar Jan 02 '21 19:01 avlahop

Yes, it works, as in the last image I posted. I'm not sure how to write a test for the case where wan_only is True. I just added documentation for the wan_only part you commented on in the git commit, because it runs in the read function, right?
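One possible way to test the wan_only=True case without a live cluster is to mock the Elasticsearch client. The sketch below is only an assumption about how such a test could look: `get_nodes` is a hypothetical stand-in for the node lookup that read performs, not the project's function, and the mocked `_nodes` reply is made-up sample data with no `http` section.

```python
import unittest
from unittest.mock import MagicMock


def get_nodes(elk_client, wan_only):
    """Hypothetical stand-in for the node lookup done inside read()."""
    resp = elk_client.nodes.info()
    nodes = {}
    for node_id, node_info in resp["nodes"].items():
        nodes[node_id] = (
            None if wan_only else node_info["http"]["publish_address"]
        )
    return nodes


class WanOnlyTest(unittest.TestCase):
    def setUp(self):
        # Fake client whose nodes.info() returns a reply without "http".
        self.elk_client = MagicMock()
        self.elk_client.nodes.info.return_value = {
            "nodes": {"abc123": {"name": "data-1"}}
        }

    def test_wan_only_skips_publish_address(self):
        nodes = get_nodes(self.elk_client, wan_only=True)
        self.assertEqual(nodes, {"abc123": None})

    def test_without_wan_only_requires_http_section(self):
        with self.assertRaises(KeyError):
            get_nodes(self.elk_client, wan_only=False)
```

Saved as a test module, this can be run with `python -m unittest`. The second test pins down the wan_only=False behaviour as well, which covers the worry about breaking that path.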

ibnubay avatar Jan 02 '21 20:01 ibnubay