
Can't use loadbalancer URL of Elasticsearch cluster in FSCrawler

Open rkmohapatra opened this issue 6 years ago • 0 comments

Describe the bug

FSCrawler can't reach an Elasticsearch cluster through a load balancer URL configured in the elasticsearch.nodes setting.

To Reproduce

Steps to reproduce the behavior:

  1. Install nginx on Kubernetes.
  2. Set up an Elasticsearch cluster on Kubernetes (with the default service type 'ClusterIP').
  3. Set up ingress rules to access the Elasticsearch cluster through the nginx load balancer, with a URL of the form http://lb_ip_address/elasticsearch-svc.
  4. Configure the above URL as elasticsearch.nodes in the FSCrawler _settings.json.
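For reference, the failing configuration from the last step would look roughly like the sketch below. It assumes the same elasticsearch.nodes layout used in the workaround further down; lb_ip_address and elasticsearch-svc are the placeholders from the URL format above, not real values, and all other job settings are omitted.

```json
{
  "elasticsearch": {
    "nodes": [
      { "url": "http://lb_ip_address/elasticsearch-svc" }
    ]
  }
}
```

The only difference from the working NodePort setup is that the URL carries a path prefix (/elasticsearch-svc) instead of a bare host and port.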

The crawler fails with the error 'Name or Service not known', as described in https://discuss.elastic.co/t/can-fscrawler-access-elasticsearch-cluster-behind-load-balancer/199591

Expected behavior

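FSCrawler should accept a load balancer URL such as http://lb_ip_address/elasticsearch-svc in elasticsearch.nodes and connect to the cluster through it.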

Versions:

  • OS: Linux
  • Version : 2.7-snapshot

Workaround: I set up the Elasticsearch cluster with the service type 'NodePort' on Kubernetes, so that each cluster node is exposed outside the private network. Then I listed both nodes in the FSCrawler configuration. This configuration works fine.

"elasticsearch" : {
    "nodes" : [
       {"url" : "http://NODE1_IP:NODE1_PORT"},
       {"url" : "http://NODE2_IP:NODE2_PORT"}
    ]
}

rkmohapatra, Sep 17 '19 16:09