Can't use loadbalancer URL of Elasticsearch cluster in FSCrawler
Describe the bug
FSCrawler cannot connect to an Elasticsearch cluster through a load balancer URL configured in the elasticsearch.nodes setting.
To Reproduce
Steps to reproduce the behavior:
1. Install nginx on Kubernetes.
2. Set up an Elasticsearch cluster on Kubernetes (with the default service type 'ClusterIP').
3. Set up ingress rules so that the Elasticsearch cluster is reachable through the nginx load balancer at a URL of the form http://lb_ip_address/elasticsearch-svc.
4. Configure that URL as elasticsearch.nodes in the FSCrawler _settings.json.
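For illustration, the _settings.json from the last step would presumably look roughly like the sketch below. This is a minimal reconstruction, not the exact file used: lb_ip_address is the load balancer placeholder from the steps above, and job_name is a hypothetical job name.

```json
{
  "name" : "job_name",
  "elasticsearch" : {
    "nodes" : [
      { "url" : "http://lb_ip_address/elasticsearch-svc" }
    ]
  }
}
```

The only difference from the working NodePort setup is that the node URL carries a path prefix (/elasticsearch-svc) instead of pointing at a host and port directly.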
The crawler then fails with the error 'Name or Service not known', as described in https://discuss.elastic.co/t/can-fscrawler-access-elasticsearch-cluster-behind-load-balancer/199591
Expected behavior
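FSCrawler should be able to connect to an Elasticsearch cluster that is exposed through a load balancer URL with a path prefix, such as http://lb_ip_address/elasticsearch-svc.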
Versions:
- OS: Linux
- Version: 2.7-SNAPSHOT
Workaround: I set up the Elasticsearch cluster with the service type 'NodePort' on Kubernetes, so that each cluster node is exposed outside the private network. Then I listed both nodes in the FSCrawler configuration. This configuration works fine.
"elasticsearch" : {
  "nodes" : [
    { "url" : "http://NODE1_IP:NODE1_PORT" },
    { "url" : "http://NODE2_IP:NODE2_PORT" }
  ]
}