ElasticSearch Sink: Add support for multiple ElasticSearch hosts

Open lmanners opened this issue 3 years ago • 25 comments

Current Vector Version

vector 0.10.0

Use-cases

In order to use Vector with ElasticSearch in a high availability (HA) production environment, we would like to be able to specify multiple ElasticSearch hosts.

See an example of how Filebeat supports this: https://www.elastic.co/guide/en/beats/filebeat/current/elasticsearch-output.html#hosts-option

Attempted Solutions

Currently, it appears to be possible to specify only a single ElasticSearch host, which makes that host a single point of failure.

Proposal

Add the ability to specify multiple ElasticSearch hosts (see Filebeat for an example).


Ref https://github.com/fluent/fluent-bit/issues/1340

lmanners avatar Aug 31 '20 14:08 lmanners

@lmanners Thanks for this. Agree we'd like to do this and I've added it to our product backlog.

jamtur01 avatar Aug 31 '20 14:08 jamtur01

We could decide on the approach for sending to specific servers: round robin, failover, etc. We should check what Filebeat does here.

jamtur01 avatar Nov 06 '20 20:11 jamtur01

This seems related to https://github.com/timberio/vector/issues/4156 . I imagine we'll end up solving this in a general way that could be applied to most sinks.

jszwedko avatar Nov 06 '20 21:11 jszwedko

I think you want to have a look at the sniffing mechanism most elastic components use: https://www.elastic.co/de/blog/elasticsearch-sniffing-best-practices-what-when-why-how

This is how Filebeat, Fluentd, or Logstash find all the nodes in a cluster and then use round robin to distribute the load.

hikhvar avatar Apr 26 '21 14:04 hikhvar

:+1: Also missing this feature

fpytloun avatar Aug 11 '21 09:08 fpytloun

The absence of this feature prevents the use of Vector in production.

Hertan avatar Aug 17 '21 11:08 Hertan

The absence of this feature prevents the use of Vector in production.

We are currently using a load balancer with multiple Elasticsearch hosts behind it, but it will not survive a complete region outage, including downtime of that particular balancer.

fpytloun avatar Aug 17 '21 11:08 fpytloun

Is this still being considered? I see it keeps being removed from milestones so I'm just wondering.

rensenti avatar Dec 03 '21 08:12 rensenti

The milestones switching was due to some old automation we had with Zendesk. It hasn't been prioritized, but we are still considering it (as well as broader client-side load balancing in Vector). Adding a 👍 reaction to the issue helps us track demand.

jszwedko avatar Dec 03 '21 18:12 jszwedko

As @fpytloun mentioned, one possible workaround to this Vector limitation is to use load balancing. What he referred to sounds more like a reverse proxy, which becomes its own SPOF. But if you set up BGP anycasting in a creative way you can get around that. Still pretty cumbersome to have to do that when Elasticsearch is intended to be highly available without complicated multi-tier load balancing setups.

CameronNemo avatar Mar 21 '22 19:03 CameronNemo

Building on top of what has been said.

Design

Add a feature for distributing requests between multiple hosts/endpoints.

The following options would be added to the elasticsearch sink config:

# Additional endpoints. Optional.
distribution.endpoints = ["...","...", ...]

# Delay between trying to reactivate inactive endpoints. Optional, default 5 sec.
distribution.reactivate_delay = 5

Implementation

The basic idea is to add a tower layer that chooses which service/host/endpoint a request is sent to.

Adding this layer would be a step towards #4156, in that it would be applicable to other HTTP-based sinks.

tower has a layer/middleware for this. It uses the Power of Two Random Choices algorithm to spread load across endpoints. This could also be extended with a sniffing feature. Use of this middleware also requires providing a load factor to it.

For this implementation, a simple flat constant would be used as the load factor. This would make the algorithm distribute the load randomly but uniformly across active endpoints. It could then be extended to track in-flight requests per endpoint and use that as the load factor, or further enhanced by reusing statistics from ARC, which would add more active load balancing to this sink.
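
To make the selection concrete, here is a minimal sketch (hypothetical names, not tied to tower's traits) of Power of Two Random Choices over a set of endpoints, using in-flight requests as the load factor. With a flat constant load factor the comparison always ties, so the choice degenerates to a uniform random pick among active endpoints:

use rand::Rng;

// Hypothetical per-endpoint state: address, health flag, and in-flight request count.
struct Endpoint {
    address: String,
    active: bool,
    in_flight: usize,
}

// Power of Two Random Choices: sample two distinct active endpoints at random
// and send the request to the one with the lower load factor.
fn pick_endpoint(endpoints: &[Endpoint]) -> Option<&Endpoint> {
    let active: Vec<&Endpoint> = endpoints.iter().filter(|e| e.active).collect();
    match active.len() {
        0 => None,
        1 => Some(active[0]),
        n => {
            let mut rng = rand::thread_rng();
            let a = rng.gen_range(0..n);
            // Pick a second index distinct from the first.
            let mut b = rng.gen_range(0..n - 1);
            if b >= a {
                b += 1;
            }
            // With a flat constant load factor both candidates always tie,
            // so this reduces to a uniform random choice.
            Some(if active[a].in_flight <= active[b].in_flight {
                active[a]
            } else {
                active[b]
            })
        }
    }
}

In an actual layer, in_flight would be incremented when a request is dispatched and decremented when its response arrives.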

The positioning of this layer significantly determines how it interacts with the rate limiter, retry, and ARC layers/features.

For context, this is the tower stack:

  • Rate limit
  • ARC
  • Retry
  • Timeout
  • Service

Position above all

The layer would wrap the whole stack in the sink. This would mean:

  • Being above the retry layer, it can't retry with a different endpoint by default.
  • Being above ARC, the layer can interpret pending services as being temporarily inactive, making ARC an indirect source of the load factor.
  • Being above the rate limiter, a limit would be applied on a per-endpoint basis.

This would be a local change to the Elasticsearch sink, but much of it would be reusable for other HTTP-based sinks.

An extension of this would be to add an additional retry layer on top to retry across different endpoints, effectively adding a failover feature to this sink. This could be configured by adding retry options to the distribution option set.

Another extension would be to add an additional rate limiter on top to apply a limit to the sink as a whole, which seems necessary since without it there is no way to constrain the sink. This could be configured by adding rate limit options to the distribution option set.

Turning on ARC would be highly advisable here; maybe we could even turn it on by default.

This placement would also mean that every option outside the distribution option set applies per endpoint, making it clearer what does what. It would also mean that turning distribution on does not require changing other options.

Alternative - Position below retry layer

The layer would be added immediately below the retry layer. This would mean:

  • Being below the retry layer, it can retry across different endpoints.
  • Being below the ARC layer allows ARC to scale concurrency up or down as the number of underlying active endpoints changes.
  • Being below the rate limiter, a limit would be applied on a per-sink basis.

There are issues with this position:

  1. It may be optimistic to expect that ARC, which was built with a single connection in mind, will be able to handle multiple connections disguised as a single one.
  2. Using ARC as a load factor or as an indicator of active/inactive state is not an option.
  3. This would be a broader change since it requires extending structures common to other sinks.
  4. The definition of options such as request.rate_limit_num would change depending on whether distribution is on or off.

Alternative - Custom layer

Using the tower middleware would force us to adopt the Power of Two Random Choices algorithm and its other design choices. An alternative would be to write a custom layer similar to that middleware but with a simpler algorithm such as round robin.
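
For comparison, a round-robin selector is trivial; a minimal sketch (hypothetical, not tied to tower's traits):

use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical round-robin selector over a fixed list of endpoint URLs.
struct RoundRobin {
    endpoints: Vec<String>,
    next: AtomicUsize,
}

impl RoundRobin {
    fn new(endpoints: Vec<String>) -> Self {
        assert!(!endpoints.is_empty(), "at least one endpoint is required");
        Self { endpoints, next: AtomicUsize::new(0) }
    }

    // Return the next endpoint, cycling through the list.
    fn pick(&self) -> &str {
        let i = self.next.fetch_add(1, Ordering::Relaxed) % self.endpoints.len();
        self.endpoints[i].as_str()
    }
}

A real layer would additionally skip endpoints that are currently marked inactive.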

cc. @jszwedko

ktff avatar May 20 '22 12:05 ktff

Thanks for this write-up, @ktff !

I think the ability for it to retry across Elasticsearch instances will be important (though maybe @lmanners or @sim0nx can weigh in, as users).

Given that, I think the second alternative makes sense to me. I think ARC would behave similarly to how it would with a single load-balanced endpoint, so this is probably OK (i.e. as if Vector were pointed at a load balancer that balanced across multiple ES instances).

With regards to balancing, using that middleware seems attractive to me. I'd be satisfied with round-robin if we do need to diverge for any reason, though.

I'd like to get @bruceg's thoughts here too. I'll ask him to take a look this week.

jszwedko avatar May 23 '22 19:05 jszwedko

Retry across Elasticsearch instances is doable in both cases. ~~In main case an additional retry layer can be added on top of this, while in alternative to reuse existing retry layer we would need to change responses/errors in this layer to signal retry layer that it's ok to retry.~~ EDIT: In both cases a new retry logic will be needed.

ktff avatar May 24 '22 12:05 ktff

I think the ability for it to retry across Elasticsearch instances will be important (though maybe @lmanners or @sim0nx can weigh in, as users).

Yes this is definitely important.

sim0nx avatar May 24 '22 14:05 sim0nx

I think ARC would behave similarly to how it would with a single load-balanced endpoint, so this is probably OK (i.e. as if Vector were pointed at a load balancer that balanced across multiple ES instances).

That's a good point. In that case I also think the alternative makes more sense.

ktff avatar May 24 '22 15:05 ktff

@bruceg it would be good to get your thoughts on this.

ktff avatar Jun 01 '22 13:06 ktff

I think this situation demonstrates a problem with the layering model as it stands:

  • We want to be able to retry on different hosts, so the distribution layer needs to go below retry.
  • We want the distribution layer to have concurrency managed by ARC, so the ARC layer needs to go below distribution.
  • We want retries to increase the effective RTT and back-off concurrency, so the retry layer needs to go below ARC.

Obviously, we can't do all three at the same time.

I think I agree with @jszwedko's analysis in general. As a layer, this would fit best below retry. ARC is intended to manage concurrency in general, and it does considerable averaging of RTTs before using its statistics. If the collection of ES instances has at least similar RTT characteristics, ARC will be able to manage them as though they were one instance. Tweaking the EWMA parameters may help bring more chaotic distributions into coherence as well. If the distribution layer needs RTT statistics, we could possibly split out a shared statistics gathering layer that could be used by both.
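
As an illustration of what such a shared statistics gathering layer could track, here is a minimal sketch (hypothetical, not Vector code) of a per-endpoint exponentially weighted moving average over RTT samples:

use std::time::Duration;

// Hypothetical per-endpoint RTT tracker based on an exponentially weighted moving average.
struct RttEwma {
    // Smoothing factor in (0, 1]; higher values weight recent samples more heavily.
    alpha: f64,
    current: Option<f64>,
}

impl RttEwma {
    fn new(alpha: f64) -> Self {
        Self { alpha, current: None }
    }

    // Fold a new RTT sample into the moving average.
    fn record(&mut self, rtt: Duration) {
        let sample = rtt.as_secs_f64();
        self.current = Some(match self.current {
            None => sample,
            Some(avg) => self.alpha * sample + (1.0 - self.alpha) * avg,
        });
    }

    // Current smoothed RTT estimate, if any samples have been recorded.
    fn estimate(&self) -> Option<Duration> {
        self.current.map(Duration::from_secs_f64)
    }
}

Both ARC and the distribution layer could then read from one such tracker per endpoint.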

bruceg avatar Jun 06 '22 23:06 bruceg

If the collection of ES instances has at least similar RTT characteristics, ARC will be able to manage them as though they were one instance.

That sounds good enough. I assume that would work out to something like:

  • If all ES instances are in the same data center, ARC should work.
  • If all ES instances are in the same region, ARC will probably work; some tweaking of EWMA parameters may be needed.
  • Otherwise, ARC probably won't work without heavy tweaking, and even then it may not.

Considering that, we may want to make ARC opt-in when distribution is enabled, or, to avoid the extra magic, add a visible mention of this to the distribution documentation.

ktff avatar Jun 07 '22 08:06 ktff

Related to splitting out the statistics gathering layer: if split in the right way, it could perhaps be put below the distribution layer, which would also be a step toward solving this circular dependency between layers. I'll see during implementation whether something like that is doable.

ktff avatar Jun 07 '22 08:06 ktff

Related to splitting out the statistics gathering layer: if split in the right way, it could perhaps be put below the distribution layer, which would also be a step toward solving this circular dependency between layers. I'll see during implementation whether something like that is doable.

That is exactly what I was envisioning, yes.

bruceg avatar Jun 07 '22 18:06 bruceg

Related to the ARC segment in https://github.com/vectordotdev/vector/pull/13236#pullrequestreview-1055604956, it has effectively become a requirement to put ARC below distribution so that each endpoint is managed by its own ARC. To do that, the dependency circle needs to be broken; unfortunately, there are two features that are practically mutually exclusive under such a design:

  • Failover
  • ARC sensitivity to retries
    • We want retries to increase the effective RTT and back-off concurrency

They conflict in that, for ARC to be sensitive to retries, the retries need to happen against the same endpoint, while failover wants retries to happen against different endpoints.

Between the two, ARC sensitivity to retries seems less important and could maybe be managed in some other way within ARC, so I'm for prioritizing failover. That would leave us with two designs (a sketch follows below):

  1. Always fail over on retries.
  2. Retry some number of times on the same endpoint; if that fails, then fail over.
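
A minimal sketch of the second design (the send closure and error handling here are hypothetical); the first design is the special case retries_per_endpoint = 0:

// Hypothetical dispatch: retry `retries_per_endpoint` times on the same endpoint,
// then fail over to the next endpoint in the list.
fn send_with_failover<T, E>(
    endpoints: &[String],
    retries_per_endpoint: usize,
    mut send: impl FnMut(&str) -> Result<T, E>,
) -> Result<T, E> {
    let mut last_err = None;
    for endpoint in endpoints {
        // One initial attempt plus `retries_per_endpoint` retries.
        for _attempt in 0..=retries_per_endpoint {
            match send(endpoint.as_str()) {
                Ok(response) => return Ok(response),
                Err(err) => last_err = Some(err),
            }
        }
        // All attempts on this endpoint failed; fail over to the next one.
    }
    // All endpoints exhausted; surface the last error.
    Err(last_err.expect("endpoints list must not be empty"))
}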

cc. @bruceg @tobz

ktff avatar Aug 02 '22 14:08 ktff

Hello. Are there any plans to add a multiple-sinks feature to Vector? It's a blocker for our production use: an LB would be a SPOF, and it would be nice if Vector could balance records by itself.

sc0rp10 avatar Sep 12 '22 12:09 sc0rp10

Hello. Are there any plans to add a multiple-sinks feature to Vector? It's a blocker for our production use: an LB would be a SPOF, and it would be nice if Vector could balance records by itself.

There is an open PR for this: https://github.com/vectordotdev/vector/pull/14088 . The load balancing will be rudimentary, but will hopefully suffice for simple use-cases.

Regarding SPOF for an LB, it is common to use DNS to provide redundancy by having N load balancers behind the same A record. We generally recommend this over having Vector do the load balancing since load balancers will always have advanced load balancing features.

jszwedko avatar Sep 12 '22 14:09 jszwedko

@jszwedko ah, got it. Can you please point me to how Vector works with a DNS LB? Does it respect DNS TTLs? For example, can I use something like Consul DNS names with a low TTL as a sink address? Will Vector resolve the DNS name on each request to the sink?

sc0rp10 avatar Sep 14 '22 11:09 sc0rp10

@jszwedko ah, got it. Can you please point me to how Vector works with a DNS LB? Does it respect DNS TTLs? For example, can I use something like Consul DNS names with a low TTL as a sink address? Will Vector resolve the DNS name on each request to the sink?

From what I can see, the crate we use for HTTP requests, hyper, actually re-resolves DNS for each new HTTP request via getaddrinfo. If you enable hyper debug logging (VECTOR_LOG=hyper=debug) you should see debug messages when it re-resolves. The OS may do some caching (e.g. via systemd-resolved) based on TTLs.

jszwedko avatar Sep 16 '22 21:09 jszwedko