Publish() should back off if Elasticsearch returns HTTP 429 rate-limiting responses

Open rodrigc opened this issue 2 years ago • 3 comments

Describe the enhancement:

Under high load, Elasticsearch can return HTTP 429 errors to indicate rate limiting. Beats should back off when it detects an HTTP 429 response.

From the code linked here, it looks like Beats does not do that and just re-sends: https://github.com/elastic/beats/blob/main/libbeat/outputs/elasticsearch/client.go#L187

Describe a specific use case for the enhancement or feature:

I have an Elasticsearch cluster, and in my network I have deployed 1500 elastic-agents. They are sending lots of logs to Elasticsearch, and I am routinely getting HTTP 429 errors.

On one hand, I am trying to scale the resources on the Elasticsearch server side.

However, it would be good if beats and elastic-agent could back off when they detect HTTP 429 errors. Right now beats and elastic-agent seem to keep hammering Elasticsearch when it returns HTTP 429 errors.

  • Enhancement Request 19985 was created for this in Elastic Support case 01503357

rodrigc avatar Oct 21 '23 22:10 rodrigc

The docs for backoff.init and backoff.max mention "network errors". I don't know if an HTTP 429 qualifies as a "network error", because really it is not a network error. For HTTP 429, the client was able to successfully make a network connection to the server, but the server decided to send back an HTTP 429 response.

If beats and elastic-agent could back off in response to HTTP 429, that would be helpful, and I could tune the backoff parameters in elastic-agent policies.
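
To make the distinction concrete, here is a small Go sketch (hypothetical names, not Beats code) that separates a transport-level failure from a successful exchange in which the server answers 429. It uses a local `httptest` server that always replies with 429; `classify`, `probe`, and `newRateLimitedServer` are made up for illustration.

```go
package main

import (
	"net/http"
	"net/http/httptest"
)

// outcome is a hypothetical label for how a request ended.
type outcome string

const (
	outcomeOK           outcome = "ok"
	outcomeNetworkError outcome = "network error"
	outcomeRateLimited  outcome = "rate limited"
	outcomeOtherStatus  outcome = "other status"
)

// classify distinguishes a failed exchange (err != nil, no usable response)
// from a completed exchange in which the server chose to answer with 429.
func classify(resp *http.Response, err error) outcome {
	switch {
	case err != nil:
		return outcomeNetworkError
	case resp.StatusCode == http.StatusTooManyRequests:
		return outcomeRateLimited
	case resp.StatusCode >= 200 && resp.StatusCode < 300:
		return outcomeOK
	default:
		return outcomeOtherStatus
	}
}

// probe issues a GET against url and classifies the result.
func probe(url string) outcome {
	resp, err := http.Get(url)
	if err == nil {
		defer resp.Body.Close()
	}
	return classify(resp, err)
}

// newRateLimitedServer starts a local test server that answers every
// request with HTTP 429. The caller is responsible for closing it.
func newRateLimitedServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusTooManyRequests)
	}))
}
```

Probing the running server yields the rate-limited outcome with a nil error, while probing the same address after the server is closed yields a network error. A backoff policy keyed only on the error value would miss the first case.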

rodrigc avatar Oct 21 '23 23:10 rodrigc

Hi! We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1:. Thank you for your contribution!

botelastic[bot] avatar Oct 21 '24 00:10 botelastic[bot]

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

elasticmachine avatar Apr 23 '25 21:04 elasticmachine

This behavior is fairly easy to reproduce.

filebeat.yml

---
filebeat:
  inputs:
    - type: benchmark
      id: my-benchmark-id
      enabled: true
      count: 100
output:
  elasticsearch:
    hosts:
      - "http://localhost:9200"

mock-es

./mock-es -toomany 100 -metrics 5s

example of filebeat logs

{"log.level":"debug","@timestamp":"2025-06-25T09:44:16.681-0500","log.logger":"elasticsearch.elasticsearch","log.origin":{"function":"github.com/elastic/beats/v7/libbeat/outputs/elasticsearch.(*Client).bulkCollectPublishFails","file.name":"elasticsearch/client.go","file.line":465},"message":"Bulk item insert failed (i=1, status=429): ","service.name":"filebeat","ecs.version":"1.6.0"}

mock-es metrics

{"bulk.create.too_many":{"count":255300},"bulk.create.total":{"count":2553},"license.total":{"count":1},"root.total":{"count":2}}
{"bulk.create.too_many":{"count":635700},"bulk.create.total":{"count":6357},"license.total":{"count":1},"root.total":{"count":2}}
{"bulk.create.too_many":{"count":1019500},"bulk.create.total":{"count":10195},"license.total":{"count":1},"root.total":{"count":2}}
{"bulk.create.too_many":{"count":1402300},"bulk.create.total":{"count":14023},"license.total":{"count":1},"root.total":{"count":2}}
{"bulk.create.too_many":{"count":1777900},"bulk.create.total":{"count":17779},"license.total":{"count":1},"root.total":{"count":2}}

leehinman avatar Jun 25 '25 14:06 leehinman

Candidate fix should be up soon. New mock-es metrics:

{"bulk.create.too_many":{"count":200},"bulk.create.total":{"count":2},"license.total":{"count":2},"root.total":{"count":4}}
{"bulk.create.too_many":{"count":300},"bulk.create.total":{"count":3},"license.total":{"count":3},"root.total":{"count":6}}
{"bulk.create.too_many":{"count":400},"bulk.create.total":{"count":4},"license.total":{"count":4},"root.total":{"count":8}}
{"bulk.create.too_many":{"count":400},"bulk.create.total":{"count":4},"license.total":{"count":4},"root.total":{"count":8}}
{"bulk.create.too_many":{"count":500},"bulk.create.total":{"count":5},"license.total":{"count":5},"root.total":{"count":10}}
{"bulk.create.too_many":{"count":500},"bulk.create.total":{"count":5},"license.total":{"count":5},"root.total":{"count":10}}
{"bulk.create.too_many":{"count":500},"bulk.create.total":{"count":5},"license.total":{"count":5},"root.total":{"count":10}}
{"bulk.create.too_many":{"count":500},"bulk.create.total":{"count":5},"license.total":{"count":5},"root.total":{"count":10}}
{"bulk.create.too_many":{"count":600},"bulk.create.total":{"count":6},"license.total":{"count":6},"root.total":{"count":12}}
{"bulk.create.too_many":{"count":600},"bulk.create.total":{"count":6},"license.total":{"count":6},"root.total":{"count":12}}
{"bulk.create.too_many":{"count":600},"bulk.create.total":{"count":6},"license.total":{"count":6},"root.total":{"count":12}}
{"bulk.create.too_many":{"count":600},"bulk.create.total":{"count":6},"license.total":{"count":6},"root.total":{"count":12}}
{"bulk.create.too_many":{"count":600},"bulk.create.total":{"count":6},"license.total":{"count":6},"root.total":{"count":12}}
{"bulk.create.too_many":{"count":600},"bulk.create.total":{"count":6},"license.total":{"count":6},"root.total":{"count":12}}
{"bulk.create.too_many":{"count":600},"bulk.create.total":{"count":6},"license.total":{"count":6},"root.total":{"count":12}}
{"bulk.create.too_many":{"count":600},"bulk.create.total":{"count":6},"license.total":{"count":6},"root.total":{"count":12}}

faec avatar Jun 26 '25 15:06 faec