
Add test for elasticsearch re-connection after network error & allow graceful shutdown

Open belimawr opened this issue 5 months ago • 16 comments

Proposed commit message

This commit reworks eslegclient.Connection so that its Connect method accepts a context. The caller can then cancel any in-flight requests made by the connection by cancelling that context.
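As a rough illustration of the idea (not the actual eslegclient code), the sketch below binds an HTTP request to the context passed to Connect. The `conn` type, the `blockingTransport` that simulates an unreachable server, and the `es.invalid` URL are all hypothetical names introduced for this example.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"time"
)

// blockingTransport (hypothetical) simulates an unreachable server:
// it never answers and returns only once the request's context is done.
type blockingTransport struct{}

func (blockingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	<-req.Context().Done()
	return nil, req.Context().Err()
}

// conn is a hypothetical, heavily simplified stand-in for a connection type.
type conn struct {
	url    string
	client *http.Client
}

// Connect pings the server. The request is bound to ctx, so cancelling
// ctx aborts the request even while it is in flight.
func (c *conn) Connect(ctx context.Context) error {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, c.url, nil)
	if err != nil {
		return err
	}
	resp, err := c.client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("unexpected status: %s", resp.Status)
	}
	return nil
}

// cancelledInFlight starts a Connect that would block forever, cancels the
// context shortly afterwards, and reports whether Connect returned because
// of that cancellation.
func cancelledInFlight() bool {
	c := &conn{
		url:    "http://es.invalid:9200", // placeholder address, never dialled
		client: &http.Client{Transport: blockingTransport{}},
	}
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	timer := time.AfterFunc(20*time.Millisecond, cancel)
	defer timer.Stop()

	err := c.Connect(ctx)
	return errors.Is(err, context.Canceled)
}

func main() {
	fmt.Println("cancelled in flight:", cancelledInFlight())
}
```

Without the context, the blocked request in this sketch would have no way to be interrupted by the caller.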

The libbeat outputs.Connectable interface (used by outputs.NetworkClient) had to be updated to accept the context, which required refactoring in most of the outputs to also accept a context on connect.
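A simplified sketch of what such an interface change implies for implementers follows. The `Connectable` interface here is a reduced, assumed shape, and `fileOutput`, `netOutput`, and `connectAll` are hypothetical names, not libbeat types. Outputs that never touch the network can accept the context and ignore it, while network outputs can honour it.

```go
package main

import (
	"context"
	"fmt"
)

// Connectable is an assumed, simplified shape of a connect-capable output.
type Connectable interface {
	Connect(ctx context.Context) error
	Close() error
}

// fileOutput (hypothetical) needs no network; it accepts the context only
// to satisfy the interface.
type fileOutput struct{ connected bool }

func (f *fileOutput) Connect(_ context.Context) error { f.connected = true; return nil }
func (f *fileOutput) Close() error                    { f.connected = false; return nil }

// netOutput (hypothetical) refuses to connect once the context is cancelled.
type netOutput struct{ connected bool }

func (n *netOutput) Connect(ctx context.Context) error {
	if err := ctx.Err(); err != nil {
		return fmt.Errorf("connect aborted: %w", err)
	}
	n.connected = true
	return nil
}
func (n *netOutput) Close() error { n.connected = false; return nil }

// connectAll connects every output with the same context and returns how
// many of them succeeded.
func connectAll(ctx context.Context, outs ...Connectable) int {
	ok := 0
	for _, o := range outs {
		if err := o.Connect(ctx); err == nil {
			ok++
		}
	}
	return ok
}

// connectedCount runs connectAll with a live or an already cancelled context.
func connectedCount(cancelFirst bool) int {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	if cancelFirst {
		cancel()
	}
	return connectAll(ctx, &fileOutput{}, &netOutput{})
}

func main() {
	fmt.Println("live context:", connectedCount(false))
	fmt.Println("cancelled context:", connectedCount(true))
}
```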

The worker in libbeat/publisher/pipeline/client_worker.go now uses a context for its cancellation instead of a channel. This context is also used when creating a connection to Elasticsearch.
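The pattern of replacing a done channel with a context can be sketched roughly as below. The `worker` type, `startWorker`, and the integer "batches" are hypothetical simplifications and do not mirror the real client worker.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// worker is a hypothetical, simplified worker whose lifetime is controlled
// by a context instead of a dedicated done channel.
type worker struct {
	cancel    context.CancelFunc
	wg        sync.WaitGroup
	processed int
}

// startWorker launches the worker goroutine with a cancellable context.
func startWorker(batches <-chan int) *worker {
	ctx, cancel := context.WithCancel(context.Background())
	w := &worker{cancel: cancel}
	w.wg.Add(1)
	go func() {
		defer w.wg.Done()
		w.run(ctx, batches)
	}()
	return w
}

// run consumes batches until the context is cancelled or the channel closes.
// The same ctx could be handed on to a Connect call.
func (w *worker) run(ctx context.Context, batches <-chan int) {
	for {
		select {
		case <-ctx.Done():
			return
		case _, ok := <-batches:
			if !ok {
				return
			}
			w.processed++
		}
	}
}

// Close cancels the context and waits for the goroutine to exit.
func (w *worker) Close() {
	w.cancel()
	w.wg.Wait()
}

// runAndClose feeds n batches through an unbuffered channel, shuts the
// worker down, and returns how many batches it processed.
func runAndClose(n int) int {
	batches := make(chan int)
	w := startWorker(batches)
	for i := 0; i < n; i++ {
		batches <- i
	}
	w.Close()
	return w.processed
}

func main() {
	fmt.Println("processed:", runAndClose(3))
}
```

One benefit of this design is that a single cancel call stops both the worker loop and any operation that received the same context.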

An integration test is added to ensure the ES output can always recover from network errors.
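The recovery behaviour such a test checks can be illustrated with a small simulation. This is not the integration test from the PR: `flakyBackend`, `connectWithRetry`, and the fixed wait between attempts are assumptions made for the example.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// flakyBackend (hypothetical) fails the first `failures` connection
// attempts with a network-like error, then succeeds.
type flakyBackend struct {
	failures int
	calls    int
}

func (b *flakyBackend) Connect(_ context.Context) error {
	b.calls++
	if b.calls <= b.failures {
		return errors.New("dial tcp: connection refused")
	}
	return nil
}

// connectWithRetry keeps calling connect until it succeeds or ctx is
// cancelled, and returns the number of attempts made.
func connectWithRetry(ctx context.Context, connect func(context.Context) error, wait time.Duration) (int, error) {
	for attempt := 1; ; attempt++ {
		if err := connect(ctx); err == nil {
			return attempt, nil
		}
		select {
		case <-ctx.Done():
			return attempt, fmt.Errorf("giving up after %d attempts: %w", attempt, ctx.Err())
		case <-time.After(wait):
		}
	}
}

// attemptsUntilRecovered returns how many attempts were needed to connect
// to a backend that fails `failures` times, or -1 on error.
func attemptsUntilRecovered(failures int) int {
	b := &flakyBackend{failures: failures}
	n, err := connectWithRetry(context.Background(), b.Connect, time.Millisecond)
	if err != nil {
		return -1
	}
	return n
}

func main() {
	fmt.Println("attempts:", attemptsUntilRecovered(2))
}
```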

Checklist

  • [x] My code follows the style guidelines of this project
  • [x] I have commented my code, particularly in hard-to-understand areas
  • ~~[ ] I have made corresponding changes to the documentation~~
  • ~~[ ] I have made corresponding changes to the default configuration files~~
  • [x] I have added tests that prove my fix is effective or that my feature works
  • [x] I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Disruptive User Impact

It's a bug fix; there is no disruptive user impact.

~~## Author's Checklist~~

How to test this PR locally

  1. Build Filebeat
  2. Get it sending data to ES
  3. Disconnect from the network, stop ES, or do anything else that prevents Filebeat from reaching ES
  4. Wait for network error logs
  5. Re-start ES/reconnect to the network
  6. Filebeat should recover and start sending data again.

Related issues

  • https://github.com/elastic/beats/issues/40705

~~## Use cases~~ ~~## Screenshots~~ ~~## Logs~~

belimawr, Sep 12 '24 17:09