Upgrade azure-eventhub to the new Event Hub SDK

Open zmoog opened this issue 1 year ago • 4 comments

Proposed commit message

Restructure the azure-eventhub input, rebranding the current version as processor v1. Add a brand new processor v2, allowing users to select which version to use in the config:

  • processor v1: uses the legacy Event Hub SDK (default processor, at least for 8.15)
  • processor v2: uses the modern Event Hub SDK

Why are we introducing a processor v2?

Notes for reviewers

Overview

To help with the review, here is an overview of the main flow of the processor v2-based input.

  • The processor v2 starts a new consumer for each event hub partition.
  • Each consumer creates a pipeline client.
  • When a consumer receives an event, it decodes it and sends it to the pipeline client.
  • When the pipeline successfully processes the event, it acknowledges with the consumer.
  • The consumer stores the sequence number of the last successful event in the partition blob in the storage account container.
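The flow above can be sketched in a few lines of Go. This is a simplified illustration, not the input's actual code: `event`, `checkpointStore`, `pipelineClient`, `runConsumer`, and `runProcessor` are hypothetical names, the checkpoint store is an in-memory map standing in for the partition blobs, and the pipeline acknowledges synchronously where a real pipeline would acknowledge after delivery.

```go
package main

import (
	"fmt"
	"sync"
)

// event is a hypothetical stand-in for a decoded Event Hub message.
type event struct {
	partitionID    string
	sequenceNumber int64
	body           string
}

// checkpointStore is a hypothetical in-memory stand-in for the partition
// blobs kept in the storage account container.
type checkpointStore struct {
	mu   sync.Mutex
	seqs map[string]int64
}

func newCheckpointStore() *checkpointStore {
	return &checkpointStore{seqs: make(map[string]int64)}
}

// update records the sequence number of the last acknowledged event.
func (s *checkpointStore) update(partitionID string, seq int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.seqs[partitionID] = seq
}

// get returns the stored sequence number for a partition, if any.
func (s *checkpointStore) get(partitionID string) (int64, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	seq, ok := s.seqs[partitionID]
	return seq, ok
}

// pipelineClient is a hypothetical pipeline client. publish hands the event
// to the pipeline, which calls ack once the event has been processed.
type pipelineClient struct {
	ack func(event)
}

func (c *pipelineClient) publish(e event) {
	// Assumption: acknowledged immediately to keep the sketch short.
	c.ack(e)
}

// runConsumer handles one partition: it creates its own pipeline client,
// publishes each event, and checkpoints only from the ACK callback.
func runConsumer(events []event, store *checkpointStore) {
	client := &pipelineClient{ack: func(e event) {
		store.update(e.partitionID, e.sequenceNumber)
	}}
	for _, e := range events {
		client.publish(e)
	}
}

// runProcessor starts one consumer per partition and waits for all of them.
func runProcessor(partitions map[string][]event, store *checkpointStore) {
	var wg sync.WaitGroup
	for _, events := range partitions {
		wg.Add(1)
		go func(evts []event) {
			defer wg.Done()
			runConsumer(evts, store)
		}(events)
	}
	wg.Wait()
}

func main() {
	store := newCheckpointStore()
	partitions := map[string][]event{
		"0": {{"0", 10, "a"}, {"0", 11, "b"}},
		"1": {{"1", 7, "c"}},
	}
	runProcessor(partitions, store)
	seq, _ := store.get("0")
	fmt.Println("partition 0 checkpoint:", seq)
}
```

The point of the sketch is that the checkpoint is written from the ACK callback, not when the event is received, so a restart resumes from the last event the pipeline confirmed.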

New features

  • Replace the legacy SDK with the new modern and supported SDK
  • Add support for publishing ACKs
  • Add a migration assistant to migrate checkpoint v1 information to the v2 format

Replace the legacy SDK with the new modern and supported SDK

The new SDK is more flexible and allows us to implement new features and configuration options.

Add support for publishing ACKs

Now, the processor v2 updates the sequence number only when the events have been successfully delivered to Elasticsearch.

Add a migration assistant to migrate checkpoint v1 information to the v2 format

On the first start of processor v2, the migration assistant (enabled by default) checks whether checkpoint information from processor v1 exists and, if so, migrates it to the v2 format.

See "Scenario 001: Migration" at x-pack/filebeat/input/azureeventhub/README.md for more details.

New configuration options

There are new configuration options for v2:

  • storage_account_connection_string (required) to authenticate with the storage account container.
  • migrate_checkpoint (optional, default: yes) controls whether processor v2 checks for checkpoint v1 information on start and migrates it.
  • processor_version (optional, default: v1) selects which processor version to use.
  • processor_update_interval (optional, default: 10s) sets the time interval between checks for new partitions.
  • processor_start_position (optional, default: earliest) controls whether the processor starts from the earliest or the latest event in the event hub retention period.
  • partition_receive_timeout (optional, default: 5s)
  • partition_receive_count (optional, default: 100)
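As a rough illustration of the options and defaults listed above, here is a minimal Go sketch. The `inputConfig` struct, its field names, and the `defaultConfig` and `validate` helpers are hypothetical and are not taken from the input's source. The validation rules are assumptions drawn from the list: `storage_account_connection_string` is treated as required only when processor v2 is selected, and `earliest` and `latest` are assumed to be the only start positions.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// inputConfig is a hypothetical model of the options listed above.
type inputConfig struct {
	StorageAccountConnectionString string        // storage_account_connection_string
	MigrateCheckpoint              bool          // migrate_checkpoint
	ProcessorVersion               string        // processor_version
	ProcessorUpdateInterval        time.Duration // processor_update_interval
	ProcessorStartPosition         string        // processor_start_position
	PartitionReceiveTimeout        time.Duration // partition_receive_timeout
	PartitionReceiveCount          int           // partition_receive_count
}

// defaultConfig returns the defaults stated in the list above.
func defaultConfig() inputConfig {
	return inputConfig{
		MigrateCheckpoint:       true,
		ProcessorVersion:        "v1",
		ProcessorUpdateInterval: 10 * time.Second,
		ProcessorStartPosition:  "earliest",
		PartitionReceiveTimeout: 5 * time.Second,
		PartitionReceiveCount:   100,
	}
}

// validate applies the assumed rules described in the lead-in.
func (c inputConfig) validate() error {
	switch c.ProcessorVersion {
	case "v1", "v2":
	default:
		return fmt.Errorf("unknown processor_version %q", c.ProcessorVersion)
	}
	switch c.ProcessorStartPosition {
	case "earliest", "latest":
	default:
		return fmt.Errorf("unknown processor_start_position %q", c.ProcessorStartPosition)
	}
	if c.ProcessorVersion == "v2" && c.StorageAccountConnectionString == "" {
		return errors.New("storage_account_connection_string is required for processor v2")
	}
	return nil
}

func main() {
	cfg := defaultConfig()
	cfg.ProcessorVersion = "v2"
	// No connection string set, so validation reports the missing option.
	fmt.Println(cfg.validate())
}
```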

Checklist

  • [x] My code follows the style guidelines of this project
  • [x] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [x] I have made corresponding changes to the default configuration files
  • [ ] I have added tests that prove my fix is effective or that my feature works
  • [ ] I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Disruptive User Impact

Author's Checklist

  • [ ]

How to test this PR locally

See "Test Scenarios" section in the x-pack/filebeat/input/azureeventhub/README.md file.

Related issues

  • Closes https://github.com/elastic/beats/issues/33815

Use cases

Screenshots

Logs

zmoog avatar Jun 04 '24 16:06 zmoog

This pull request does not have a backport label. If this is a bug or security fix, could you label this PR @zmoog? 🙏. For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed branches, such as:

  • backport-v8./d.0 is the label to automatically backport to the 8./d branch. /d is the digit

mergify[bot] avatar Jun 04 '24 16:06 mergify[bot]

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

elasticmachine avatar Jun 05 '24 07:06 elasticmachine

Pinging @elastic/obs-ds-hosted-services (Team:obs-ds-hosted-services)

elasticmachine avatar Jun 05 '24 07:06 elasticmachine

This pull request doesn't have a Team:<team> label.

botelastic[bot] avatar Jun 05 '24 07:06 botelastic[bot]

This pull request now has conflicts. Could you fix it? 🙏 To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b zmoog/azure-eventhub-sdk-upgrade upstream/zmoog/azure-eventhub-sdk-upgrade
git merge upstream/main
git push upstream zmoog/azure-eventhub-sdk-upgrade

mergify[bot] avatar Jul 05 '24 12:07 mergify[bot]

This pull request now has conflicts. Could you fix it? 🙏 To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b zmoog/azure-eventhub-sdk-upgrade upstream/zmoog/azure-eventhub-sdk-upgrade
git merge upstream/main
git push upstream zmoog/azure-eventhub-sdk-upgrade

mergify[bot] avatar Aug 01 '24 21:08 mergify[bot]