Upgrade azure-eventhub to the new Event Hub SDK
Proposed commit message
Restructure the azure-eventhub input, rebranding the current version as processor v1. Add a brand new processor v2, allowing users to select which version to use in the config:
- processor v1: uses the legacy Event Hub SDK (default processor, at least for 8.15)
- processor v2: uses the modern Event Hub SDK
Why are we introducing a processor v2?
- processor v1 uses deprecated libraries:
  - github.com/Azure/azure-event-hubs-go (legacy)
  - github.com/Azure/azure-storage-blob-go (legacy, retiring in September 2024)
- processor v1 does not support publishing acks (mostly due to lack of hooks; the legacy SDK is a black box)
Notes for reviewers
Overview
To help with the review, here is an overview of the main flow of the processor v2-based input.
- The processor v2 starts a new consumer for each event hub partition.
- Each consumer creates a pipeline client.
- When a consumer receives an event, it decodes it and sends it to the pipeline client.
- When the pipeline successfully processes the event, it sends an acknowledgment (ACK) to the consumer.
- The consumer stores the sequence number of the last successful event in the partition blob in the storage account container.
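The flow above can be sketched roughly as follows. This is a minimal, in-memory illustration and not the code in this PR: `event`, `checkpointStore`, `pipelineClient`, `runPartitionConsumer`, and `runProcessor` are hypothetical names, the checkpoint store is a map standing in for the partition blobs, and the pipeline acknowledges synchronously, whereas a real pipeline would acknowledge asynchronously. Event decoding is omitted.

```go
package main

import (
	"fmt"
	"sync"
)

// event is a hypothetical stand-in for a message received from one partition.
type event struct {
	partitionID    string
	sequenceNumber int64
	body           string
}

// checkpointStore is a hypothetical in-memory stand-in for the partition
// blobs kept in the storage account container.
type checkpointStore struct {
	mu        sync.Mutex
	sequences map[string]int64
}

func newCheckpointStore() *checkpointStore {
	return &checkpointStore{sequences: make(map[string]int64)}
}

// save records the sequence number of the last acknowledged event.
func (s *checkpointStore) save(partitionID string, seq int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.sequences[partitionID] = seq
}

// load returns the stored sequence number for a partition, if any.
func (s *checkpointStore) load(partitionID string) (int64, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	seq, ok := s.sequences[partitionID]
	return seq, ok
}

// pipelineClient is a hypothetical stand-in for the publishing pipeline
// client. It invokes onACK once an event counts as processed.
type pipelineClient struct {
	onACK func(event)
}

// publish acknowledges immediately in this sketch (an assumption made to
// keep the example short).
func (c *pipelineClient) publish(e event) {
	c.onACK(e)
}

// runPartitionConsumer creates its own pipeline client, publishes every
// event, and checkpoints only from the ACK callback.
func runPartitionConsumer(events []event, store *checkpointStore) {
	client := &pipelineClient{onACK: func(e event) {
		store.save(e.partitionID, e.sequenceNumber)
	}}
	for _, e := range events {
		client.publish(e)
	}
}

// runProcessor starts one consumer per partition and waits for all of them.
func runProcessor(partitions map[string][]event, store *checkpointStore) {
	var wg sync.WaitGroup
	for _, events := range partitions {
		wg.Add(1)
		go func(events []event) {
			defer wg.Done()
			runPartitionConsumer(events, store)
		}(events)
	}
	wg.Wait()
}

func main() {
	store := newCheckpointStore()
	runProcessor(map[string][]event{
		"0": {
			{partitionID: "0", sequenceNumber: 1, body: "a"},
			{partitionID: "0", sequenceNumber: 2, body: "b"},
		},
		"1": {
			{partitionID: "1", sequenceNumber: 7, body: "c"},
		},
	}, store)
	seq, _ := store.load("0")
	fmt.Println("partition 0 checkpoint:", seq)
}
```

Because the checkpoint is written only from the ACK callback, an event that is received but never acknowledged does not advance the stored sequence number.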
New features
- Replace the legacy SDK with the new modern and supported SDK
- Add support for publishing ACKs
- Add a migration assistant to migrate checkpoint v1 information to the v2 format
Replace the legacy SDK with the new modern and supported SDK
The new SDK is more flexible and allows us to implement new features and configuration options.
Add support for publishing ACKs
Now, the processor v2 updates the sequence number only when the events have been successfully delivered to Elasticsearch.
Add a migration assistant to migrate checkpoint v1 information to the v2 format
On the first start of the processor v2, the migration assistant (enabled by default) checks whether checkpoint information from processor v1 exists and, if so, migrates it to the v2 format.
See "Scenario 001: Migration" at x-pack/filebeat/input/azureeventhub/README.md for more details.
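As a rough illustration of the check-then-migrate idea, here is a minimal sketch. The `checkpointV1` and `checkpointV2` types, the `store` maps, and the `migrate` function are hypothetical; the real checkpoint formats are defined by the legacy and modern SDKs and are not reproduced here. Skipping partitions that already have v2 information is an assumption of this sketch.

```go
package main

import "fmt"

// checkpointV1 and checkpointV2 are hypothetical shapes used only for
// illustration; they do not mirror the SDKs' actual formats.
type checkpointV1 struct {
	PartitionID    string
	SequenceNumber int64
}

type checkpointV2 struct {
	PartitionID    string
	SequenceNumber int64
}

// store is a hypothetical in-memory stand-in for the storage account
// container, keyed by partition ID.
type store struct {
	v1 map[string]checkpointV1
	v2 map[string]checkpointV2
}

// migrate copies v1 checkpoint information into the v2 format for each
// partition that has v1 information and no v2 information yet.
// It returns the number of partitions migrated.
func migrate(s *store, enabled bool, partitionIDs []string) int {
	if !enabled {
		return 0 // corresponds to migrate_checkpoint being turned off
	}
	migrated := 0
	for _, id := range partitionIDs {
		if _, exists := s.v2[id]; exists {
			continue // assumption: never overwrite existing v2 information
		}
		old, ok := s.v1[id]
		if !ok {
			continue // nothing to migrate for this partition
		}
		s.v2[id] = checkpointV2{
			PartitionID:    old.PartitionID,
			SequenceNumber: old.SequenceNumber,
		}
		migrated++
	}
	return migrated
}

func main() {
	s := &store{
		v1: map[string]checkpointV1{
			"0": {PartitionID: "0", SequenceNumber: 42},
		},
		v2: map[string]checkpointV2{},
	}
	n := migrate(s, true, []string{"0", "1"})
	fmt.Println("migrated partitions:", n)
}
```

Running the migration a second time in this sketch migrates nothing, because the v2 information already exists.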
New configuration options
There are new configuration options for v2:
- storage_account_connection_string (required): to authenticate with the storage account container.
- migrate_checkpoint (optional, default: yes): controls if the processor v2 should check and migrate checkpoint v1 information on start.
- processor_version (optional, default: v1): which processor version to use.
- processor_update_interval (optional, default: 10s): time interval between checking if new partitions are available.
- processor_start_position (optional, default: earliest): controls if the processor should start from the earliest or the latest event in the event hub retention period.
- partition_receive_timeout (optional, default: 5s)
- partition_receive_count (optional, default: 100)
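To make the defaults concrete, the options above could be modeled along these lines. This is a sketch, not the input's actual config code: the `config` struct, `defaultConfig`, and `validate` are hypothetical names, and the validation rules (for example, requiring the connection string only when v2 is selected) are assumptions drawn from the descriptions above.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// config is a hypothetical struct mirroring the options listed above.
type config struct {
	StorageAccountConnectionString string        // storage_account_connection_string
	MigrateCheckpoint              bool          // migrate_checkpoint
	ProcessorVersion               string        // processor_version
	ProcessorUpdateInterval        time.Duration // processor_update_interval
	ProcessorStartPosition         string        // processor_start_position
	PartitionReceiveTimeout        time.Duration // partition_receive_timeout
	PartitionReceiveCount          int           // partition_receive_count
}

// defaultConfig returns the defaults stated in the PR description.
func defaultConfig() config {
	return config{
		MigrateCheckpoint:       true,
		ProcessorVersion:        "v1",
		ProcessorUpdateInterval: 10 * time.Second,
		ProcessorStartPosition:  "earliest",
		PartitionReceiveTimeout: 5 * time.Second,
		PartitionReceiveCount:   100,
	}
}

// validate applies assumed rules; the real input may check more or less.
func (c config) validate() error {
	switch c.ProcessorVersion {
	case "v1", "v2":
	default:
		return fmt.Errorf("invalid processor_version: %q", c.ProcessorVersion)
	}
	switch c.ProcessorStartPosition {
	case "earliest", "latest":
	default:
		return fmt.Errorf("invalid processor_start_position: %q", c.ProcessorStartPosition)
	}
	if c.ProcessorVersion == "v2" && c.StorageAccountConnectionString == "" {
		return errors.New("storage_account_connection_string is required for processor v2")
	}
	return nil
}

func main() {
	cfg := defaultConfig()
	cfg.ProcessorVersion = "v2"
	cfg.StorageAccountConnectionString = "<connection string>" // placeholder value
	if err := cfg.validate(); err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```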
Checklist
- [x] My code follows the style guidelines of this project
- [x] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [x] I have made corresponding changes to the default configuration files
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.
Disruptive User Impact
Author's Checklist
- [ ]
How to test this PR locally
See "Test Scenarios" section in the x-pack/filebeat/input/azureeventhub/README.md file.
Related issues
- Closes https://github.com/elastic/beats/issues/33815
Use cases
Screenshots
Logs
This pull request does not have a backport label. If this is a bug or security fix, could you label this PR @zmoog? 🙏. For such, you'll need to label your PR with:
- The upcoming major version of the Elastic Stack
- The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)
To fixup this pull request, you need to add the backport labels for the needed branches, such as:
- backport-v8./d.0 is the label to automatically backport to the 8./d branch. /d is the digit
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)
Pinging @elastic/obs-ds-hosted-services (Team:obs-ds-hosted-services)
This pull request doesn't have a Team:<team> label.
This pull request now has conflicts. Could you fix it? 🙏 To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/
git fetch upstream
git checkout -b zmoog/azure-eventhub-sdk-upgrade upstream/zmoog/azure-eventhub-sdk-upgrade
git merge upstream/main
git push upstream zmoog/azure-eventhub-sdk-upgrade