kafka-connect-elasticsearch
Add option to use auto-generated IDs on indexing
Problem
Setting the document ID at index time provides exactly-once delivery, but it puts more load on Elasticsearch and is not necessary for all use cases.
Two earlier PRs addressed this issue: (https://github.com/confluentinc/kafka-connect-elasticsearch/pull/393) and (https://github.com/confluentinc/kafka-connect-elasticsearch/pull/510). This PR is largely an update of the more recent one, which fell out of date and accumulated many merge conflicts that had to be resolved.
Addresses https://github.com/confluentinc/kafka-connect-elasticsearch/issues/139 and https://github.com/confluentinc/kafka-connect-elasticsearch/issues/97
Solution
Add a new option to use auto-generated document IDs on index requests. The new option (use.autogenerated.ids) defaults to false and applies only when write.method is set to INSERT.
Note that the large diff in the convertRecord method of the DataConverter class results from extracting a chunk of that code into a separate method. Adding one more statement pushed the method's cyclomatic complexity over the limit enforced by the checkstyle plugin, which then reported errors.
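The rule described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the class AutoIdSketch, the WriteMethod enum, and the documentId helper are hypothetical names, and the topic+partition+offset fallback ID is an assumption about how an explicit ID might be built when one is required.

```java
import java.util.Optional;

// Hypothetical sketch; none of these names come from the connector's code base.
final class AutoIdSketch {

    // Stand-in for the connector's write.method setting.
    enum WriteMethod { INSERT, UPSERT }

    /**
     * Returns the document ID to send with the index request, or an empty
     * Optional when Elasticsearch should generate the ID itself.
     */
    static Optional<String> documentId(boolean useAutogeneratedIds,
                                       WriteMethod writeMethod,
                                       String topic, int partition, long offset) {
        // The option only takes effect for INSERT; other write methods
        // still need an explicit ID to target an existing document.
        if (useAutogeneratedIds && writeMethod == WriteMethod.INSERT) {
            return Optional.empty();
        }
        // Assumed fallback: a deterministic ID derived from the record coordinates.
        return Optional.of(topic + "+" + partition + "+" + offset);
    }

    public static void main(String[] args) {
        System.out.println(documentId(true, WriteMethod.INSERT, "orders", 0, 42L));
        System.out.println(documentId(false, WriteMethod.INSERT, "orders", 0, 42L));
        System.out.println(documentId(true, WriteMethod.UPSERT, "orders", 0, 42L));
    }
}
```

Returning an empty Optional rather than null makes the "no explicit ID" case visible at the call site, which is one way to keep the branch count in the calling method low.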
Does this solution apply anywhere else?
- [ ] yes
- [x] no
If yes, where?
Test Strategy
Testing done:
- [x] Unit tests
- [ ] Integration tests
- [ ] System tests
- [x] Manual tests
As with (https://github.com/confluentinc/kafka-connect-elasticsearch/pull/510), we are running live connectors leveraging this feature.
Release Plan
I attempted to sign the CLA, but the URL doesn't resolve.