
Add option to use auto-generated IDs on indexing

Status: Open. gjw13 opened this issue 2 years ago (3 comments).

Problem

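Setting an explicit document ID on each index request makes writes idempotent, which provides exactly-once delivery. However, it puts more load on Elasticsearch, because for every write with an explicit ID Elasticsearch must check whether a document with that ID already exists in the shard. Not every use case needs this guarantee.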
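The trade-off comes from how Kafka Connect delivers records to sink connectors: delivery is at-least-once, so after a task restart some records can be sent again. The sketch below is a toy model, not connector code. It uses a plain `HashMap` as a hypothetical stand-in for an Elasticsearch index, and it assumes the connector's `topic+partition+offset` ID scheme (used when record keys are ignored). It shows that a deterministic ID turns a redelivered record into an overwrite, while an auto-generated ID turns it into a duplicate document.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class IdempotencyDemo {
    // Hypothetical in-memory stand-in for an Elasticsearch index: document ID -> source.
    static Map<String, String> index = new HashMap<>();

    // Deterministic ID modelled on a topic+partition+offset scheme (assumption).
    static void indexWithDeterministicId(String topic, int partition, long offset, String doc) {
        String id = topic + "+" + partition + "+" + offset;
        index.put(id, doc); // a redelivered record overwrites the same document
    }

    // Simulates an auto-generated ID: every request gets a fresh random ID.
    static void indexWithAutoId(String doc) {
        index.put(UUID.randomUUID().toString(), doc); // a redelivered record becomes a new document
    }

    public static void main(String[] args) {
        // Deliver the same record twice, as can happen after a task restart.
        indexWithDeterministicId("orders", 0, 42L, "{\"amount\":10}");
        indexWithDeterministicId("orders", 0, 42L, "{\"amount\":10}");
        System.out.println("deterministic IDs: " + index.size() + " document(s)"); // 1

        index.clear();
        indexWithAutoId("{\"amount\":10}");
        indexWithAutoId("{\"amount\":10}");
        System.out.println("auto-generated IDs: " + index.size() + " document(s)"); // 2
    }
}
```

Auto-generated IDs therefore make sense when occasional duplicates are acceptable, or when they are removed downstream, in exchange for cheaper writes.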

PRs have been opened for this issue before (https://github.com/confluentinc/kafka-connect-elasticsearch/pull/393 and https://github.com/confluentinc/kafka-connect-elasticsearch/pull/510). This PR is largely an update of the most recent one, which had again fallen out of date and accumulated many merge conflicts that needed resolving.

Addresses https://github.com/confluentinc/kafka-connect-elasticsearch/issues/139 and https://github.com/confluentinc/kafka-connect-elasticsearch/issues/97

Solution

Add a new option, use.autogenerated.ids, that omits the document ID from index requests so that Elasticsearch auto-generates it. The option defaults to false and applies only when write.method is set to INSERT.
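The rule "only applicable when write.method is INSERT" can be sketched as a small ID-resolution function. This is a hypothetical illustration, not the PR's code: `DocumentIdResolver`, `resolveId`, and the nested `WriteMethod` enum are invented names, the deterministic ID assumes the `topic+partition+offset` scheme used when keys are ignored, and falling back to that ID for UPSERT (instead of rejecting the configuration) is an assumed design choice.

```java
public class DocumentIdResolver {
    // Hypothetical mirror of the connector's write.method values.
    enum WriteMethod { INSERT, UPSERT }

    // Returns the ID to set on the index request, or null to let Elasticsearch
    // auto-generate one. Assumes record keys are ignored, so the deterministic
    // ID is built from the record's topic, partition, and offset.
    static String resolveId(boolean useAutogeneratedIds, WriteMethod writeMethod,
                            String topic, int partition, long offset) {
        // Upserts must target an existing ID, so the flag only takes effect for INSERT.
        if (useAutogeneratedIds && writeMethod == WriteMethod.INSERT) {
            return null;
        }
        return topic + "+" + partition + "+" + offset;
    }

    public static void main(String[] args) {
        System.out.println(resolveId(true, WriteMethod.INSERT, "orders", 0, 42L));  // null
        System.out.println(resolveId(false, WriteMethod.INSERT, "orders", 0, 42L)); // orders+0+42
        System.out.println(resolveId(true, WriteMethod.UPSERT, "orders", 0, 42L));  // orders+0+42
    }
}
```

In a connector configuration, the feature would be enabled by setting use.autogenerated.ids=true together with write.method=INSERT.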

Note that the large diff in the convertRecord method of the DataConverter class comes from extracting part of that method into a separate method. Adding one more statement pushed the method's cyclomatic complexity over the limit enforced by the checkstyle plugin, which then reported errors.

Does this solution apply anywhere else?
  • [ ] yes
  • [x] no
If yes, where?

Test Strategy

Testing done:
  • [x] Unit tests
  • [ ] Integration tests
  • [ ] System tests
  • [x] Manual tests

As with https://github.com/confluentinc/kafka-connect-elasticsearch/pull/510, we are running live connectors that use this feature.

Release Plan

gjw13 commented on Mar 15 '23

I attempted to sign the CLA, but the URL doesn't resolve.

gjw13 commented on Mar 15 '23

CLA assistant check
All committers have signed the CLA.

cla-assistant[bot] commented on Sep 11 '23