kafka-connect-elasticsearch

Ability to manage versioning and generation of document IDs in the index separately

Open ermakvlas opened this issue 2 years ago • 0 comments

I want to use the message key from the topic as the document ID while still using Elasticsearch's internal versioning. This is currently impossible.

Currently, the key.ignore property determines both how the document ID is mapped in the index (https://github.com/confluentinc/kafka-connect-elasticsearch/blob/f2fdc49d4e3c68cd4bb3a9ea8886387e09a5ae3c/src/main/java/io/confluent/connect/elasticsearch/DataConverter.java#L159-L162) and the versioning type (https://github.com/confluentinc/kafka-connect-elasticsearch/blob/f2fdc49d4e3c68cd4bb3a9ea8886387e09a5ae3c/src/main/java/io/confluent/connect/elasticsearch/DataConverter.java#L251-L254).
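To make the coupling concrete, here is a minimal Python model of the behavior described above. It is not the connector's code: the Record class, the plan_write function, and the exact "topic+partition+offset" ID format are assumptions made for illustration. It only shows that one flag decides both the ID and the versioning type.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Record:
    """Hypothetical stand-in for a Kafka sink record."""
    topic: str
    partition: int
    offset: int
    key: Optional[str]


def plan_write(record: Record, key_ignore: bool) -> Tuple[str, str, Optional[int]]:
    """Return (doc_id, version_type, version) for one record.

    Simplified model of the coupling described in this issue:
    a single flag selects both the ID source and the versioning type.
    """
    if key_ignore:
        # ID is derived from the record coordinates (format assumed here);
        # no explicit version is sent, so internal versioning applies.
        doc_id = f"{record.topic}+{record.partition}+{record.offset}"
        return doc_id, "internal", None
    # ID comes from the message key, and the offset is forced in
    # as the external version number.
    return record.key, "external", record.offset


rec = Record(topic="orders", partition=0, offset=42, key="order-1")
print(plan_write(rec, key_ignore=True))
print(plan_write(rec, key_ignore=False))
```

In this model there is no combination that yields the message key as the ID together with internal versioning, which is the combination requested here.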

Setting key.ignore=false means I opt out of using the offset to generate the ID. At the same time, however, the connector is forced to use external versioning, which unconditionally uses the offset as the version number. This seems illogical; is there a reason for this behavior? Forcing the offset as the version can be undesirable when the offsets of the topic the connector reads may be reset. In that case, new versions of existing documents get a smaller version number (the offset restarts at 0) and the update fails. Getting document updates to work correctly again then requires a complete re-indexing of the data.
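The failure after an offset reset can be reproduced with a small in-memory simulation. The FakeIndex class below is hypothetical and only mimics the rule Elasticsearch applies for the external version type: a write is accepted only if the supplied version is strictly greater than the stored one. The document ID, offsets, and bodies are made-up sample values.

```python
class VersionConflict(Exception):
    """Raised when an external version is not greater than the stored one."""


class FakeIndex:
    """Hypothetical in-memory index that mimics external versioning."""

    def __init__(self):
        self.docs = {}  # doc_id -> (version, body)

    def index_external(self, doc_id, version, body):
        current = self.docs.get(doc_id)
        # External versioning: accept only strictly greater versions.
        if current is not None and version <= current[0]:
            raise VersionConflict(
                f"{doc_id}: version {version} <= stored version {current[0]}"
            )
        self.docs[doc_id] = (version, body)


index = FakeIndex()

# Before the reset: the record for this key was written at offset 1500.
index.index_external("order-1", 1500, {"status": "paid"})

# After the topic offsets are reset, a newer update arrives at offset 0.
try:
    index.index_external("order-1", 0, {"status": "shipped"})
    outcome = "updated"
except VersionConflict:
    outcome = "rejected"

print(outcome)
print(index.docs["order-1"])
```

In the simulation the newer update is rejected and the stale document stays in place until new offsets exceed the stored version or the data is re-indexed.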

ermakvlas, Nov 25 '22 10:11