kafka-connect-elasticsearch

Check if mapping exists does not work with data-streams

Open AurelPaulovic opened this issue 3 years ago • 2 comments

Version checked: 11.1.7
ES version: 7.16
Issue location: ElasticsearchClient.mapping(String index)

When using data streams, ElasticsearchSinkTask.tryWriteRecord(SinkRecord sinkRecord, OffsetState offsetState) first checks whether any predefined mappings exist in ES for the index. If none exist, kafka-connect-elasticsearch tries to create explicit mappings based on the schema of the record. This works fine with normal indices but fails when we use data streams in ES. The call in ElasticsearchClient.mapping(String index) succeeds, but it returns the actual backing indices (https://www.elastic.co/guide/en/elasticsearch/reference/current/data-streams.html#backing-indices), whose names differ from the name we are checking (the name of the data stream). This in turn causes the simple HashMap lookup in ElasticsearchClient.mapping(String index) to return null. As a result, kafka-connect-elasticsearch never finds any preexisting mappings in ES for the index and always tries to create new explicit mappings, which makes it impossible to redefine or manage them from ES.
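The lookup miss described above can be reproduced without a cluster. The following is a minimal sketch, not the connector's actual code: the class and method names are hypothetical, the response is a hand-built map standing in for the GetMappings result, and the backing index name is only illustrative.

```java
import java.util.HashMap;
import java.util.Map;

public class MappingLookupDemo {

    // Stand-in for a GetMappings response (assumption: simplified to name -> mapping JSON).
    // The key is the backing index name reported by ES, not the data stream name.
    public static Map<String, String> fakeGetMappingsResponse() {
        Map<String, String> mappings = new HashMap<>();
        mappings.put(
            ".ds-my-topic-2023.02.09-000001",
            "{\"properties\":{\"@timestamp\":{\"type\":\"date\"}}}");
        return mappings;
    }

    // Mirrors the behaviour described in this issue: a plain get() by the requested name.
    public static String lookupByRequestedName(Map<String, String> response, String requestedName) {
        return response.get(requestedName);
    }

    public static void main(String[] args) {
        // The data stream is called "my-topic", but the response is keyed by the backing index.
        String mapping = lookupByRequestedName(fakeGetMappingsResponse(), "my-topic");
        System.out.println("mapping found: " + (mapping != null));
    }
}
```

Because the map holds no key equal to the data stream name, the lookup returns null even though a mapping exists, which is the condition that leads the connector to treat the mapping as missing.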

As a side note: I haven't tested it properly, but it looks like the check will fail in the same way when aliases are used in ES (data streams are a kind of alias).

AurelPaulovic avatar Jan 24 '22 14:01 AurelPaulovic

I'm facing the same issue and I confirm that it also fails for index aliases.

Version checked: 13.1.0 and 14.0.5 ES version: 7.17.X

As @AurelPaulovic correctly explained, the mappings are defined on the backing indices of a data stream or an alias, and those indices have different names than the one used in the GetMappings request. See the examples below:

For a data stream:

GET my-topic/_mapping

Response:

{
  ".ds-my-topic-2023.02.09-000001" : {
    "mappings" : {
      "_data_stream_timestamp" : { "enabled" : true },
      "properties" : {
        "@timestamp" : { "type" : "date" },
        ...

For an index alias:

GET my-topic/_mapping

Response:

{
  "my-topic-000001" : {
    "mappings" : {
      "_data_stream_timestamp" : { "enabled" : true },
      "properties" : {
        "@timestamp" : { "type" : "date" },
        ...
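Both responses are keyed by a backing index rather than by the requested name, so one possible direction for a fix is a lookup that falls back to the returned entries when there is no exact match. The following is only a sketch under that assumption: MappingResolver and resolve are hypothetical names, not part of the connector or the Elasticsearch client, and the mapping type is left generic.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

public class MappingResolver {

    // Hypothetical helper: prefer an exact match on the requested name (plain index),
    // otherwise fall back to the first non-null entry in the response
    // (data stream or alias, where keys are backing index names).
    public static <M> Optional<M> resolve(Map<String, M> response, String requestedName) {
        M exact = response.get(requestedName);
        if (exact != null) {
            return Optional.of(exact);
        }
        return response.values().stream()
            .filter(Objects::nonNull)
            .findFirst();
    }

    public static void main(String[] args) {
        Map<String, String> dataStreamResponse = new LinkedHashMap<>();
        dataStreamResponse.put(".ds-my-topic-2023.02.09-000001", "data-stream-mapping");

        Map<String, String> aliasResponse = new LinkedHashMap<>();
        aliasResponse.put("my-topic-000001", "alias-mapping");

        System.out.println(resolve(dataStreamResponse, "my-topic").isPresent());
        System.out.println(resolve(aliasResponse, "my-topic").isPresent());
    }
}
```

This only answers whether some mapping already exists. When an alias or data stream has several backing indices, picking the first entry is arbitrary, so a real fix would probably need to identify the write index as well.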

This behavior makes it impossible to use explicit field mappings defined in Elasticsearch. Unfortunately, I depend on them, because I need to redefine date fields from long (epoch_millis) to date.

mortht avatar Feb 09 '23 07:02 mortht

I am also getting the same error. Does anyone have an alternative solution?

quanvo17 avatar Jun 13 '23 01:06 quanvo17