kafka-connect-elasticsearch

Issue with indexing string fields into Elasticsearch 6.3.0
I recently upgraded my 4.1.1 version of kafka-connect-elasticsearch to take advantage of the newly added basic auth, but I seem to have hit a snag. One of my topics holds a 600-field object being sent to Elasticsearch, where most of the fields are strings. It looks like code was recently added to deal with the Elasticsearch text vs. keyword changes:
```java
private static void addTextMapping(ObjectNode obj) {
  // Add additional mapping for indexing, per https://www.elastic.co/blog/strings-are-dead-long-live-strings
  ObjectNode keyword = JsonNodeFactory.instance.objectNode();
  keyword.set("type", JsonNodeFactory.instance.textNode(KEYWORD_TYPE));
  keyword.set("ignore_above", JsonNodeFactory.instance.numberNode(256));
  ObjectNode fields = JsonNodeFactory.instance.objectNode();
  fields.set("keyword", keyword);
  obj.set("fields", fields);
}
```
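For context, that multi-field mapping means every string field is indexed twice: once as `text` and once under a `keyword` sub-field. The resulting mapping for a single string field looks roughly like this (the field name is illustrative):

```json
"customer_name": {
  "type": "text",
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 256
    }
  }
}
```

Each string field therefore counts twice against the index's total-field limit, which is how a ~600-field object blows past the default limit of 1000.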
The above code effectively doubled the number of fields created in the Elasticsearch type mapping, and Elasticsearch threw:
{"root_cause":[{"type":"illegal_argument_exception","reason":"Limit of total fields [1000] in index [tn_xyx_har] has been exceeded"}],"type":"illegal_argument_exception","reason":"Limit of total fields [1000] in index [tn_xyz_har] has been exceeded"} at io.confluent.connect.elasticsearch.jest.JestElasticsearchClient.createMapping(JestElasticsearchClient.java:255) at io.confluent.connect.elasticsearch.Mapping.createMapping(Mapping.java:67) at io.confluent.connect.elasticsearch.ElasticsearchWriter.write(ElasticsearchWriter.java:260) at io.confluent.connect.elasticsearch.ElasticsearchSinkTask.put(ElasticsearchSinkTask.java:163) at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:524)
I tried to remedy the situation by raising the field limit on the Elasticsearch index, setting "mapping.total_fields.limit": 1500 (a sketch of that settings call follows the list below), but that does not seem to have any effect. I would like to have an option where we can specify:
- which fields are TEXT
- which fields are KEYWORD
- and which ones are both.
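For reference, this is how I would expect the per-index limit to be raised, via the index settings API; a minimal sketch (the index name is taken from the error above):

```
PUT /tn_xyz_har/_settings
{
  "index.mapping.total_fields.limit": 1500
}
```

One possible reason this has no effect: if the connector creates the index itself, the higher limit needs to be in place before the mapping is submitted, e.g. via an index template that matches the index name.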
In general, I would like to retain more control over how the mappings are defined and have the connector only verify that the mapping conforms to what lives in the Schema Registry.
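To make that concrete, the workflow I have in mind is pre-creating the index with an explicit mapping (plain `text` or `keyword` per field, with no multi-field doubling) and having the connector validate it rather than generate it. A hypothetical sketch; the field names are illustrative, and the mapping type would need to match the connector's `type.name` setting:

```
PUT /tn_xyz_har
{
  "mappings": {
    "my_type": {
      "properties": {
        "customer_name": { "type": "keyword" },
        "description":   { "type": "text" }
      }
    }
  }
}
```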
I suspect the increase in field count for strings will also cause performance issues, since the mapping is part of the cluster state shipped to every Elasticsearch node, and the Elasticsearch maintainers are very sensitive to drastic growth in mapping size.
I wanted to raise the issue and see whether I am missing something or whether there is some work that needs to be done here.