embulk-output-elasticsearch

multi-fields config

Open jmwiersma opened this issue 9 years ago • 6 comments

Hi,

I would like to push string input from CSV columns to Elasticsearch both as an analyzed field for full-text search and as a not_analyzed field. Normally that can be done with the fields mapping parameter (https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html). How do I specify that using embulk?
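For reference, a multi-field mapping of the kind described here might look like the sketch below. It assumes Elasticsearch 2.x syntax (type string with index not_analyzed), which matches the era of this thread; from 5.0 onward the equivalent types are text and keyword. The index name embulk-demo, the type doc, the field city, and the helper create_index are hypothetical.

```shell
#!/bin/sh
# Sketch only: index name, type name and field name are made up for illustration.
ES_URL="${ES_URL:-http://localhost:9200}"

# "city" is analyzed for full-text search; "city.raw" keeps the exact value.
cat > multifield-mapping.json <<'EOF'
{
  "mappings": {
    "doc": {
      "properties": {
        "city": {
          "type": "string",
          "fields": {
            "raw": { "type": "string", "index": "not_analyzed" }
          }
        }
      }
    }
  }
}
EOF

# Create the index with that mapping (needs a reachable Elasticsearch node).
# The Content-Type header is required by Elasticsearch 6.0 and later.
create_index() {
  curl -sS -XPUT -H 'Content-Type: application/json' \
    "$ES_URL/embulk-demo" -d @multifield-mapping.json
}
```

Calling create_index before the first Embulk run would give the index this mapping, so documents written afterwards get both city and city.raw.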

jmwiersma Oct 16 '16 08:10

Hi. Currently, embulk-output-elasticsearch doesn't support index_template.

But I think it's a good idea to support this feature, as fluent-plugin-elasticsearch does.

sakama Oct 18 '16 10:10

https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/java-admin-indices.html#java-admin-indices-put-mapping

@jmwiersma @sakama How can we specify this mapping as plugin config options? Any thoughts?

muga Oct 20 '16 12:10

Sorry, I feel a technical reply is beyond my abilities on this one.

For now, as a workaround, I have pushed a template to ES that creates not_analyzed fields for any index named embulk-*. This mirrors how Logstash handles it. Example:

curl -XPUT 'http://localhost:9200/_template/embulk-*' -d '{
  "template" : "embulk-*",
  "settings" : { "index.refresh_interval" : "5s" },
  "mappings" : {
    "_default_" : {
      "_all" : { "enabled" : true, "omit_norms" : true },
      "dynamic_templates" : [
        {
          "message_field" : {
            "match" : "message",
            "match_mapping_type" : "string",
            "mapping" : {
              "type" : "string",
              "index" : "analyzed",
              "omit_norms" : true,
              "fielddata" : { "format" : "disabled" }
            }
          }
        },
        {
          "string_fields" : {
            "match" : "*",
            "match_mapping_type" : "string",
            "mapping" : {
              "type" : "string",
              "index" : "analyzed",
              "omit_norms" : true,
              "fielddata" : { "format" : "disabled" },
              "fields" : {
                "raw" : { "type" : "string", "index" : "not_analyzed", "doc_values" : true, "ignore_above" : 256 }
              }
            }
          }
        }
      ]
    }
  }
}'
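To check that a template like this took effect, something along the following lines may help. It is a rough sketch: the helper names are hypothetical, and it assumes the node is at localhost:9200 and that Embulk has already written to an index whose name matches embulk-*.

```shell
#!/bin/sh
# Sketch only: helper names and arguments are made up for illustration.
ES_URL="${ES_URL:-http://localhost:9200}"

# Show the registered template(s) whose name matches embulk-*.
show_template() {
  curl -sS "$ES_URL/_template/embulk-*?pretty"
}

# Show the mapping an index actually received. $1 = index name.
show_mapping() {
  curl -sS "$ES_URL/$1/_mapping?pretty"
}

# Build an exact-match query body against the not_analyzed "raw" sub-field.
# $1 = field name, $2 = value (not JSON-escaped, so keep it free of quotes).
raw_term_query() {
  printf '{ "query": { "term": { "%s.raw": "%s" } } }\n' "$1" "$2"
}

# Run that query. $1 = index name, $2 = field name, $3 = value.
# The Content-Type header is required by Elasticsearch 6.0 and later.
search_raw() {
  curl -sS -XPOST -H 'Content-Type: application/json' \
    "$ES_URL/$1/_search?pretty" -d "$(raw_term_query "$2" "$3")"
}
```

For example, show_mapping embulk-test should list a raw sub-field under each string field, and search_raw embulk-test city "New York" should only match documents whose city is exactly that value.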

jmwiersma Oct 20 '16 19:10

@muga @jmwiersma

> How can we specify this mapping as plugin config options? Any thoughts?

I think specifying it as a file is better. An index template definition is too complex to express as a column option in embulk.

For example, fluent-plugin-elasticsearch has 2 options.

sakama Oct 21 '16 04:10

Thank you. A name and a file path (or JSON content) would be good config options. I'll take it.
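Until such options exist in the plugin, the same "name plus file path" idea can be approximated outside Embulk. The wrapper below is only a sketch of that workaround, not a plugin feature: the function names and the DRY_RUN switch are hypothetical, and it assumes the template JSON lives in a local file and that embulk is on the PATH.

```shell
#!/bin/sh
# Sketch only: push an index template from a file, then run Embulk.
ES_URL="${ES_URL:-http://localhost:9200}"

# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# push_template_and_run NAME TEMPLATE_FILE EMBULK_CONFIG
push_template_and_run() {
  name="$1"
  template_file="$2"
  config="$3"
  # Register (or overwrite) the template under NAME from the JSON file.
  # The Content-Type header is required by Elasticsearch 6.0 and later.
  run curl -sS -XPUT -H 'Content-Type: application/json' \
    "$ES_URL/_template/$name" -d "@$template_file" || return 1
  # Indices created by this run that match the template pattern use its mappings.
  run embulk run "$config"
}
```

A call such as push_template_and_run embulk-template template.json config.yml would register the template first and then start the load. Setting DRY_RUN=1 beforehand prints the two commands instead of executing them.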

muga Oct 21 '16 13:10

:+1:

niwatolli3 Sep 25 '17 16:09