Processing one collection into multiple indices in ElasticDB

Open marcelhdl opened this issue 5 years ago • 7 comments

Hi there,

Is it possible to send different documents from one collection into different indices in Elasticsearch? The background is that we have one MongoDB collection containing documents in several languages, and we want one index per language in Elasticsearch so that each can use its own analyzer and other language-specific settings.

The documents in MongoDB have a field called 'language', which should determine the target index.

Would be really nice to hear from you.

kind regards, Marcel

marcelhdl avatar Dec 20 '19 09:12 marcelhdl

Hi @marcelhdl,

The best way to do this is to implement a go plugin.

Implement a Map function and add a special case if the namespace on the input event matches your collection. In that case use the Document field on the input to determine the language. Based on the language set the index name you want on the output.

Return Passthrough = true for any namespaces that you don't want to customize.

func Map(input *monstachemap.MapperPluginInput) (output *monstachemap.MapperPluginOutput, err error)

https://godoc.org/github.com/rwynn/monstache/monstachemap
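As a rough illustration of the routing step described above, here is a minimal sketch of a helper that derives an index name from the 'language' field. The helper name indexForLanguage, the "articles" prefix and the "mydb.articles" namespace are hypothetical; the commented-out wiring assumes the Namespace, Document, Index and Passthrough fields of the monstachemap types and has not been tested against a running monstache.

```go
package main

import (
	"fmt"
	"strings"
)

// indexForLanguage builds "<prefix>-<language>" from the document's
// "language" field. It reports false when the field is missing, is not a
// string, or is empty, so the caller can fall back to the default index.
// The name is lowercased because Elasticsearch index names must be lowercase.
func indexForLanguage(prefix string, doc map[string]interface{}) (string, bool) {
	lang, ok := doc["language"].(string)
	if !ok {
		return "", false
	}
	lang = strings.ToLower(strings.TrimSpace(lang))
	if lang == "" {
		return "", false
	}
	return prefix + "-" + lang, true
}

// Inside a monstache Map function this could be wired up roughly as follows
// (field names are an assumption based on the monstachemap package docs):
//
//	if input.Namespace == "mydb.articles" {
//		if idx, ok := indexForLanguage("articles", input.Document); ok {
//			return &monstachemap.MapperPluginOutput{Document: input.Document, Index: idx}, nil
//		}
//	}
//	return &monstachemap.MapperPluginOutput{Passthrough: true}, nil

func main() {
	docs := []map[string]interface{}{
		{"title": "Hallo", "language": "DE"},
		{"title": "Hello", "language": "en"},
		{"title": "no language field"},
	}
	for _, d := range docs {
		idx, ok := indexForLanguage("articles", d)
		fmt.Println(idx, ok)
	}
}
```

Keeping the name derivation in a plain function makes it easy to unit test without building the plugin.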

rwynn avatar Dec 20 '19 23:12 rwynn

@marcelhdl another technique that you could use (instead of a go plugin) would be to do the routing on the Elasticsearch side using a default pipeline.

This post outlines the technique pretty well.

Basically, you adjust the index template on the index that monstache sends to and include a default pipeline on it. Then in that pipeline you can inspect the language of the document and route it to a different index.
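One possible shape for such a pipeline is a single set processor that rewrites the _index metadata field from the document's language. The sketch below only builds and prints the pipeline body; the "articles" prefix and the function name routingPipeline are hypothetical, and it is not necessarily the exact approach from the linked post.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// routingPipeline returns the body of an ingest pipeline that redirects a
// document to "<prefix>-<language>". The "if" condition skips documents that
// have no language field, so they stay in the index they were sent to.
func routingPipeline(prefix string) map[string]interface{} {
	return map[string]interface{}{
		"description": "route documents to a per-language index",
		"processors": []interface{}{
			map[string]interface{}{
				"set": map[string]interface{}{
					"if":    "ctx.language != null",
					"field": "_index",
					"value": prefix + "-{{language}}",
				},
			},
		},
	}
}

func main() {
	body, err := json.MarshalIndent(routingPipeline("articles"), "", "  ")
	if err != nil {
		panic(err)
	}
	// This body would be stored under _ingest/pipeline/<name>, and that name
	// set as index.default_pipeline on the index monstache writes to.
	fmt.Println(string(body))
}
```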

rwynn avatar Dec 21 '19 16:12 rwynn

Hi @rwynn. Slight twist on the above question: Is there a way to map the same document to different Elasticsearch indexes? (The use case is that we want two different indexes, corresponding to the same MongoDB collection, which have their data formatted in different ways.)

tf3 avatar Aug 20 '21 14:08 tf3

I really need this feature...

aeharvlee avatar Oct 21 '22 05:10 aeharvlee

If you use a go plugin as described here to leverage the Process function you can add as many requests to the bulk indexer as you would like.

rwynn avatar Oct 21 '22 13:10 rwynn

Hi @rwynn, @tf3, I too am trying to create multiple indices from a single MongoDB collection. With the plugin below, custom_index_1 and custom_index_2 are created, but no documents are indexed; both indices stay empty. Can someone please check whether I am missing something?

package main

import (
    "github.com/rwynn/monstache/v6/monstachemap"
    "github.com/olivere/elastic/v7"
    "fmt"
)
func Process(input *monstachemap.ProcessPluginInput) (err error) {
    doc := input.Document
    bulk := input.ElasticBulkProcessor
    req1 := elastic.NewBulkIndexRequest().Index("custom_index_1").Id(fmt.Sprintf("%v", doc["_id"])).Doc(doc)
    bulk.Add(req1)
    req2 := elastic.NewBulkIndexRequest().Index("custom_index_2").Id(fmt.Sprintf("%v", doc["_id"])).Doc(doc)
    bulk.Add(req2)
    return nil
}

DeveshBilapate avatar Nov 21 '23 17:11 DeveshBilapate

Hi Devesh,

At a quick glance the problem may be that you need to delete the key _id from the map you pass to .Doc.

Can you try something like this?

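func Process(input *monstachemap.ProcessPluginInput) (err error) {
    doc := input.Document
    // Capture the id first, then remove it from the body:
    // _id is a metadata field in Elasticsearch and should not appear in the document source.
    id := fmt.Sprintf("%v", doc["_id"])
    delete(doc, "_id")
    bulk := input.ElasticBulkProcessor
    // Queue one index request per target index, both using the same id.
    req1 := elastic.NewBulkIndexRequest().Index("custom_index_1").Id(id).Doc(doc)
    bulk.Add(req1)
    req2 := elastic.NewBulkIndexRequest().Index("custom_index_2").Id(id).Doc(doc)
    bulk.Add(req2)
    return nil
}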
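If the two indices should eventually hold differently formatted data, it may be safer to build a separate body per request instead of deleting from the shared map. A minimal sketch of that idea, using only the standard library (the helper name splitID is hypothetical):

```go
package main

import "fmt"

// splitID returns the document's _id rendered as a string, together with a
// shallow copy of the document that no longer contains the _id key.
// The caller's map is left untouched.
func splitID(doc map[string]interface{}) (string, map[string]interface{}) {
	body := make(map[string]interface{}, len(doc))
	for k, v := range doc {
		if k != "_id" {
			body[k] = v
		}
	}
	return fmt.Sprintf("%v", doc["_id"]), body
}

func main() {
	doc := map[string]interface{}{"_id": 42, "title": "hello", "language": "en"}
	id, body := splitID(doc)
	// The copy has lost _id; the original still has all three keys.
	fmt.Println(id, len(body), len(doc)) // prints: 42 2 3
}
```

Calling splitID once per target index gives each request its own map, so reshaping the body for one index does not leak into the other. The copy is shallow, so nested maps and slices are still shared.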

rwynn avatar Dec 03 '23 15:12 rwynn