Changing Namespaces doesn't work

Open diegonc opened this issue 7 years ago • 7 comments

I'm following and adjusting the instructions from the namespaces post (here), especially the part about changing the namespace of the transformed messages.

My pipeline.js file looks like the snippet below:

var source = mongodb({
  "uri": "mongodb://localhost:3001/meteor"
});

var sink = elasticsearch({
  "uri": "http://user:pass@host:port"
});

t.Source("source", source, "/^(files)$/")
 .Transform("sort", js({filename: "sort-files.js"}), "/.*/")
 .Save("sink", sink, "/.*/") ;

The sort-files.js transformer assigns each document a new namespace based on its kind field, or skips the document if it is not of interest.

function transform(msg) {
  if (msg.data.kind === "COVER") {
    msg.ns = "forms-covers.data";
  } else {
    msg.op = "skip";
  }

  return msg;
}
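
A quick way to check what this transformer returns, outside Transporter, is to call it on a couple of hand-written messages. The sketch below is plain JavaScript; the sample messages and their ns, op, and data values are made up for illustration, and it says nothing about how the sink treats the result.

```javascript
// Same transformer as above, copied so the snippet runs on its own.
function transform(msg) {
  if (msg.data.kind === "COVER") {
    msg.ns = "forms-covers.data";
  } else {
    msg.op = "skip";
  }
  return msg;
}

// Hypothetical sample messages; the field values are assumptions for illustration.
var cover = transform({ ns: "files", op: "insert", data: { kind: "COVER" } });
var other = transform({ ns: "files", op: "insert", data: { kind: "PAGE" } });

console.log(cover.ns, cover.op); // forms-covers.data insert
console.log(other.ns, other.op); // files skip
```

So the function itself does rewrite the namespace for COVER documents and marks everything else as skipped; the question is what the sink does with that namespace.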

System info:

  • Transporter version: 0.4.0-rc.1-linux-amd64
  • OS: Debian 9
  • DB version(s)
    • mongodb 3.2.6
    • elasticsearch 5.2.2

Reproducible Steps:

  1. transporter run

What did you expect to happen?

According to the blog post, I expected to be able to rename each document namespace to one that follows the pattern "." and have the elasticsearch sink put the documents in the right index.

What actually happened?

Transporter created an index named test and used the namespace of the documents as the type inside that index.

curl  http://user:pass@host:port/test
{"test":{"aliases":{},"mappings":{"forms-covers.data":{...

How can I split a collection from the MongoDB source into different indices of the ES sink?

diegonc avatar Jul 12 '17 23:07 diegonc

That article refers to an earlier version of Transporter, specifically 0.1.0. Namespace handling has changed since then, specifically in version 0.3.0.

The way the ES sink handles incoming namespaces and sets the index has also changed. To set the index, include it in the URI of the ES sink. The "test" index is created if no index is specified in the URI.

The namespace will be used as the type.

To have two indices, create two sinks, place both in the pipeline and then set the namespace so it matches only one of them.
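
To make the suggestion above concrete, here is a small stand-alone model in plain JavaScript of how a message would end up in an index and type under these rules. It is a sketch of the behaviour described in this comment, not Transporter's code: indexFromUri, route, the sink names, the port 9200, and the index names are all hypothetical, and the namespace filters are ordinary RegExp objects rather than pipeline filter strings.

```javascript
// Index comes from the path of the sink URI; "test" is the fallback
// described above when the URI names no index.
function indexFromUri(uri) {
  var path = new URL(uri).pathname.replace(/^\/+|\/+$/g, "");
  return path === "" ? "test" : path;
}

// Each sink keeps only messages whose namespace matches its filter.
// The namespace is then used as the Elasticsearch type.
function route(msg, sinks) {
  return sinks
    .filter(function (sink) { return sink.filter.test(msg.ns); })
    .map(function (sink) {
      return { sink: sink.name, index: indexFromUri(sink.uri), type: msg.ns };
    });
}

// Hypothetical sinks: one index per sink, selected by namespace.
var sinks = [
  { name: "covers", uri: "http://localhost:9200/forms-covers", filter: /^covers$/ },
  { name: "pages", uri: "http://localhost:9200/forms-pages", filter: /^pages$/ },
  { name: "fallback", uri: "http://localhost:9200", filter: /^misc$/ }
];

console.log(route({ ns: "covers", data: {} }, sinks));
console.log(route({ ns: "misc", data: {} }, sinks));
```

In this model the type is always the same string that the filter matched on, so the namespace does double duty as both the routing key and the type name.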

codepope avatar Jul 13 '17 08:07 codepope

That's a pity, it seems I cannot implement my use case easily then.

The problem is that the type name is the same for all indices. Thus, assuming it's possible to have predefined sinks, there's no way to filter the documents on save because all of them have the same namespace.

Here's a diagram of what I'm trying to do: transporter-uc

diegonc avatar Jul 13 '17 13:07 diegonc

I have a similar scenario: mongo (source) collections -> elasticsearch (sink) indices

muziqiushan avatar Jul 31 '17 07:07 muziqiushan

Just like mongo-connector's support for "namespace_mapping": index_name.* => *.type_name

muziqiushan avatar Jul 31 '17 07:07 muziqiushan

I think we might support this with some chained transform functions. I'll see what I can come up with and get back to you soon.

jipperinbham avatar Aug 10 '17 13:08 jipperinbham

Ok, thanks! Sounds great. I'm currently running multiple instances of transporter to implement the use case, so it's not a big issue; at least it has been working quite well for a few days.

diegonc avatar Aug 10 '17 21:08 diegonc

@diegonc after a fair amount of testing, this is not currently possible, but it should be. I've labeled it as a bug and will hopefully be able to fit it into the 0.5.0 release.

jipperinbham avatar Aug 17 '17 14:08 jipperinbham