Toggle upsert retry behavior

Open ebaizel opened this issue 8 years ago • 18 comments

When transporting from Mongo to Mongo, it would be great if there was a way to simply no-op if the insert fails because of a duplicate document already existing. Right now transporter tries updating the document.

I couldn't find anything in the docs that seemed to control this behavior, so if I missed it, please let me know. Thanks.

ebaizel avatar Mar 28 '17 01:03 ebaizel

is updating the document an issue? and are you using the bulk option for writes?

jipperinbham avatar Mar 28 '17 01:03 jipperinbham

yes i'm using the bulk option for writes

the issue is that we have a listener on the oplog and it adds more overhead if it is an update since it needs to do another query to get the full document, whereas with an insert it has the full document in the oplog

ebaizel avatar Mar 28 '17 01:03 ebaizel
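
To illustrate the overhead described above, here is a minimal sketch of what an oplog listener typically sees for an insert versus an operator-style update. The namespace, field values, and the helper name `fullDocumentFromEntry` are made up for illustration; the entry shapes are simplified and only show the fields that matter here.

```javascript
// Simplified oplog entries; all values are illustrative, not taken from a real system.
const insertEntry = {
  op: "i",                                    // insert
  ns: "mydb.balances",
  o: { _id: 1, owner: "alice", amount: 100 }  // the full inserted document
};

const updateEntry = {
  op: "u",                                    // update
  ns: "mydb.balances",
  o2: { _id: 1 },                             // identifies the changed document
  o: { $set: { amount: 120 } }                // only the modification
};

// Hypothetical helper: returns the full document when the entry carries it,
// or null when the listener would need to query the collection by _id.
function fullDocumentFromEntry(entry) {
  if (entry.op === "i") {
    return entry.o;
  }
  return null;
}

console.log(fullDocumentFromEntry(insertEntry)); // full document, no lookup needed
console.log(fullDocumentFromEntry(updateEntry)); // null, a follow-up query is required
```

This is why a sink that only ever receives inserts keeps the listener cheap: every entry is self-contained.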

here's my pipeline.js

var source = mongodb({
  "uri": "mongodb://user:[email protected]:12345/mydb",
  "ssl": true,
  "bulk": true
})

ebaizel avatar Mar 28 '17 01:03 ebaizel

ok, so, you only want to tail the oplog for inserts?

jipperinbham avatar Mar 28 '17 01:03 jipperinbham

no, the other way around. on the destination db, we only want to have inserts, and no upserts done on it

ebaizel avatar Mar 28 '17 01:03 ebaizel

gotcha, you could put in a goja transform function that will skip any update operations, something along the lines of

function transform(doc) {
  // turn update ops into skips so they never reach the sink
  if (doc['op'] === 'u') {
    doc['op'] = 's';
  }
  return doc;
}

jipperinbham avatar Mar 28 '17 01:03 jipperinbham

that's neat, i haven't worked with that before. is there overhead loading the javascript vm to run the function? or is that already being done anyway?

ebaizel avatar Mar 28 '17 01:03 ebaizel

yes, there's definitely some overhead there which is why we've started building native, single purpose functions that don't come with the same performance penalty.

jipperinbham avatar Mar 28 '17 01:03 jipperinbham

got it. so would doc['op'] be 'update' and then 'noop', instead of the 'u' and 's' you have in your example above? those are the op names i'm seeing on https://github.com/compose/transporter/wiki/Messages

so something like:

function transform(doc) {
  if (doc['op'] === 'update') {
    doc['op'] = 'noop';
  }
  return doc
}

ebaizel avatar Mar 28 '17 02:03 ebaizel

slightly off topic, but what's the expected rate gain with bulk: true? i've been seeing around 65k/minute rates without vs around 85k/minute with it. does this seem in line with what you'd expect? just want to ensure i've got it running as expected. thx

ebaizel avatar Mar 28 '17 02:03 ebaizel

the incoming op would be update and you'd need to set it to skip

jipperinbham avatar Mar 28 '17 02:03 jipperinbham

I'm actually surprised you're seeing that good of ops/sec without bulk. most of the testing I did showed bulk to have at least double the throughput

jipperinbham avatar Mar 28 '17 02:03 jipperinbham

interesting. i haven't been using a transform so maybe that's got something to do with it? will try the update/skip out right now. really appreciate all your help.

ebaizel avatar Mar 28 '17 03:03 ebaizel

ah, yea, it's very possible the transform function is slowing things down enough for bulk to not have as big of an impact. I created a new issue https://github.com/compose/transporter/issues/342 to accomplish this without the JS function so keep an eye out for it.

jipperinbham avatar Mar 28 '17 03:03 jipperinbham

hey JP! qq on the transform fn. i'm trying the suggestion you had above about skipping if the op === update but the docs are all being sent to the sink as an update.

function transform(doc) {
  if (doc['op'] === 'update') {
    doc['op'] = 'skip';
  }
  return doc
}

i also tried 'noop' instead of 'skip' as that's mentioned here, but still no luck https://github.com/compose/transporter/wiki/Messages

do you know how i can print debug statements within the fn? i've tried console.log but looks like that's not understood within goja.

ebaizel avatar May 24 '17 23:05 ebaizel

ok, this is not the answer I'd prefer to give as it's counterintuitive, the docs are off, and it really should not work this way. all that being said, here's what you'll need:

function transform(doc) {
  // the op arrives as a short code here: 'u' for update, 's' for skip
  if (doc['op'] === 'u') {
    doc['op'] = 's';
  }
  return doc;
}

jipperinbham avatar May 25 '17 14:05 jipperinbham
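
Since `console.log` was reported not to work inside the goja function, one way to sanity-check the transform logic is to run it outside transporter, for example in Node. The sketch below is only that: the message shape (`ns`, `ts`, `op`, `data`) and the helpers `makeMessage` and `probe` are assumptions for illustration, and it says nothing about which op value transporter actually passes in. It only shows how the function reacts to each candidate value.

```javascript
// The transform as suggested in the thread.
function transform(doc) {
  if (doc['op'] === 'u') {
    doc['op'] = 's';
  }
  return doc;
}

// Hypothetical message shape, assumed for this local test only.
function makeMessage(op) {
  return { ns: "balances", ts: 0, op: op, data: { _id: 1 } };
}

// Run the transform over several candidate op values and record the result.
function probe(ops) {
  const out = {};
  for (const op of ops) {
    out[op] = transform(makeMessage(op)).op;
  }
  return out;
}

const results = probe(['i', 'u', 'd', 'insert', 'update', 'delete']);
console.log(results);
```

Only the value `'u'` is rewritten to `'s'`; any other spelling, such as `'update'`, passes through unchanged, so a mismatch between the expected and actual op value would leave updates untouched.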

hm i'm still seeing the updates go through. i created a file skipUpsert.js that only contains your function, and i'm calling everything with:

t.Source("source", source, "/^balances/").Transform(goja({"filename":"skipUpsert.js"})).Save("sink", sink)

does that look right?

ebaizel avatar May 25 '17 17:05 ebaizel

yes, that looks right, add the following flag when you run it:

-log.level=debug

that may give us some insight as to what's happening

jipperinbham avatar May 25 '17 19:05 jipperinbham