Toggle upsert retry behavior
When transporting from Mongo to Mongo, it would be great if there were a way to simply no-op when an insert fails because a duplicate document already exists. Right now transporter tries updating the document instead.
I couldn't find anything in the docs that seemed to control this behavior, so if I missed it, please let me know. Thanks.
is updating the document an issue? and are you using the bulk option for writes?
yes i'm using the bulk option for writes
the issue is that we have a listener on the oplog, and updates add more overhead for it: on an update it needs to do another query to fetch the full document, whereas an insert already carries the full document in the oplog entry
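to illustrate, simplified oplog entries look roughly like this (namespace and values made up):

// insert entry: "o" already carries the full document
{ "op": "i", "ns": "mydb.balances", "o": { "_id": 1, "amount": 100 } }

// update entry: "o" only carries the modifier, so the listener has to re-query for the full document
{ "op": "u", "ns": "mydb.balances", "o": { "$set": { "amount": 100 } }, "o2": { "_id": 1 } }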
here's my pipeline.js
var source = mongodb({
  "uri": "mongodb://user:[email protected]:12345/mydb",
  "ssl": true,
  "bulk": true
})
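(that's just the source node; a minimal sketch of the rest of a mongo-to-mongo pipeline.js would be along these lines, with a placeholder destination uri and the same options assumed on the sink)

var sink = mongodb({
  "uri": "mongodb://user:pass@destination-host:27017/mydb",  // placeholder, not the real destination
  "ssl": true,
  "bulk": true
})

t.Source("source", source, "/^balances/").Save("sink", sink)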
ok, so, you only want to tail the oplog for inserts?
no, the other way around: on the destination db we only want inserts to happen, never upserts
gotcha, you could put in a goja transform function that will skip any update operations, something along the lines of
function transform(doc) {
  if (doc['op'] === 'u') {
    doc['op'] = 's';
  }
  return doc;
}
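and then point the goja transformer at that file from pipeline.js, something like this (assuming the function is saved as transform.js):

t.Source("source", source, "/^balances/").Transform(goja({"filename": "transform.js"})).Save("sink", sink)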
that's neat, i haven't worked with that before. is there overhead loading the javascript vm to run the function? or is that already being done anyway?
yes, there's definitely some overhead there, which is why we've started building native, single-purpose functions that don't come with the same performance penalty.
got it. so would doc['op'] be 'update' and then 'noop', instead of the 'u' and 's' in your example above? those are the values i'm seeing on https://github.com/compose/transporter/wiki/Messages
so something like:
function transform(doc) {
  if (doc['op'] === 'update') {
    doc['op'] = 'noop';
  }
  return doc;
}
slightly off topic, but what's the expected rate gain with bulk: true? i've been seeing rates of around 65k/minute without it vs around 85k/minute with it. does that seem in line with what you'd expect? just want to make sure i've got it running as expected. thx
the incoming op would be update and you'd need to set it to skip
I'm actually surprised you're seeing throughput that good without bulk. most of the testing I did showed bulk giving at least double the throughput
interesting. i haven't been using a transform so maybe that's got something to do with it? will try the update/skip out right now. really appreciate all your help.
ah, yea, it's very possible the transform function is slowing things down enough for bulk to not have as big of an impact. I opened a new issue, https://github.com/compose/transporter/issues/342, for doing this without the JS function, so keep an eye out for it.
hey JP! qq on the transform fn. i'm trying the suggestion you had above about skipping when op === 'update', but the docs are all still being sent to the sink as updates.
function transform(doc) {
  if (doc['op'] === 'update') {
    doc['op'] = 'skip';
  }
  return doc;
}
i also tried 'noop' instead of 'skip', since that's what's mentioned at https://github.com/compose/transporter/wiki/Messages, but still no luck.
do you know how i can print debug statements within the fn? i've tried console.log but it looks like that's not available within goja.
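(one stopgap that should work in any goja script, since goja implements standard ES5 but not console: temporarily throw an Error carrying the serialized doc; i'm assuming, without having verified it, that transporter surfaces the JS error in its output)

function transform(doc) {
  if (doc['op'] === 'update') {
    // temporary debugging hack: abort with the doc serialized so its contents show up in the error output
    throw new Error(JSON.stringify(doc));
  }
  return doc;
}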
ok, this is not the answer I'd prefer to give as it's counterintuitive, the docs are off, and it really should not work this way. all that being said, here's what you'll need:
function transform(doc) {
  if (doc['op'] === 'u') {
    doc['op'] = 's';
  }
  return doc;
}
hm, i'm still seeing the updates go through. i created a file, skipUpsert.js, that contains only your function, and i'm calling everything with:
t.Source("source", source, "/^balances/").Transform(goja({"filename":"skipUpsert.js"})).Save("sink", sink)
does that look right?
yes, that looks right. add the following flag when you run it:
-log.level=debug
that may give us some insight into what's happening
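so, assuming you're launching the pipeline with transporter run, the full invocation would look something like:

transporter run -log.level=debug pipeline.js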