
In the mongo adapter, a duplicate key error stops the pipeline.

Open rossjones opened this issue 7 years ago • 2 comments

Bug report

When copying from one mongodb to another the pipeline fails if the sink database already contains an index that is found in the source database. There doesn't seem to be any way to force it to continue.

Relevant pipeline.js:

NB: Local ports are SSH tunnels

var source = mongodb({
  "uri": "mongodb://localhost:12345/source_database"
})

var sink = mongodb({
  "uri": "mongodb://localhost:6666/sink_database",
  "ssl": true,
  "bulk": true
})

t.Source(source).Save(sink)

System info:

  • Transporter version: 0.3.0
  • OS: Darwin
  • DB version(s): source -> 2.4.9 and sink -> 3.2.11 (compose).
  • Source DB size: ~35Gb/1209 collections.

Reproducible Steps:

  1. Set up 2 mongo databases, where the sink has some of the data found in source
  2. Set up a basic pipeline mongo -> mongo
  3. transporter run
  4. Sadness.

What did you expect to happen?

Expected data to be copied from source to sink, overwriting whatever is currently in the sink. When that didn't happen, I went looking for a force/overwrite option in the mongodb adapter documentation, but could not find what I was looking for.
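To illustrate the expected versus actual behaviour: a plain insert fails with E11000 when the sink already holds a document with the same _id, while an upsert would overwrite it and let the copy continue. A minimal in-memory sketch, using a Map as a stand-in for the sink collection (insertDoc and upsertDoc are hypothetical helpers, not transporter or MongoDB driver APIs):

```javascript
// Inserting a document whose _id already exists raises a duplicate key
// error, which is what aborts the bulk flush in the log below.
function insertDoc(collection, doc) {
  if (collection.has(doc._id)) {
    throw new Error(`E11000 duplicate key error, dup key: { : "${doc._id}" }`);
  }
  collection.set(doc._id, doc);
}

// An upsert replaces the existing document instead of failing.
function upsertDoc(collection, doc) {
  collection.set(doc._id, doc);
}

// Sink already contains the key seen in the error log.
const sink = new Map([
  ["2017-06-08T14:55:05+00:00", { _id: "2017-06-08T14:55:05+00:00", hits: 1 }],
]);

// A second insert of the same _id fails, like the bulk flush did:
let failed = false;
try {
  insertDoc(sink, { _id: "2017-06-08T14:55:05+00:00", hits: 2 });
} catch (e) {
  failed = true;
}

// An upsert overwrites, which is the behaviour the reporter expected:
upsertDoc(sink, { _id: "2017-06-08T14:55:05+00:00", hits: 2 });
```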

What actually happened?

It complained about a duplicate key, then stopped, and I got sad.

INFO[0008] flushing bulk messages                        bsonOpSize=267000 collection="2_test_realtime" opCounter=1000
ERRO[0009] flush error, E11000 duplicate key error index: sink_database.2_test_realtime.$_id_ dup key: { : "2017-06-08T14:55:05+00:00" }  collection="2_test_realtime"
INFO[0009] flushing bulk messages                        bsonOpSize=107413 collection="2_test_page_statistics" opCounter=233
INFO[0009] error record: 0x130eb20, message: ERROR: write message error (E11000 duplicate key error index: sink_database.2_test_realtime.$_id_ dup key: { : "2017-06-08T14:55:05+00:00" })  path="e4a72219-4166-4137-7287-d568b64c3760/f453f856-c94e-4a75-4561-9773377daf2d" ts=1521801552889828096
INFO[0009] Establishing new connection to localhost:6666 (timeout=1h0m0s)...
ERRO[0009] ERROR: write message error (E11000 duplicate key error index: sink_database.2_test_realtime.$_id_ dup key: { : "2017-06-08T14:55:05+00:00" })  path="e4a72219-4166-4137-7287-d568b64c3760/f453f856-c94e-4a75-4561-9773377daf2d"
INFO[0009] adaptor Stopping...                           name=e4a72219-4166-4137-7287-d568b64c3760 path=e4a72219-4166-4137-7287-d568b64c3760 type=mongodb

rossjones avatar Mar 23 '18 11:03 rossjones

@rossjones do you have unique constraints? Are your mongodb indexes the same on both databases?

johnjjung avatar Apr 06 '18 17:04 johnjjung

How do you make sure that the pipeline doesn't stop at the first error?

abhishekvaid avatar Dec 07 '18 11:12 abhishekvaid
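The thread doesn't show a transporter option for continuing past the first error, but the general pattern is to handle duplicate key failures per document instead of aborting the whole batch. A hedged sketch of that idea, again using an in-memory Map rather than transporter's actual flush logic (flushBatch is a hypothetical helper):

```javascript
// Write each document individually; on a duplicate key conflict, record the
// offending _id and keep going instead of stopping the pipeline.
function flushBatch(sink, batch) {
  const skipped = [];
  for (const doc of batch) {
    if (sink.has(doc._id)) {
      // In-memory equivalent of E11000: skip and continue.
      skipped.push(doc._id);
      continue;
    }
    sink.set(doc._id, doc);
  }
  return skipped;
}

// Sink already contains "a"; the batch retries "a" and adds "b".
const sink = new Map([["a", { _id: "a", v: 0 }]]);
const skipped = flushBatch(sink, [
  { _id: "a", v: 1 }, // duplicate: skipped, not fatal
  { _id: "b", v: 2 }, // new: written
]);
```

The batch completes with "a" reported as skipped and "b" written, which is the "log and continue" behaviour being asked about.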