couchdb-python

Provide ability to do bulk dump and load

Open • djc opened this issue 10 years ago • 6 comments

From [email protected] on June 14, 2013 17:08:01

Currently the load.py and dump.py utilities load and dump documents one at a time, which is extremely slow.

Bulk loading and dumping would speed this up considerably.

Maybe we can add an option like "--bulk-size" with a default value of 1 (load/dump documents one by one, just as happens now) so that users can tune the utilities further.
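To make the proposal concrete, here is a minimal sketch of how such an option could drive batching. The parser and the `batches()` helper are hypothetical illustrations, not code from load.py or dump.py; only the "--bulk-size" name and its default of 1 come from the suggestion above.

```python
import argparse
from itertools import islice


def build_parser():
    """Build a parser with the proposed --bulk-size option (default 1)."""
    parser = argparse.ArgumentParser(description="bulk-size sketch")
    parser.add_argument(
        "--bulk-size", type=int, default=1,
        help="number of documents per request (1 = one by one, as today)",
    )
    return parser


def batches(iterable, size):
    """Yield lists of at most `size` items taken in order from `iterable`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    for first in iterator:
        # Take the item the loop just consumed plus up to size - 1 more.
        yield [first] + list(islice(iterator, size - 1))
```

With the default of 1, every batch holds a single document, so the existing one-by-one behavior would be preserved unless the user asks for larger batches.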

Original issue: http://code.google.com/p/couchdb-python/issues/detail?id=226

djc, Jul 12 '14 14:07

From [email protected] on June 14, 2013 08:10:36

I'm working on an initial implementation and will provide some patches later.

djc, Jul 12 '14 14:07

From [email protected] on June 17, 2013 05:10:15

I finished bulk dumping of documents. You can see it here: https://code.google.com/r/paveltsipinio-bulk-dumping/source/detail?r=e0f1bda24cc0bc487bf782ebdabc9d817bf7d4f6&name=bulk_dumping
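For readers who cannot open the linked revision, one way bulk dumping might work is to page through the `_all_docs` view with `limit` and `skip`. This is a sketch under assumptions, not the code from the patch: `iter_docs` is a hypothetical name, and `db` is assumed to behave like a couchdb-python `Database` (an `info()` dict containing `doc_count`, and a `view()` whose rows expose `.doc` when `include_docs` is set).

```python
def iter_docs(db, bulk_size=1):
    """Yield every document in `db`, fetching `bulk_size` docs per request.

    Assumption: `db` behaves like couchdb.Database, i.e. info() returns a
    dict with 'doc_count' and view() accepts limit/skip/include_docs and
    yields rows that carry the document in a .doc attribute.
    """
    start, total = 0, db.info()["doc_count"]
    while start < total:
        rows = db.view("_all_docs", limit=bulk_size, skip=start,
                       include_docs=True)
        for row in rows:
            yield row.doc
        start += bulk_size
```

Each pass through the loop issues one request for up to `bulk_size` documents, so a larger value trades memory per request for fewer round trips.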

djc, Jul 12 '14 14:07

From djc.ochtman on June 17, 2013 05:44:32

Good stuff! For inclusion into CouchDB-Python, I have a number of requests:

  • Please remove the change in .hgignore, as it isn't needed anymore
  • Please see if you can add a test for the new behavior
  • It would be great if you can split this into two patches: one that abstracts writing into a separate function, and another one that actually does the bulk requests/writes -- this makes it easier to review the changes now and in the future

djc, Jul 12 '14 14:07

From [email protected] on June 17, 2013 10:23:15

I addressed your requests and added a bulk load method: https://code.google.com/r/paveltsipinio-bulk-dumping/source/detail?r=de81adea330909f13d9bf37f98e25d4b7c657a92&name=bulk_dumping

djc, Jul 12 '14 14:07

From djc.ochtman on June 18, 2013 01:00:06

I've pushed modified versions; for r6f91fa675423, I:

  • Renamed function from write_dump() to dump_doc()
  • Moved dump_doc() outside dump_db(), added envelope argument
  • Rewrote commit message to clarify

In re8cafe210d91, I:

  • Made sure lines didn't get longer than 80 chars
  • Tightened up the loop code (while True, if condition: break is a little silly)
  • Rewrote commit message to clarify

Could you redo your bulk loading along these lines? You also introduced a bug in the error handling: db.update() doesn't throw Exceptions the way db.setattr() does. Also, your test case references a test data file that isn't included in the patch.
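To illustrate the error-handling point: in couchdb-python, `Database.update()` returns a list of `(success, docid, rev_or_exc)` tuples rather than raising on per-document failures, so a bulk loader has to inspect the results itself. The helper below is a hypothetical sketch of that check, not code from either patch.

```python
def failed_updates(results):
    """Return (docid, error) pairs for documents that were not saved.

    Assumption: `results` has the shape returned by couchdb-python's
    Database.update(), a list of (success, docid, rev_or_exc) tuples,
    where the third item is an exception when success is False.
    """
    return [(docid, rev_or_exc)
            for success, docid, rev_or_exc in results
            if not success]
```

A loader could call `failed_updates(db.update(docs))` after each batch and report or re-raise whatever comes back, instead of wrapping the call in a `try`/`except` that would never fire for per-document errors.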

djc, Jul 12 '14 14:07

From [email protected] on June 18, 2013 08:32:57

I hope I understood your recommendations about code design correctly. I pushed the result here: https://code.google.com/r/paveltsipinio-bulk-dumping/source/detail?r=46b5043fe465274850c4a821e468ca9ca90b70e0&name=bulk_dumping

I did not understand what you meant about the test data file. I don't have any test data files.

djc, Jul 12 '14 14:07