
Optimize _bulk_get

Open nickva opened this issue 1 year ago • 1 comment

As part of optimizing the replicator and implementing _bulk_get support for it, I noticed that _bulk_get is implemented in the most inefficient way possible: it makes sequential fabric:open_revs/4 calls for each {doc_id, rev} pair, when it could instead group documents by shard range and issue a bulk open_revs request per shard range, much like _bulk_docs does.
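To illustrate the grouping idea, here is a minimal Python sketch. It assumes a simplified stand-in for CouchDB's shard mapping (CouchDB hashes the doc id to pick one of q contiguous ranges of the keyspace); the `shard_index` helper, the choice of CRC32, and `q = 8` are all assumptions for illustration, not CouchDB's actual implementation.

```python
from collections import defaultdict
from zlib import crc32

Q = 8  # assumed number of shard ranges; q=8 is a common default

def shard_index(doc_id: str, q: int = Q) -> int:
    # Hypothetical stand-in for CouchDB's doc-id-to-shard-range hashing.
    return crc32(doc_id.encode()) % q

def group_by_shard(doc_revs):
    # doc_revs: iterable of (doc_id, rev) pairs from a _bulk_get body.
    # Instead of one open_revs call per pair, collect the pairs per
    # shard range so a single bulk request can be issued to each range.
    groups = defaultdict(list)
    for doc_id, rev in doc_revs:
        groups[shard_index(doc_id)].append((doc_id, rev))
    return dict(groups)

batches = group_by_shard([("a", "1-x"), ("b", "1-y"), ("a", "2-z")])
# Each value of `batches` is one batched open_revs request per shard range.
```

With this grouping, the number of cluster round trips scales with the number of shard ranges touched rather than the number of {doc_id, rev} pairs, which is the same batching shape _bulk_docs already uses for writes.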

nickva, Sep 21 '22 19:09

For instance, here is a snippet of logs from running a basic benchmark with couchdyno:

POST /cdyno-0000001/_bulk_get?latest=true&revs=true&attachments=false 200 ok 420
POST /cdyno-0000002/_bulk_docs 201 ok 122
POST /cdyno-0000001/_bulk_get?latest=true&revs=true&attachments=false 200 ok 429
POST /cdyno-0000002/_bulk_docs 201 ok 126
POST /cdyno-0000001/_bulk_get?latest=true&revs=true&attachments=false 200 ok 419
POST /cdyno-0000002/_bulk_docs 201 ok 124
POST /cdyno-0000002/_bulk_docs 201 ok 130
POST /cdyno-0000001/_bulk_get?latest=true&revs=true&attachments=false 200 ok 387
POST /cdyno-0000001/_bulk_get?latest=true&revs=true&attachments=false 200 ok 375
POST /cdyno-0000002/_bulk_docs 201 ok 137

The numbers at the end are the milliseconds it took to process each request. The _bulk_get (read) requests are roughly 2x-3x as expensive as the _bulk_docs (write) requests, so there seems to be room for optimization there.
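Averaging the sample latencies from the log snippet above makes the gap concrete (this is just arithmetic over the ten requests shown, not a rigorous benchmark):

```python
# Latencies (ms) taken from the log snippet above.
bulk_get = [420, 429, 419, 387, 375]
bulk_docs = [122, 126, 124, 130, 137]

avg_get = sum(bulk_get) / len(bulk_get)     # 406.0 ms
avg_docs = sum(bulk_docs) / len(bulk_docs)  # 127.8 ms
ratio = avg_get / avg_docs                  # ~3.2x
```

So in this sample the reads average around 3x the cost of the writes, consistent with the estimate above.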

nickva, Sep 21 '22 19:09