
Bulk update

Open bnjbvr opened this issue 9 years ago • 6 comments

I see there is an endpoint for bulk delete, but not for bulk update. However, the db_remove_helper suggests that it shouldn't be too different to do (a deletion seems to actually be an update, if I understand correctly). Is there any way we could have a bulk update endpoint, please?

Here's an API I imagine:

PUT /request/:type/:req_name/update
Params:
  type: the doctype name
  req_name: the name of the request
  Body {
    key: only select documents matching this key
    keys: [only select documents matching the keys in this array]
    limit: maximum number of documents to process
    skip: number of documents to skip
    startKey: only select documents after this key
    endKey: only select documents before this key
    update: [[key, value], [key, value]] or [key, value]
  }
  Every field of the body is optional.

For instance, with

body = {
  update: ['date', '1234']
}

Then the 'date' field of all matched documents should be updated to '1234'.

body = {
  update: [['date', '1234'], ['bankAccount', '4321']]
}

Then the 'date' field of all matched documents is updated to '1234', and their 'bankAccount' field is updated to '4321'.

Not sure about the naming of update; maybe values or replaceBy would be more adequate.
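
To make this concrete, here is a minimal client-side sketch of a call against the proposed endpoint. Everything here is hypothetical: the endpoint does not exist yet, and the doctype 'bankoperation', the request name 'all' and the port are made up for illustration.

// Sketch only: the endpoint is the proposal above, not an existing API.
// 'bankoperation', 'all' and the port are illustrative.
var http = require('http');

var body = JSON.stringify({
  key: 'some-account-id',
  update: [['date', '1234'], ['bankAccount', '4321']]
});

var req = http.request({
  method: 'PUT',
  host: 'localhost',
  port: 9101, // assumed data-system port
  path: '/request/bankoperation/all/update',
  headers: { 'Content-Type': 'application/json' }
}, function (res) {
  console.log('bulk update status:', res.statusCode);
});
req.on('error', console.error);
req.end(body);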

bnjbvr avatar Mar 03 '15 20:03 bnjbvr

Having the update as a map would be easier, I think:

PUT /request/file/all/byPath/update/
{
  startkey: "/some/path/",
  endkey: "/some/path/ZZZZ",
  update: {
    lastModification: 3554232,
    author: "John"
  }
}

Or, more powerful (and similar performance: 1 request app->DS, 2 requests DS->Couch), a function:

PUT /request/file/all/byPath/update/
{
  startkey: "/some/path/",
  endkey: "/some/path/ZZZZ",
  update: "function (doc) {
    doc.tags = doc.tags.map(function (tag) {
        if (tag === 'old-name') return 'new-name';
        else return tag;
    });
    return doc;
  }"
}
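
To sketch what the data-system side could look like under this scheme (this is not actual cozy-data-system code; it assumes a nano-style CouchDB client, the helper name bulkUpdate is made up, and turning the update string into a function, e.g. with the vm module, is left out):

// 1st DS->Couch request: fetch matching docs through the request's view.
// 2nd DS->Couch request: write every modified doc back in one bulk call.
function bulkUpdate(db, designDoc, viewName, params, update, callback) {
  db.view(designDoc, viewName, {
    startkey: params.startkey,
    endkey: params.endkey,
    include_docs: true
  }, function (err, result) {
    if (err) return callback(err);
    var docs = result.rows.map(function (row) {
      var doc = row.doc;
      if (typeof update === 'function') {
        // function variant: let the caller transform the doc
        return update(doc);
      }
      // map variant: shallow-merge the given fields into the doc
      Object.keys(update).forEach(function (field) {
        doc[field] = update[field];
      });
      return doc;
    });
    db.bulk({ docs: docs }, callback);
  });
}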

I agree this would be convenient and might improve the performance of some Cozy apps. Not sure when we will have the time to do it.

aenario avatar Mar 04 '15 08:03 aenario

Yes, a map would indeed be better (I wonder how I could have missed this). I'll try to add this feature if I get some time soon; this could prove useful for Kresus.

bnjbvr avatar Mar 04 '15 09:03 bnjbvr

:+1: This would be useful to import larger time series (for example, 15 years of currency data for ~10 different currencies is north of 25000 values).

EDIT: Actually, bulk insert would be useful for that, which also shouldn't be very different from bulk delete / update.

jankeromnes avatar Mar 23 '15 23:03 jankeromnes

FYI, according to this post about CouchDB performance:

  • Inserting docs one by one happens at ~260 docs/sec,
  • Inserting docs in bulk (groups of 10000 docs) happens at ~2700 docs/sec,
  • With even more optimizations (e.g. bypassing HTTP and JSON conversion), it can go higher than 10500 docs/sec.
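
For reference, the bulk path in plain CouchDB is POST /db/_bulk_docs; here is a minimal sketch of such an insert straight against Couch (the database name 'currencies' is made up, the rest is standard CouchDB API):

// Raw CouchDB bulk insert, bypassing the data system entirely.
var http = require('http');

function bulkInsert(docs, callback) {
  var body = JSON.stringify({ docs: docs });
  var req = http.request({
    method: 'POST',
    host: 'localhost',
    port: 5984, // CouchDB default port
    path: '/currencies/_bulk_docs',
    headers: { 'Content-Type': 'application/json' }
  }, function (res) {
    callback(null, res.statusCode);
  });
  req.on('error', callback);
  req.end(body);
}

Batching in groups of a few thousand docs, as in the numbers above, is where the speedup comes from.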

jankeromnes avatar Mar 23 '15 23:03 jankeromnes

@jankeromnes, one thing to keep in mind is that Cozy has a lot of views in CouchDB, so adding many documents slows down the whole Cozy. We already had some trouble with emails when importing large accounts.

So for your currency use case (i.e. never-changing values), the performance-optimal solution would be to have huge documents containing, say, all the data for a month, and then use views to produce the index you need:

function (doc) {
    if (doc.docType === 'MonthOfCurrencyExchange') {
        for (var date in doc.dates) {
            var currencies = doc.dates[date].currencies;
            for (var currency in currencies) {
                // CouchDB map functions emit (key, value) pairs
                emit([date, currency], currencies[currency]);
            }
        }
    }
}
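
For illustration, here is a document shape that view could index; the field names are assumptions, not an existing Cozy doctype:

// One document per month; the view above emits one row
// per (date, currency) pair found inside it.
{
  docType: 'MonthOfCurrencyExchange',
  month: '2015-03',
  dates: {
    '2015-03-01': { currencies: { USD: 1.0855, GBP: 0.7272 } },
    '2015-03-02': { currencies: { USD: 1.0832, GBP: 0.7265 } }
  }
}
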
aenario avatar Mar 24 '15 08:03 aenario

Interesting idea, thanks @aenario! Actually it might even be faster to store all my values in one single document, since the number of documents seems to be the bottleneck. I'll try different options.

jankeromnes avatar Mar 24 '15 10:03 jankeromnes