django-bulk-update icon indicating copy to clipboard operation
django-bulk-update copied to clipboard

bulk_update_or_create(model_instances) or bulk_update(model_instances, upsert=True)?

Open candeira opened this issue 9 years ago • 9 comments

For my current job we need bulk upsert of records, and I'm thinking of forking your package and implementing bulk_upsert myself. If/when I do that, I'd like to do it in the manner that's most likely to be accepted into your project, so as not to maintain an independent fork.

Which syntax do you prefer?

  • bulk_update_or_create(model_instances)
  • bulk_update(model_instances, upsert=True)
  • bulk_upsert(model_instances)

For now I'd only make my changes compatible with Postgres 9.5+, because that's what we're using and because I'm relatively new at this niche.

Any other advice/comment?

candeira avatar May 01 '16 12:05 candeira

Hi,

I'm not sure it is a good idea to include bulk_create into this project. Django already has built-in bulk_create method. Why not separate the objects into create and update, then use bulk_create and bulk_update explicitly?

aykut avatar May 02 '16 05:05 aykut

@candeira I'm way into that! I could use this on my project, for sure.

@aykut bulk_update_or_create is different from bulk_create?

ckcollab avatar Jun 01 '16 23:06 ckcollab

@candeira @aykut @ckcollab this would be amazingly helpful

phlax avatar Mar 16 '17 15:03 phlax

@aykut the problem with doing bulk_create is that you need to know in advance which ones exist already - so requires an additional query i think

phlax avatar Mar 16 '17 15:03 phlax

I need this feature. Any news?

mehdipourfar avatar Aug 02 '17 08:08 mehdipourfar

I think it's possible to add this feature. I would call it bulk_update_or_create because django already has a update_or_create for single instances.

But even if we implement this function here, we will also need to know which instances already exist (performing an additional query). bulk_update_or_create will actually split the list of instances and call bulk_create and bulk_update separately. So each batch will perform 3 queries.

Seems reasonable for you? Any better approach?

arnau126 avatar Aug 02 '17 09:08 arnau126

It's seems reasonable. Although both postgres and mysql now suport bulk upsert: https://stackoverflow.com/questions/34514457/bulk-insert-update-if-on-conflict-bulk-upsert-on-postgres https://stackoverflow.com/questions/6286452/mysql-bulk-insert-or-update

mehdipourfar avatar Aug 03 '17 05:08 mehdipourfar

I do agree with @arnau126, Any update about this feature.

abdulwahid24 avatar Aug 31 '17 10:08 abdulwahid24

The 3 query approach is a race condition; unless you can be sure your program is the only one writing to that table you'll have to add retry logic around the transaction (as records can get added and removed between your read and create step).

SQL level UPSERT is the way to go for atomic single query update/create.

Bartvds avatar Aug 31 '17 11:08 Bartvds