django-bulk-update
Document that bulk update is available in Django 2.2
Hello @aykut! Thank you for this library and your work on it!
I just wanted to let you know that coming in Django 2.2 there will be a bulk_update() function available on any Model (just like bulk_create) that uses the same method as this library (and is heavily inspired by it!) 🎉 🎉
Would documenting this be possible? We could perhaps add a fallback to the built-in bulk_update if we are running on Django 2.2, to ease upgrading? We are also looking at adding support for specialized db-specific SQL to Django to speed this up, and I'd love any input you may have!
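For illustration, the fallback idea could look roughly like the sketch below. This is not code from the library: `has_builtin_bulk_update` and `bulk_update_compat` are hypothetical names, and the sketch assumes every object in `objs` is an instance of the same model.

```python
def has_builtin_bulk_update(version):
    """Return True if this Django version tuple ships QuerySet.bulk_update (2.2+)."""
    return tuple(version[:2]) >= (2, 2)


def bulk_update_compat(objs, fields, batch_size=None, version=None):
    """Hypothetical shim: use Django's built-in bulk_update when it exists,
    otherwise fall back to django-bulk-update."""
    if not objs:
        return  # nothing to update
    if version is None:
        # Imported lazily so the version check above works without Django.
        import django
        version = django.VERSION
    if has_builtin_bulk_update(version):
        # Assumption: all objects share one model, so its default manager applies.
        manager = type(objs[0])._default_manager
        manager.bulk_update(objs, fields, batch_size=batch_size)
    else:
        from bulk_update.helper import bulk_update
        bulk_update(objs, update_fields=fields, batch_size=batch_size)
```

With something like this, calling code would not need to change when a project upgrades to Django 2.2.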
It's great, but I found that the docs are still in dev, and the newest stable Django version is 2.1.2, so bulk_update is not available yet.
@orf @Ehco1996 Django 2.2 is now released, and docs are no longer draft:
https://docs.djangoproject.com/en/2.2/ref/models/querysets/#django.db.models.query.QuerySet.bulk_update
Thanks, that's great. This library is still helpful for versions before Django 2.2.
It would also be really helpful to document the speed difference. In my experience (updating tens of thousands of rows) this package does the update way faster.
Can you elaborate? It uses the same method, so the speed difference should be negligible. If you find Django is way slower then please open a ticket with some details!
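For readers unfamiliar with the shared method: as far as I understand it, both approaches issue one UPDATE per batch that sets each column through a CASE/WHEN keyed on the primary key. The sketch below shows that idea in plain SQL against an in-memory SQLite database. It is a simplified illustration, not the code of either implementation; `build_case_update`, the `item` table, and its columns are made up for the example.

```python
import sqlite3


def build_case_update(table, pk_column, column, rows, placeholder="?"):
    """Build one UPDATE that sets `column` per row via CASE/WHEN.

    `rows` is a list of (pk, new_value) pairs. Returns (sql, params).
    Identifiers are interpolated directly, so they must be trusted.
    """
    whens = " ".join(
        f"WHEN {pk_column} = {placeholder} THEN {placeholder}" for _ in rows
    )
    in_list = ", ".join(placeholder for _ in rows)
    sql = (
        f"UPDATE {table} SET {column} = CASE {whens} END "
        f"WHERE {pk_column} IN ({in_list})"
    )
    # Parameters: (pk, value) pairs for the CASE, then the pks for the IN list.
    params = [p for pk, value in rows for p in (pk, value)]
    params += [pk for pk, _ in rows]
    return sql, params


def demo():
    """Update two of three rows with a single statement."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO item VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")]
    )
    sql, params = build_case_update("item", "id", "name", [(1, "x"), (2, "y")])
    conn.execute(sql, params)
    return conn.execute("SELECT id, name FROM item ORDER BY id").fetchall()
```

Since the generated SQL has the same shape either way, any large speed difference would have to come from how the statement is built, not from the database.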
I'm on Django 2.2.9. Might be doing something wrong, but can't see what.
# Django built-in
objects = list(Model.objects.all()[:10000])
Model.objects.bulk_update(
    objects,
    [],  # about 10 fields
    batch_size=1000
)

# django-bulk-update package
from bulk_update.helper import bulk_update
objects = list(Model.objects.all()[:10000])
bulk_update(
    objects,
    update_fields=[],  # about 10 fields
    batch_size=1000
)
Below is profiling done with pyinstrument. Django's built-in solution takes 144 seconds on my PC. The package takes 2.5 seconds.
@orf Hi, did you manage to take a look at that? I could provide a more comprehensive example if necessary...
Not yet, but those samples are invaluable. The two big differences between this package and the Django inbuilt one are:
- Django uses the Expressions API. This is where the overhead is coming from, but it's a pretty ridiculous amount. Perhaps you're hitting an edge case here.
- Django works around SQL parameter limitations in some databases (Oracle/SQLite).
Knowing the types of the fields would be really useful, as well as the kinds of values those fields contain (lots of large strings or arrays?).
I would love it if you could perhaps post the timings here with 1 to 10 columns, and the same number of rows. It would be interesting to see how the times grow?
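A rough way to collect those timings is sketched below. `time_by_column_count` is a hypothetical helper, not part of Django or this package; it assumes `update` is any callable taking `(objs, fields, batch_size)` and that `field_names` lists the columns to add one at a time.

```python
import time


def time_by_column_count(update, objs, field_names, batch_size=1000):
    """Time `update` for the first 1..N field names over the same rows.

    Returns a dict mapping the number of columns to elapsed seconds.
    """
    timings = {}
    for n in range(1, len(field_names) + 1):
        start = time.perf_counter()
        update(objs, field_names[:n], batch_size)
        timings[n] = time.perf_counter() - start
    return timings
```

The built-in could be wrapped as `lambda objs, fields, bs: Something.objects.bulk_update(objs, fields, batch_size=bs)` and the package as `lambda objs, fields, bs: bulk_update(objs, update_fields=fields, batch_size=bs)`, so both run over the same rows and columns.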
I created an example project. Please excuse the unimaginative naming of my classes and columns. The column types match the types of my model in the project where I encountered the problem. That project has a bunch of other columns which aren't updated; I didn't include them, but they clearly aren't what's causing the slow performance, as the results below show. https://github.com/mikicz/bulk-update-tests
The most important bits:
Models: https://github.com/mikicz/bulk-update-tests/blob/master/apps/something/models.py
Update code: https://github.com/mikicz/bulk-update-tests/blob/master/apps/something/test_bulk_update.py
Results on my new, quite powerful Dell, with a local PostgreSQL 11.5:
In [1]: Something.objects.count()
Out[1]: 1008895
In [2]: from apps.something.test_bulk_update import *
In [3]: %time inbuilt()
CPU times: user 2min 9s, sys: 653 ms, total: 2min 10s
Wall time: 2min 24s
In [4]: %time bulk_update_package()
CPU times: user 11.2 s, sys: 41.6 ms, total: 11.3 s
Wall time: 24.2 s
I would say that the difference is quite significant. I am planning to do some more analysis, maybe dropping some of the different column types from the update etc.
I'd really like to thank you for creating a reproduction repository @mikicz, it's incredibly helpful and I wish everyone's reproductions were as detailed as this!
I've created a ticket on the Django bugtracker (https://code.djangoproject.com/ticket/31202) and assigned it to myself. I'm somewhat busy right now but I promise that I will spend some time and see if I can dig more into this. I'll update the ticket rather than this issue.
Thanks again for creating this test case.
I'm only happy to be of help, to be a tiny part of making Django such a great resource for everybody using it. Thank you for your work and actioning this!