django-eventstream
Faster event log trimming
Fixes #93
Thanks! I think we still want the loop, since the query is limited to the batch size. And probably the .delete() goes after the slice?
Good catch on the loop, I can add that again. But do we even want to limit it? Deletion really isn't that expensive an operation, so why restrict the batch size? Since querysets are lazy, slicing just adds a LIMIT to the query, so that's how we'd cap the batch if we do want one.
The batch limit is there so that all the results don't get buffered in memory. I seem to recall that even though querysets are evaluated lazily, iterating a queryset may still cause all the items in the entire set to end up in memory.
If calling delete() on a queryset doesn't cause all the items in the set to first be read into memory, and if the deletion isn't atomic (i.e. doesn't block access to the table between deletions of individual items), then the batching may be unnecessary.
Ok, then I'll just remove it altogether.
.delete() won't load any of the deleted objects into memory. Also, all items are deleted at once (it's a single SQL DELETE ... WHERE ... statement after all), so the database is only blocked during that one query.
We don't want to block the database/table though. Some searching around suggests that bulk deletes may perform a table lock, for example: https://stackoverflow.com/questions/11230225/how-to-efficiently-delete-rows-while-not-using-truncate-table-in-a-500-000-rows
Sorry to go back and forth on this, but this should probably stay batched after all. That way a large trim only blocks the thread doing the trimming, while the table remains otherwise accessible and events can still be added.
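The batched-trim approach discussed above can be sketched with plain sqlite3 so the locking behavior is easy to see. This is a minimal illustration, not django-eventstream's actual code: the `event` table, `created` column, and cutoff are hypothetical, and the ORM equivalent would be a sliced queryset of ids followed by a `pk__in` delete.

```python
import sqlite3

# Hypothetical event table; django-eventstream's real schema differs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (id INTEGER PRIMARY KEY, created INTEGER)")
conn.executemany("INSERT INTO event (created) VALUES (?)",
                 [(n,) for n in range(1000)])
conn.commit()

BATCH = 100
CUTOFF = 750  # trim events with created < CUTOFF

while True:
    # Fetch at most BATCH ids per pass, mirroring a sliced queryset
    # (roughly Event.objects.filter(...)[:BATCH] in the ORM).
    ids = [row[0] for row in conn.execute(
        "SELECT id FROM event WHERE created < ? LIMIT ?", (CUTOFF, BATCH))]
    if not ids:
        break
    # Delete only that batch by primary key, then commit so each
    # DELETE holds its lock briefly instead of for the whole trim.
    conn.execute(
        "DELETE FROM event WHERE id IN (%s)" % ",".join("?" * len(ids)), ids)
    conn.commit()

remaining = conn.execute("SELECT COUNT(*) FROM event").fetchone()[0]
print(remaining)  # 250
```

Each iteration issues one small DELETE and commits, so writers only wait out a short lock per batch rather than one long lock for the entire trim.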