django-model-utils
django-model-utils copied to clipboard
SoftDelete is an anti-pattern
My apologies if this comes across as grumpy. I don't mean to be disrespectful. I know what it's like to maintain a popular project and no one likes it when someone pipes up about how they don't like something.
With that said though, there's a very real problem here with the SoftDeletableModel.
Problem
The SoftDeletableModel (and corresponding manager) has been the bane of my life for years. I started one job and spent a year removing it from the codebase, only to start a new job and be faced with it again. The defaults it implements (overriding .objects) break internal operations with Django (specifically the admin) and leads to hundreds of lost hours trying to debug code that's silently hiding your own data from you.
Example
Consider this simple set of models:
class User(SoftDeletableModel):
name = models.CharField(max_length=128)
class Purchase(models.Model):
user = models.ForeignKey(User)
note = models.TextField()
Now:
- Create a user and a purchase with that user.
- Soft delete that user
- Find the purchase in the admin.
- Try to save a change to the
notefield.
→ Django explodes because the attached user doesn't exist... except that it does exist but we've broken the defaults of the underlying framework by overriding .objects.all() to mean not actually all.
This problem reasserts itself when debugging: You're a new developer tasked with generating statistics based on the number of users, and using User.objects.all().count() to do it. For some reason, your math doesn't add up and you watch a few hours get sucked away by this anti-pattern. I have no idea what would happen in cases where you might do something like Purchase.objects.filter(user__name__startswith="..."), but I'm willing to bet it'll cause problems when comparing to User.objects.filter(name__startswith="...").
Please don't muck with the defaults.
Proposed Alternative
Rather than replacing .objects.all(), rework it to leave .all() alone and instead have .available(). Alternatively, leave objects alone altogether and just have an .available_objects manager on the class. In either case, explicit is better than implicit if for no other reason than there's no surprises.
Understandably, this would break current implementations using the SoftDeletableModel, but frankly not doing this means that more and more of this sort of thing will creep into code everywhere as this is a very popular library and this is being described as a good solution to a common problem. In its current implementation however it creates more problems than it solves.
At the very least, please add a warning to the documentation that using this model as your parent can create long-lasting and hard-to-remove problems with data integrity in your application and disruptions with core functions like the Django admin. I think that if the developers that came before me had seen such a message, they would have thought twice before using it as the default base for all models in the system.
That's a really good point: in fact I think the django docs explicitly call out overriding the default manager as potentially dangerous behaviour.
I think if it was to be changed, then it would need a deprecation path, but perhaps this is something that could be done in stages. (Is there a deprecation schedule for this project?)
The easiest path that I could see for this would be to introduce .available_objects(), then add a deprecation warning to .objects.all(). Users could silence the messages by switching to .available_objects() explicitly. In these terms the stages would be:
- Introduce
.available_objects() .all()includes a deprecation warning, recommending .available_objects() for the going-away behaviour..all()is removed altogether allowing for the default.
I work with @danielquinn. FWIW I wrote a blog post about the kinds of problems that soft deletion can cause. I hope other developers read it and think about the implications of adopting this implementation of soft delete, ideally before they go ahead and adopt it without realising.
https://www.rmorrison.net/mnemozzyne/2019/04/10/think-twice-before-you-soft-delete/
(I don't think soft delete per se is an antipattern, just that the over-riding of .objects is problematic)
Not sure if it good to use depreciation and .available_objects(). Personally, I like how this implemented in Laravel:
https://laravel.com/docs/5.8/eloquent#querying-soft-deleted-models
All we need is with_trashed(), only_trashed(), restore(), .forceDelete(). So if you want to get all objects including deleted, then you should just do .objects.with_trashed().all(). Although it may be a more complicated task to make soft deleted models play nice with relations.
Think it should be part of Django core actually.
The problem with something like .with_deleted() is that it’s not very easy within the scope of a queryset to “undo” a previously applied filter. And since queryset methods chain, you’d really need to be able to do that.
More than that, opting for .objects.with_trashed() breaks the default behaviour which necessarily leads to instability in working with other components that expect those defaults. Even if it seems more intuitive for some, this reason alone should be enough to dissuade you from this pattern.
Newer version of Django do have support for overriding the default manager without overriding the default related manager (if I'm not mistaken).
Might be useful and save some of this deprecation path work.
The problem with something like .with_deleted() is that it’s not very easy within the scope of a queryset to “undo” a previously applied filter. And since queryset methods chain, you’d really need to be able to do that.
It would be possible to change the query midway in a way similar to this. https://github.com/divio/djangocms-versioning/blob/3f8f16f572755e99638e87766e0d44bdbda17904/djangocms_versioning/managers.py#L14-L25
The code above effectively undoes a certain filtering criteria.
Though I tend to agree with danielquinn that the main manager should not hide data. It's confusing, and I was in quite some pain trying to debug similar issues.
@jonathan-s The djangocms method only works with get(). It doesn't work with filter().
@farooqaaa The principle of removing something from the query the way it’s done there is the same
Another solution is to use Meta.default_manager_name = 'all_objects'. That would result in the same behavior as using the .available_objects approach, except that using .objects would filter out deleted items, and it should be backwards compatible.
I've had a go at implementing the changes suggested here in #438
I'm a bit worried that emitting a warning every time soft_deletable_instance.objects is used is overkill. I'm very open to suggestions on how to improve this approach!
I think #438 is ready to go. I eagerly await your reviews.
This is something I need for a client, I understand that to blindly use soft-delete is arguably an anti-pattern as @mozz100 wrote. In our case the client wants the ability to delete incident reports and to me this is far more problematic.
As I was working on removing the model manager, I realised the current deprecation message isn't enough.
class Manufacturer(models.Model):
name = models.TextField()
class Car(SoftDeletableModel):
name = models.TextField()
manufacturer = models.ForeignKey(Manufacturer, related_name="cars", on_delete=models.CASCADE)
ford = Manufacturer.objects.create(name="Ford")
escort = Car.objects.create(name="Escort", manufacturer=ford)
focus = Car.objects.create(name="Focus", manufacturer=ford)
escort.delete() # soft deleted
ford.cars.all() # No deprecation message. Currently only includes the Focus; should include the soft-deleted Escort.
I propose an additional warning along the lines of:
This queryset will include soft-deleted records in an upcoming release; please add .available() to your queryset to continue excluding soft-deleted records.
.available() would be equivalent to .exclude(is_removed=True).
This should only appear if the user hasn't filtered by is_removed (assuming this is possible).
This makes sense @craiga
I just went through the deprecation warning and am ending up here. I just don't get the whole point around this (but it's obviously too late to do something about it... Still, I think it's a good thing to question what lead to this in the first place).
The whole point of a soft delete is to mimic a delete, without actually losing the data, in case of. It means you don't want to see the data within your apps. At all. Unless you implement something specific to do so, or you dig into your db. So yeah, it totally makes sense the data that is deleted is not accessible within the django admin.
Regarding relationships between data, you obviously have to manage the consequences of deleting data... just like we do with DB: cascade, prevent, set null. If data is soft deleted without taking care of that, it's not the django admin nor the soft delete extension that are broken, it's the data itself that is corrupted.
Given the example given by the OP: the good way to manage this is to implement a "on delete" behavior to manage the purchases: soft delete them as well? Prevent the user deletion? I don't know, it's a business logic decision: if the purchase cannot be deleted and you need to know who is the buyer, then you don't want to actually delete the data but implement a deactivation feature for the users.
As I am concerned, the whole point of using this extension is to avoid as much as possible to expose soft deleted data, which it won't help with anymore since we will have to use the good manager everywhere to do so.
Last thing: the fix this thread ended up with aims to fix an issue named by the thread title "SoftDelete is an anti-pattern". So we are trying to fix code written in the first place to implement an anti-pattern? The fix is simple, remove the whole thing from the extension or don't use it. I don't even get why something has been "fixed" in this extension for the sake of someone who consider this is an anti-pattern and said the only thing he does about this is to remove the feature from the project he's working on.
I agree with @mbegoc, I don't think soft-delete is an anti-pattern and the whole point of hiding the data in the queryset is to make sure you don't do any operations on it by accident. For example - send out an email notification, process them in some batch operation or include them in api response etc..
as I wasn't aware of this library at that time, I implemented a solution similar to what this offers and here's how I solved the issue that OP mentions in his argument "Breaks Django Admin (Objects are not visible even though they exist in the db)"
# manager.py
class AbstractBaseManager(Manager):
def get_queryset(self, *args, **kwargs):
return super().get_queryset(*args, **kwargs).filter(is_archived=False)
def archived(self, *args, **kwargs):
return super().get_queryset(*args, **kwargs).filter(is_archived=True)
# admin.py
class BaseAdmin(admin.ModelAdmin):
def get_queryset(self, request):
return super().get_queryset(request) | self.model.objects.archived()
This way super admin can see all the data in django admin panel even if they are archived or soft deleted