django-postgres-extra icon indicating copy to clipboard operation
django-postgres-extra copied to clipboard

Upsert and the semantics of field.pre_save()

Open a3kov opened this issue 3 years ago • 0 comments

Introduction

Django's field.pre_save() allows fields to auto-provide 2 different values: 1 for insert, and 1 for update. There is no requirement of what these values are: they could be same or 2 absolutely different arbitrary values. These fields are called "magical" in the project code.

Postgres upsert query type only takes 1 (insert) value for every field, for updates it can only reference insert values. There are some workarounds but they are not so pretty

Built-in date fields that have different pre_save() values can be upserted, because for auto_now it's always the same value. and for auto_now_add update is a no-op, which means it also reduced to 1 value for insert. For some other fields it could be impossible. The semantics of pre_save() does not dictate what the values are.

That means that Postgres upsert can't completely satisfy the semantics of Django's field.pre_save() and there are fields that can't be correctly upsert-ed

For every field, where current value (if existing) is C, pre_save(add=True) = A, and pre_save(add=False) = B, if C != B and A != B, Postgres upsert is impossible in the current form.

Takeaways

  1. The current implementation can be considered buggy, as it will just silently use insert values for updates.
  2. We could implement the ugly workaround linked above. I really don't like this idea as the resulting query is very confusing and totally non-obvious. But at least it could be a (universal) working solution.
  3. If we could check for non-upsertable data and throw an exception, warning the caller, it would be best. However, while we know A and B from the description above, we don't know C unless we run an additional SELECT which would defeat the purpose of upsert.
  4. I think update_fields argument for upsert makes total sense: we can just ask the caller what fields are supposed to be updated. Not only it's good for partial updates in general, but also it would help with the situation. If we document the issue, the programmer can simply avoid including the problematic fields in the update_fields list. Then it would be the programmer's duty to update these fields in a separete bulk_update(). Such API change would require a major version bump though.
  5. I haven't dug deep, but I expect that the "magical fields" logic after the update_fields implementation won't be needed anymore: insert fields will simply be all model's concrete fields, and update fields will be provided. I don't expect there will be an issue, but please correct me if I'm wrong.
  6. We could just document that for updates, some field data could be incorrect, but then it doesn't look good for the user of this library.

a3kov avatar Apr 15 '21 01:04 a3kov