django icon indicating copy to clipboard operation
django copied to clipboard

Fixed #373 -- Added CompositePrimaryKey.

Open csirmazbendeguz opened this issue 10 months ago • 43 comments

Trac ticket number

ticket-373

Branch description

This branch adds the CompositePrimaryKey field. If present, Django will create a composite primary key.

Please refer to the docs for a more in-depth explanation.

Proposal Previous PR Composite FK Admin Composite Generic FK ~~Model._is_pk_set()~~ ✅ ~~Tuple Lookups~~ ✅ ~~Serial Fields~~ ❌

class Tenant(models.Model):
    pass


class User(models.Model):
    pk = models.CompositePrimaryKey("tenant_id", "id", primary_key=True)
    tenant = models.ForeignKey(Tenant, on_delete=models.CASCADE)
    id = models.IntegerField()


class Comment(models.Model):
    pk = models.CompositePrimaryKey("tenant_id", "id")
    tenant = models.ForeignKey(Tenant, on_delete=models.CASCADE)
    id = models.IntegerField()
    user_id = models.IntegerField()
    user = models.ForeignObject(
        User,
        on_delete=models.CASCADE,
        from_fields=("tenant_id", "user_id"),
        to_fields=("tenant_id", "id"),
        related_name="+",
    )

Checklist

  • [X] This PR targets the main branch.
  • [X] The commit message is written in past tense, mentions the ticket number, and ends with a period.
  • [X] I have checked the "Has patch" ticket flag in the Trac system.
  • [X] I have added or updated relevant tests.
  • [x] I have added or updated relevant docs, including release notes if applicable.
  • [x] For UI changes, I have attached screenshots in both light and dark modes.

csirmazbendeguz avatar Apr 07 '24 04:04 csirmazbendeguz

I was trying out this exciting branch and ran into this error when running a test:

<...>/lib/python3.12/site-packages/django/db/models/lookups.py:30: in __init__
    self.rhs = self.get_prep_lookup()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = TupleIn(<django.db.models.fields.composite.Cols object at 0x107560980>, <django.db.models.sql.query.Query object at 0x1074e23f0>)

    def get_prep_lookup(self):
        if not isinstance(self.lhs, Cols):
            raise ValueError(
                "The left-hand side of the 'in' lookup must be an instance of Cols"
            )
        if not isinstance(self.rhs, Iterable):
>           raise ValueError(
                "The right-hand side of the 'in' lookup must be an iterable"
            )
E           ValueError: The right-hand side of the 'in' lookup must be an iterable

The issue stems from the use of isnull like so:

MyModel.objects.filter(
    type_override__severity__isnull=False
).update(severity="high")

Curious if anyone ran into this as well.

Edited for traceback:

<...>
lib/python3.12/site-packages/django/db/models/sql/compiler.py:2080: in pre_sql_setup
    self.query.add_filter("pk__in", query)
lib/python3.12/site-packages/django/db/models/sql/query.py:1601: in add_filter
    self.add_q(Q((filter_lhs, filter_rhs)))
lib/python3.12/site-packages/django/db/models/sql/query.py:1617: in add_q
    clause, _ = self._add_q(q_object, self.used_aliases)
lib/python3.12/site-packages/django/db/models/sql/query.py:1649: in _add_q
    child_clause, needed_inner = self.build_filter(
lib/python3.12/site-packages/django/db/models/sql/query.py:1563: in build_filter
    condition = self.build_lookup(lookups, col, value)
lib/python3.12/site-packages/django/db/models/sql/query.py:1393: in build_lookup
    lookup = lookup_class(lhs, rhs)
lib/python3.12/site-packages/django/db/models/lookups.py:30: in __init__
    self.rhs = self.get_prep_lookup()

So, this is part of SQLUpdateCompiler and is coming from the update code path.

grjones avatar Apr 17 '24 19:04 grjones

Thanks for testing and reporting the issue @grjones! Indeed, I forgot to handle this use case. I'll look into it this week.

csirmazbendeguz avatar Apr 18 '24 02:04 csirmazbendeguz

@grjones, FYI I pushed the fix

csirmazbendeguz avatar Apr 20 '24 00:04 csirmazbendeguz

@grjones, FYI I pushed the fix

Nice! I hope this gets merged in soon. Your branch has been working great for me.

grjones avatar Apr 20 '24 01:04 grjones

I may have found one other small issue. When adding a regular primary_key=True on a single field, a unique constraint is added. But when using this branch, it becomes an IntegrityError instead. Adding a UniqueConstraint on the composite fields is a work-a-round but ideally would be captured in this PR. Imo, this PR is sooooo close. I'm excited for it to be merged in.

grjones avatar Apr 22 '24 16:04 grjones

@grjones , thanks, I appreciate the feedback, I'll look into it. If a model defines Meta.primary_key, defining primary_key=True on a field should not be possible - could you give me a code example so I know how to reproduce the issue? I didn't know Django added unique constraints to primary keys, I'll check, but isn't that redundant?

csirmazbendeguz avatar Apr 23 '24 04:04 csirmazbendeguz

@grjones , thanks, I appreciate the feedback, I'll look into it. If a model defines Meta.primary_key, defining primary_key=True on a field should not be possible - could you give me a code example so I know how to reproduce the issue? I didn't know Django added unique constraints to primary keys, I'll check, but isn't that redundant?

I'll see if I can give you a solid failing test. My "unique constraint" phrasing might not be exactly right. But ultimately, I believe Django queries the DB first to see if the new object's PK already exists and throws a validation error. The composite key logic doesn't seem to be doing that and so an unhandled IntegrityError is raised instead.

grjones avatar Apr 23 '24 05:04 grjones

@grjones , sorry for the late reply, I've been busy last week. Could you give me more specifics? What's the error message you expect?

csirmazbendeguz avatar May 01 '24 08:05 csirmazbendeguz

@grjones , sorry for the late reply, I've been busy last week. Could you give me more specifics? What's the error message you expect?

Actually, I think it's mostly ok. I was using Django Spanner and it's just not quite working with composite keys and will need to be fixed there. I wrote this and it passed. It probably shouldn't say Id though?

from django.core.exceptions import ValidationError
from django.test import TestCase

from .models import Tenant, User


class CompositePKCleanTests(TestCase):
    """
    Test the .clean() method of composite_pk models.
    """

    @classmethod
    def setUpTestData(cls):
        cls.tenant = Tenant.objects.create()

    def test_validation_error_is_raised_when_pk_already_exists(self):
        test_cases = [
            {"tenant": self.tenant, "id": 2412, "email": "[email protected]"},
            {"tenant_id": self.tenant.id, "id": 5316, "email": "[email protected]"},
            {"pk": (self.tenant.id, 7424), "email": "[email protected]"},
        ]
        expected = "{'id': ['User with this Id already exists.']}"
        for fields in test_cases:
            User.objects.create(**fields)
            with self.assertRaisesMessage(ValidationError, expected):
                User(**fields).clean()

grjones avatar May 02 '24 03:05 grjones

Thank you so much for taking the time to review my changes @LilyFoote ! I have two questions:

  1. If Meta.primary_key is defined, this PR will automatically add a composite field called primary_key to the model. What do you think about this approach? I felt like it was easier to handle the composite primary keys this way as we can run checks against the meta class instead of traversing the model's fields for a composite field.
  2. I wrote a lot of tests testing the underlying queries made by the ORM. It makes a lot of sense to me, but I haven't seen this type of tests that much in the Django source code - do these tests look okay to you?

csirmazbendeguz avatar May 04 '24 03:05 csirmazbendeguz

If Meta.primary_key is defined, this PR will automatically add a composite field called primary_key to the model. What do you think about this approach?

I don't feel strongly that this is better or worse than another option here, so happy to go with what you think is best.

I wrote a lot of tests testing the underlying queries made by the ORM. It makes a lot of sense to me, but I haven't seen this type of tests that much in the Django source code - do these tests look okay to you?

I like your tests quite a bit - they're pretty readable and comprehensive. The main issue I have with them is that they're written for specific databases instead of for generic database features. Where possible Django strongly prefers to test based on features because then the tests apply to as many databases as possible (including third party database libraries). I think the asserts of the actual SQL might be a bit tricky to adapt though, so we might need a different way to check what they're checking.

Also, after I reviewed yesterday, I thought of some more things:

  • We should add migrations tests to make sure that adding/removing Meta.primary_key works correctly and that removing a field that's part of a primary key also does something appropriate.
  • We might want tests for composite keys in forms and the admin. Maybe there's other areas too that we need to check the interactions.

LilyFoote avatar May 04 '24 12:05 LilyFoote

Thanks @charettes !

Something that came through my mind while reviewing is that we likely want a plan to eventually deprecate Options.pk in favor of Options.primary_key?

I'm not sure what you mean by that, I don't think we can, because Options.pk refers to the field, while Options.primary_key is the list of field names.

csirmazbendeguz avatar May 17 '24 04:05 csirmazbendeguz

So as far as I understand, at the moment MultiColSource is used by Django internally to represent JOINs on multiple fields - that's why it has a sources field.

I'm not sure it's the right decision to reuse this for composite fields, which on the other hand don't need sources, it just needs to represent a list of Cols as an expression.

Let me know what you think!

csirmazbendeguz avatar May 17 '24 05:05 csirmazbendeguz

I'm not sure what you mean by that, I don't think we can, because Options.pk refers to the field, while Options.primary_key is the list of field names.

You're completely right. In this case is pk set to CompositePrimaryKey when Meta.primary_key is defined and is primary_key set when a non-composite primary is used as well?

charettes avatar May 17 '24 13:05 charettes

I'm not sure what you mean by that, I don't think we can, because Options.pk refers to the field, while Options.primary_key is the list of field names.

You're completely right. In this case is pk set to CompositePrimaryKey when Meta.primary_key is defined and is primary_key set when a non-composite primary is used as well?

It would not be set, if it's a regular primary key, Meta.primary_key is None.

csirmazbendeguz avatar May 20 '24 16:05 csirmazbendeguz

Hey @csirmazbendeguz, thank you for the amazing work out there! I was trying to test this branch on my local with SQLite and realised a few things:

  1. If you run makemigrations for a model with a CompositePrimaryKey, the resulting migration file has erroneous imports. To fix this, I believe we need to add django.db.models.fields.composite path to the if...elif block here.

  2. Assume that I have the following models:

    class Author(models.Model):
    name = models.CharField(max_length=100)
    
    class Book(models.Model):
        id = models.CompositePrimaryKey("author", "title")
        author = models.ForeignKey(Author, on_delete=models.CASCADE, related_name="books")
        title = models.CharField(max_length=255)
    

    With the current implementation, following test fails:

    class TestCompositeFks(TestCase):
        def test_composite_fks(self):
            author = Author.objects.create(name="Author")
            book = Book.objects.create(author=author, title="Title")
            list(Author.objects.filter(books__in=[book])) == book
    

    with an OperationalError, caused by a syntax error. Executed SQL is as following:

    SELECT
        "books_author"."id",
        "books_author"."name"
    FROM
        "books_author"
        INNER JOIN "books_book" ON ("books_author"."id" = "books_book"."author_id")
    WHERE
        "books_book"."author_id", "books_book"."title" IN ((1, 'Title'))
    

    because LHS in WHERE clause should have been wrapped with parantheses like this:

    ...
    WHERE
        ("books_book"."author_id", "books_book"."title") IN ((1, 'Title'))
    

    Unfortunately I didn't have a time to deep-dive to this.

  3. Not a big issue but my code editor (VSCode) does not recognize models.CompositePrimaryKey, although the import is working fine. This is probably related with Pylance or something that VSCode uses to recognize fields under models module.

Again thanks for this amazing initiative! 🚀

omerfarukabaci avatar Jun 10 '24 16:06 omerfarukabaci

Thanks a lot for the review @omerfarukabaci ! I'll take a look

csirmazbendeguz avatar Jun 10 '24 17:06 csirmazbendeguz

Author.objects.filter(books__in=[book])

@omerfarukabaci , I pushed the changes to support this, but note that filtering on reverse relations is one of those "gotchas" in Django, it may not produce the results you expect.

EDIT: I mean it might return duplicates, you probably already know this, I'm just mentioning it just in case.

csirmazbendeguz avatar Jun 11 '24 06:06 csirmazbendeguz

If you run makemigrations for a model with a CompositePrimaryKey, the resulting migration file has erroneous imports

Yes, I recently changed the API to CompositePrimaryKey, the migrations are not 100% yet. I'm working on sorting them out. I pushed the fix for the issue you mentioned, thanks 👍

csirmazbendeguz avatar Jun 11 '24 09:06 csirmazbendeguz

@csirmazbendeguz Thanks for your answers, now the above issues seem like fixed, created migration is correct and reverse relation lookup is working as expected. Thank you! 🚀

While I was testing it further with the exact same models, I realized another issue:

class TestCompositeFks(TestCase):
    def test_composite_fks(self):
        author = Author.objects.create(name="Author")
        Book.objects.create(author=author, title="Title")
        author = Author.objects.annotate(book_count=Count("books")).get()
        assert author.book_count == 1

This test fails with the following error:

django.db.utils.OperationalError: wrong number of arguments to function COUNT()

Executed SQL is as following:

SELECT
    "books_author"."id",
    "books_author"."name",
    COUNT("books_book"."author_id", "books_book"."title") AS "book_count"
FROM
    "books_author"
    LEFT OUTER JOIN "books_book" ON ("books_author"."id" = "books_book"."author_id")
GROUP BY
    "books_author"."id",
    "books_author"."name"

If we could change the parameter we pass to the COUNT function to a concatenation as below:

COUNT("books_book"."author_id" || '-' || "books_book"."title")

it should work fine (if I am not missing something), with the exception that for some databases we need to use CONCAT function instead of || operator, which might be resolved using the existing db.models.functions.Concat function.

Note: I am not sure if concatenation works between every data type that is allowed to be a primary key, although this could be considered as an edge case.

omerfarukabaci avatar Jun 11 '24 14:06 omerfarukabaci

Thanks @omerfarukabaci , these bug reports are very helpful. Yes, I haven't considered annotations with multi-column pks. I'll look into this.

csirmazbendeguz avatar Jun 11 '24 16:06 csirmazbendeguz

Thanks @LilyFoote , @sarahboyce for the meeting. Notes:

  1. At the moment, Count("books") is not too easy to fix. It could be written as Count("books__author_id") and it would work. If we cannot fix it, we should document it. Maybe we could use * instead of pk for counting?
  2. The content types framework will not work with composite pks. This includes everything that depends on the content types framework, e.g. the contrib.auth module too.

csirmazbendeguz avatar Jun 13 '24 17:06 csirmazbendeguz

I squsahed all commits and rebased to latest main branch.

csirmazbendeguz avatar Jun 18 '24 14:06 csirmazbendeguz

Idea: how about calling the field PrimaryKey instead of CompositePrimaryKey?

csirmazbendeguz avatar Jun 18 '24 17:06 csirmazbendeguz

@omerfarukabaci , I thought about the issue of Count("books").

My conclusion is we can't support this.

I don't think concatenating is a good solution. The only way we could support this is if we could get Django to count this with * instead of the primary key.

This is an edge case that is only needed for Count though, and it's not as simple to implement as it is to explain.

I added a section to the docs about this. This is a case of using a database function with a composite primary key directly, which cannot be expected to work in general.

In your case, Count("books__author_id") would do the trick instead.

csirmazbendeguz avatar Jun 19 '24 08:06 csirmazbendeguz

Regarding the issue raised by @sarahboyce last week...

I think it is okay to merge this without support for generic relations. I added a section to the docs about this not being supported for now.

The only impact is some third-party packages using generic relations won't work with composite primary keys (e.g. django-guardian).

Let's have a separate discussion on how to support this. I lean towards storing composite primary keys serialized as JSON in a single CharField.

csirmazbendeguz avatar Jun 19 '24 08:06 csirmazbendeguz

Btw, semantically it would be nice if it were possible to write:

class User(models.Model):
    pk = models.CompositePrimaryKey("tenant_id", "id")
    tenant = models.ForeignKey(Tenant, on_delete=models.CASCADE)
    id = models.IntegerField()

ie to let CompositePrimaryKey replace the automatically generated pk. Would that be possible?

apollo13 avatar Jun 25 '24 18:06 apollo13

Btw, semantically it would be nice if it were possible to write:

class User(models.Model):
    pk = models.CompositePrimaryKey("tenant_id", "id")
    tenant = models.ForeignKey(Tenant, on_delete=models.CASCADE)
    id = models.IntegerField()

ie to let CompositePrimaryKey replace the automatically generated pk. Would that be possible?

@apollo13 , good point! It also came up when we were discussing this with @LilyFoote and @charettes . It seems like a natural thing to do, so it's worth a discussion. Here are a couple ideas that make sense to me:

  1. pk at the moment is reserved, users can't add a field named pk. We could remove this restriction.
  2. If pk is defined, it should always set primary_key=True.
  3. If pk is not defined, it should still refer to the primary_key=True field (e.g. id field). This is required for backwards-compatibility.
  4. If pk is defined, and it's an IntegerField, then a field called pk should be created in the database (same as any field, e.g. id).
  5. If pk is defined, and it's a CompositePrimaryKey, then a field called pk shouldn't be created in the database (same as any field, e.g. primary_key).

My only issue with this is, it adds extra complexity to how pk works. In this case, pk can be both a reference to the primary key field, or the primary key field itself.

So I'm not sure if it's worth doing this. It doesn't feel like an elegant or consistent solution to me.


The other approach @charettes and @LilyFoote mentioned is to always have pk be a CompositePrimaryKey (could be renamed to PrimaryKey):

  1. pk cannot be defined explicitly.
  2. CompositePrimaryKey cannot be used explicitly.
  3. pk is always added to the model in the background, and it's always an instance of CompositePrimaryKey.
  4. Consequently, pk will cease to be a reference to another field, it will always be a field itself.
  5. If field x defines primary_key=True, pk is CompositePrimaryKey("x"). obj.pk returns the value of x for backwards-compatibility (instead of a tuple).
  6. If Meta.primary_key option is ("a", "b", "c"), pk is CompositePrimaryKey("a", "b", "c"). obj.pk returns a tuple.
  7. If Meta.primary_key is not set, it could be set to ("x",) automatically.

This is quite an invasive change. It would mean all existing models get a new field called pk. meta.pk would return a different field. Instead of IntegerField, it would return CompositePrimaryKey. Is breaking backwards-compatibility okay here?

I don't have anything against it other than that. It does feel more intuitive. If the community wants this, I could fork this branch and open another PR.

csirmazbendeguz avatar Jun 26 '24 07:06 csirmazbendeguz

Today's meeting with @LilyFoote and @charettes :

  • enable setting pk to CompositePrimaryKey explicitly. pk is a well-known Django keyword, so it makes sense to reuse it here, even if it somewhat complicates things.
  • pk can only be set to CompositePrimaryKey.
  • CompositePrimaryKey can't be set to anything else but pk.

csirmazbendeguz avatar Jul 04 '24 13:07 csirmazbendeguz

@apollo13 , I discussed your suggestion with @LilyFoote and @charettes and they agreed. I pushed the changes. The only way to define a composite pk is with the pk field name now.

csirmazbendeguz avatar Jul 05 '24 12:07 csirmazbendeguz