django-rest-framework icon indicating copy to clipboard operation
django-rest-framework copied to clipboard

Performance issue: N+1 queries and slow validation when using many=True with serializers containing relational fields

Open cheehong1030 opened this issue 1 year ago • 14 comments

I have some questions regarding the following code snippet:

model = PrimaryKeyRelatedField(
    pk_field=IntegerField(),
    queryset=MachineModel.objects.filter(active=True),
    many=True,
    required=True,
)

When PrimaryKeyRelatedField is set with many=True, it generates these queries. It seems like they are not using join.

SELECT "api_application_machinemodel"."id", "api_application_machinemodel"."brand_id", "api_application_machinemodel"."name", "api_application_machinemodel"."description", "api_application_machinemodel"."active" FROM "api_application_machinemodel" WHERE ("api_application_machinemodel"."active" AND "api_application_machinemodel"."id" = 1) LIMIT 21; args=(1,); alias=default
SELECT "api_application_machinemodel"."id", "api_application_machinemodel"."brand_id", "api_application_machinemodel"."name", "api_application_machinemodel"."description", "api_application_machinemodel"."active" FROM "api_application_machinemodel" WHERE ("api_application_machinemodel"."active" AND "api_application_machinemodel"."id" = 2) LIMIT 21; args=(2,); alias=default
SELECT "api_application_machinemodel"."id", "api_application_machinemodel"."brand_id", "api_application_machinemodel"."name", "api_application_machinemodel"."description", "api_application_machinemodel"."active" FROM "api_application_machinemodel" WHERE ("api_application_machinemodel"."active" AND "api_application_machinemodel"."id" = 3) LIMIT 21; args=(3,); alias=default
SELECT "api_application_machinemodel"."id", "api_application_machinemodel"."brand_id", "api_application_machinemodel"."name", "api_application_machinemodel"."description", "api_application_machinemodel"."active" FROM "api_application_machinemodel" WHERE ("api_application_machinemodel"."active" AND "api_application_machinemodel"."id" = 4) LIMIT 21; args=(4,); alias=default
SELECT "api_application_machinemodel"."id", "api_application_machinemodel"."brand_id", "api_application_machinemodel"."name", "api_application_machinemodel"."description", "api_application_machinemodel"."active" FROM "api_application_machinemodel" WHERE ("api_application_machinemodel"."active" AND "api_application_machinemodel"."id" = 5) LIMIT 21; args=(5,); alias=default

Is this behavior a bug, or is it expected not to use join?

cheehong1030 avatar Dec 16 '24 03:12 cheehong1030

Are you prefeching the relationship in your view? You'll need to use Django's prefetch_related().

browniebroke avatar Dec 17 '24 08:12 browniebroke

Are you prefeching the relationship in your view? You'll need to use Django's prefetch_related().

This is for validation purposes, so I cannot use prefetch_related.

def put(self, request):
    serializer = MachinePartUpdateSerializer(data=request.data)
    serializer.is_valid()

Here is the serializer:

class MachinePartSerializer(ModelSerializer):
    brand = PrimaryKeyRelatedField(
        pk_field=IntegerField(),
        queryset=MachineBrand.objects.filter(active=True),
        required=True,
    )
    # TODO: performance issue n+1
    model = PrimaryKeyRelatedField(
        pk_field=IntegerField(),
        queryset=MachineModel.objects.filter(active=True),
        many=True,
        required=True,
    )

    class Meta:
        model = MachinePart
        fields = ["id", "brand", "model", "name", "description"]

Here is my model using a many-to-many relationship:

class MachinePart(Model):
    brand = ForeignKey(MachineBrand, on_delete=models.CASCADE)
    model = ManyToManyField(MachineModel, related_name="parts")
    name = CharField(max_length=255)
    part_number = CharField(max_length=255)
    description = TextField(blank=True, null=True)
    active = BooleanField(default=True)

So, how should I handle prefetch the relations

cheehong1030 avatar Dec 17 '24 09:12 cheehong1030

Ah ok, I thought you were doing a read-only request, I didn't realise it was a PUT request. I don't know the answer in this case.

browniebroke avatar Dec 17 '24 13:12 browniebroke

This might be a bug. It seems that the case of PrimaryKeyRelatedField with many=True wasn't considered.

cheehong1030 avatar Dec 17 '24 13:12 cheehong1030

The problem lies in the implementation of ManyrelatedField which is extremely simple and does not implement any kind of DB-level optimization. The same problem is in ListSerializer.create method. To keep those elements "simple" they just delegate DB operations to children, but this is not-optimal when there are many elements.

This is not technically a bug, because it does what it is expected, yet it is a bad usage of resources and should be addressed.

sevdog avatar Dec 23 '24 14:12 sevdog

Maybe this could open a PR to optimize it.

cheehong1030 avatar Dec 24 '24 03:12 cheehong1030

The behavior you're observing is due to how Django's PrimaryKeyRelatedField interacts with the database when many=True is used. Each item in the list of primary keys results in an individual query to fetch the corresponding MachineModel instance. This can lead to N+1 query problems, where N is the number of related objects being fetched. Validation Per Instance:

PrimaryKeyRelatedField : fetches each related object separately to ensure it exists and matches the given queryset. The queries you see are part of this validation process. No Bulk Query: By default, PrimaryKeyRelatedField does not perform a single bulk query to fetch all related objects at once.

sreedhar742 avatar Jan 05 '25 11:01 sreedhar742

So, is there no solution for this issue?

cheehong1030 avatar Jan 06 '25 00:01 cheehong1030

This may be handled like it is done in Django core MultipleChoiceField: build a "bulk" select and than check if every provided key is in the returned queryset:

https://github.com/django/django/blob/40d5516385448a73426aad396778f369a363eda9/django/forms/models.py#L1622-L1658

This is not an hard task, since an implementation is ready for use for the same use-case.

sevdog avatar Jan 07 '25 07:01 sevdog

We hit this today. Searching the repo I see an attempt was made a while back:

  • https://github.com/encode/django-rest-framework/issues/4917
  • https://github.com/encode/django-rest-framework/pull/5093
  • https://github.com/encode/django-rest-framework/pull/5150

And there's also a approach suggested here: https://github.com/encode/django-rest-framework/discussions/8919#discussion-5010915 which is to try a single self.child_relation.bulk_to_internal_value(data) call before falling back to the current

[
    self.child_relation.to_internal_value(item)
    for item in data
]

tomviner avatar Jan 09 '25 22:01 tomviner

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jul 19 '25 07:07 stale[bot]

I encountered another performance issue. When I use an EventSerializer with many=True to validate around 1,000 events, and the serializer includes two SlugRelatedFields and one PrimaryKeyRelatedField, the validation process takes around 17 seconds.

Each row triggers approximately 3 database queries, which leads to a severe N+1 query problem during validation.

cheehong1030 avatar Oct 17 '25 02:10 cheehong1030