Performance issue: N+1 queries and slow validation when using many=True with serializers containing relational fields
I have some questions regarding the following code snippet:
model = PrimaryKeyRelatedField(
pk_field=IntegerField(),
queryset=MachineModel.objects.filter(active=True),
many=True,
required=True,
)
When PrimaryKeyRelatedField is set with many=True, it generates these queries. It seems like they are not using join.
SELECT "api_application_machinemodel"."id", "api_application_machinemodel"."brand_id", "api_application_machinemodel"."name", "api_application_machinemodel"."description", "api_application_machinemodel"."active" FROM "api_application_machinemodel" WHERE ("api_application_machinemodel"."active" AND "api_application_machinemodel"."id" = 1) LIMIT 21; args=(1,); alias=default
SELECT "api_application_machinemodel"."id", "api_application_machinemodel"."brand_id", "api_application_machinemodel"."name", "api_application_machinemodel"."description", "api_application_machinemodel"."active" FROM "api_application_machinemodel" WHERE ("api_application_machinemodel"."active" AND "api_application_machinemodel"."id" = 2) LIMIT 21; args=(2,); alias=default
SELECT "api_application_machinemodel"."id", "api_application_machinemodel"."brand_id", "api_application_machinemodel"."name", "api_application_machinemodel"."description", "api_application_machinemodel"."active" FROM "api_application_machinemodel" WHERE ("api_application_machinemodel"."active" AND "api_application_machinemodel"."id" = 3) LIMIT 21; args=(3,); alias=default
SELECT "api_application_machinemodel"."id", "api_application_machinemodel"."brand_id", "api_application_machinemodel"."name", "api_application_machinemodel"."description", "api_application_machinemodel"."active" FROM "api_application_machinemodel" WHERE ("api_application_machinemodel"."active" AND "api_application_machinemodel"."id" = 4) LIMIT 21; args=(4,); alias=default
SELECT "api_application_machinemodel"."id", "api_application_machinemodel"."brand_id", "api_application_machinemodel"."name", "api_application_machinemodel"."description", "api_application_machinemodel"."active" FROM "api_application_machinemodel" WHERE ("api_application_machinemodel"."active" AND "api_application_machinemodel"."id" = 5) LIMIT 21; args=(5,); alias=default
Is this behavior a bug, or is it expected not to use join?
Are you prefeching the relationship in your view? You'll need to use Django's prefetch_related().
Are you prefeching the relationship in your view? You'll need to use Django's
prefetch_related().
This is for validation purposes, so I cannot use prefetch_related.
def put(self, request):
serializer = MachinePartUpdateSerializer(data=request.data)
serializer.is_valid()
Here is the serializer:
class MachinePartSerializer(ModelSerializer):
brand = PrimaryKeyRelatedField(
pk_field=IntegerField(),
queryset=MachineBrand.objects.filter(active=True),
required=True,
)
# TODO: performance issue n+1
model = PrimaryKeyRelatedField(
pk_field=IntegerField(),
queryset=MachineModel.objects.filter(active=True),
many=True,
required=True,
)
class Meta:
model = MachinePart
fields = ["id", "brand", "model", "name", "description"]
Here is my model using a many-to-many relationship:
class MachinePart(Model):
brand = ForeignKey(MachineBrand, on_delete=models.CASCADE)
model = ManyToManyField(MachineModel, related_name="parts")
name = CharField(max_length=255)
part_number = CharField(max_length=255)
description = TextField(blank=True, null=True)
active = BooleanField(default=True)
So, how should I handle prefetch the relations
Ah ok, I thought you were doing a read-only request, I didn't realise it was a PUT request. I don't know the answer in this case.
This might be a bug. It seems that the case of PrimaryKeyRelatedField with many=True wasn't considered.
The problem lies in the implementation of ManyrelatedField which is extremely simple and does not implement any kind of DB-level optimization. The same problem is in ListSerializer.create method. To keep those elements "simple" they just delegate DB operations to children, but this is not-optimal when there are many elements.
This is not technically a bug, because it does what it is expected, yet it is a bad usage of resources and should be addressed.
Maybe this could open a PR to optimize it.
The behavior you're observing is due to how Django's PrimaryKeyRelatedField interacts with the database when many=True is used. Each item in the list of primary keys results in an individual query to fetch the corresponding MachineModel instance. This can lead to N+1 query problems, where N is the number of related objects being fetched. Validation Per Instance:
PrimaryKeyRelatedField : fetches each related object separately to ensure it exists and matches the given queryset. The queries you see are part of this validation process. No Bulk Query: By default, PrimaryKeyRelatedField does not perform a single bulk query to fetch all related objects at once.
So, is there no solution for this issue?
This may be handled like it is done in Django core MultipleChoiceField: build a "bulk" select and than check if every provided key is in the returned queryset:
https://github.com/django/django/blob/40d5516385448a73426aad396778f369a363eda9/django/forms/models.py#L1622-L1658
This is not an hard task, since an implementation is ready for use for the same use-case.
We hit this today. Searching the repo I see an attempt was made a while back:
- https://github.com/encode/django-rest-framework/issues/4917
- https://github.com/encode/django-rest-framework/pull/5093
- https://github.com/encode/django-rest-framework/pull/5150
And there's also a approach suggested here: https://github.com/encode/django-rest-framework/discussions/8919#discussion-5010915 which is to try a single self.child_relation.bulk_to_internal_value(data) call before falling back to the current
[
self.child_relation.to_internal_value(item)
for item in data
]
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I encountered another performance issue. When I use an EventSerializer with many=True to validate around 1,000 events, and the serializer includes two SlugRelatedFields and one PrimaryKeyRelatedField, the validation process takes around 17 seconds.
Each row triggers approximately 3 database queries, which leads to a severe N+1 query problem during validation.