django-shared-schema-tenants
django-shared-schema-tenants copied to clipboard
Improve queries with related attributes to use tenant index
When we do filters with related fields, the database is making queries considering the whole related table, instead of only the portion of data from the current tenant. This makes queries very slow when there're many tenants.
One solution is implicitly changing the queryset to filter by the related table tenant too.
Eg.:
MyModel.objects.filter(my_related_field__field1=value1, my_related_field__field2=value2)
sql is something like:
SELECT "my_app_mymodel"."field1", "my_app_mymodel"."field2"
FROM "my_app_mymodel"
INNER JOIN "my_app_myrelatedmodel"
ON "my_app_myrelatedmodel"."mymodel_id" = "my_app_mymodel"."id"
WHERE
"my_app_mymodel"."tenant_id" = tenant_id AND
"my_app_myrelatedmodel"."field1" = value1 AND
"my_app_myrelatedmodel"."field2" = value2
If we add the related field tenant's filter like this we solve the problem
MyModel.objects.filter(
my_related_field__tenant_id=tenant_id, my_related_field__field1=value1,
my_related_field__field2=value2)
the sql with the fix should something like:
SELECT "my_app_mymodel"."field1", "my_app_mymodel"."field2"
FROM "my_app_mymodel"
INNER JOIN "my_app_myrelatedmodel"
ON "my_app_myrelatedmodel"."mymodel_id" = "my_app_mymodel"."id"
WHERE
"my_app_mymodel"."tenant_id" = tenant_id AND
"my_app_myrelatedmodel"."tenant_id" = tenant_id AND
"my_app_myrelatedmodel"."field1" = value1 AND
"my_app_myrelatedmodel"."field2" = value2
For excludes the situation is even worse because it always uses NOT IN
operator and subqueries.
Eg.:
MyModel.objects.exclude(my_related_field__field1=value1, my_related_field__field2=value2)
sql is something like:
SELECT "my_app_mymodel"."field1", "my_app_mymodel"."field2"
FROM "my_app_mymodel"
WHERE
"my_app_mymodel"."tenant_id" = tenant_id
"my_app_mymodel"."my_related_field_id" NOT IN (
SELECT "my_app_myrelatedmodel"."field1", "my_app_myrelatedmodel"."field2"
FROM "my_app_myrelatedmodel"
WHERE "my_app_myrelatedmodel"."field1" = value1
) AND
"my_app_mymodel"."my_related_field_id" NOT IN (
SELECT "my_app_myrelatedmodel"."field1", "my_app_myrelatedmodel"."field2"
FROM "my_app_myrelatedmodel"
WHERE "my_app_myrelatedmodel"."field2" = value2
);
If we use the same strategy as in filter the result would be something like
SELECT "my_app_mymodel"."field1", "my_app_mymodel"."field2"
FROM "my_app_mymodel"
WHERE
"my_app_mymodel"."tenant_id" = tenant_id AND
"my_app_mymodel"."my_related_field_id" NOT IN (
SELECT "my_app_myrelatedmodel"."field1", "my_app_myrelatedmodel"."field2"
FROM "my_app_myrelatedmodel"
WHERE
"my_app_myrelatedmodel"."tenant_id" = tenant_id
) AND
"my_app_mymodel"."my_related_field_id" NOT IN (
SELECT "my_app_myrelatedmodel"."field1", "my_app_myrelatedmodel"."field2"
FROM "my_app_myrelatedmodel"
WHERE
"my_app_myrelatedmodel"."field1" = value1
) AND
"my_app_mymodel"."my_related_field_id" NOT IN (
SELECT "my_app_myrelatedmodel"."field1", "my_app_myrelatedmodel"."field2"
FROM "my_app_myrelatedmodel"
WHERE
"my_app_myrelatedmodel"."field2" = value2
);
For it to be optimized, the generated code should be something like this:
SELECT "my_app_mymodel"."field1", "my_app_mymodel"."field2"
FROM "my_app_mymodel"
WHERE
"my_app_mymodel"."tenant_id" = tenant_id AND
"my_app_mymodel"."my_related_field_id" NOT IN (
SELECT "my_app_myrelatedmodel"."field1", "my_app_myrelatedmodel"."field2"
FROM "my_app_myrelatedmodel"
WHERE
"my_app_myrelatedmodel"."tenant_id" = tenant_id AND
"my_app_myrelatedmodel"."field1" = value1
) AND
"my_app_mymodel"."my_related_field_id" NOT IN (
SELECT "my_app_myrelatedmodel"."field1", "my_app_myrelatedmodel"."field2"
FROM "my_app_myrelatedmodel"
WHERE
"my_app_myrelatedmodel"."tenant_id" = tenant_id AND
"my_app_myrelatedmodel"."field2" = value2
);
The only way I could generate something that looks like the expected result was:
MyModel.objects.exclude(
my_related__id__in=MyRelatedModel.objects.filter(field__field1=value1, field2=value2))
The resulting sql looks like this:
SELECT "my_app_mymodel"."field1", "my_app_mymodel"."field2"
FROM "my_app_mymodel"
WHERE
"my_app_mymodel"."tenant_id" = tenant_id AND
"my_app_mymodel"."my_related_field_id" NOT IN (
SELECT "my_app_myrelatedmodel"."field1", "my_app_myrelatedmodel"."field2"
FROM "my_app_myrelatedmodel"
WHERE
"my_app_myrelatedmodel"."tenant_id" = tenant_id AND
"my_app_myrelatedmodel"."field1" = value1
) AND
"my_app_mymodel"."my_related_field_id" NOT IN (
SELECT "my_app_myrelatedmodel"."field1", "my_app_myrelatedmodel"."field2"
FROM "my_app_myrelatedmodel"
WHERE "my_app_myrelatedmodel"."id" IN (
SELECT "my_app_myrelatedmodel"."id"
FROM "my_app_myrelatedmodel"
WHERE
"my_app_myrelatedmodel"."tenant_id" = tenant_id AND
"my_app_myrelatedmodel"."field2" = value2
)
);
I might have some ideas how to get these queries to come out more optimised, but I'm not 100% sure if I've followed your reasoning correctly.
This definitely seems like something where a concrete example case would be very helpful. If I had a concrete example I could import in a python REPL and work with I could spend some time in the REPL trying the various Query tools in Django (Aggregation
functions, Q
objects, Prefetch
objects, FilteredRelation
objects, etc...) to try and find a way to optimise the query. However if I just made up some model classes to try things I won't be sure if I've created an appropriate example and really found a way to fix these query optimisation issues, or if my example is too simplistic and my possible improvements don't fully solve the problems you're talking about here.
@hugobessa Could you add one or two examples that produce this kind of non-optimal queries from the ORM into the included exampleproject
Django project, and/or into the test cases for future use in regression testing the optimisations?
@hugobessa Its just occurred to me that any work to improve this may have conflicts with the work to make the Tennant model swappable and solve #37