[5.x]: Relationship queries with targetElement are highly inefficient
What happened?
Description
We've had persistent performance issues on a particular site and identified one particular query as the culprit. The query is used to find related entries. Related entries in this case means that we want to find entries that have all the same entries selected across multiple Entries fields.
We found that the query is super slow and gets exponentially slower with the amount of selected entries. However, the issue only occurs when we explicitly specify the direction of the relation using targetElement instead of just passing the element IDs.
I boiled it down to a minimal example. Consider the following queries:
{% set related_category_ids = [
1719,
1699,
1746,
1747,
317819,
1750,
1722,
317921,
1724,
317947,
] %}
{# Slow, but acceptable (~300 ms) #}
{% set related_entries = craft.entries()
.relatedTo(['and', ...related_category_ids])
.limit(4)
.all()
%}
{# Exceedingly slow (~30 s) #}
{% set related_entries = craft.entries()
.relatedTo({ targetElement: ['and', ...related_category_ids] })
.limit(4)
.all()
%}
{# Also exceedingly slow (~30 s) #}
{% set related_entries_query = craft.entries().limit(4) %}
{% for category_id in related_category_ids %}
{% do related_entries_query.andRelatedTo({ targetElement: category_id}) %}
{% endfor %}
{% set related_entries = related_entries_query.all() %}
Here are the raw MySQL queries generated for the three examples:
To summarize: Using targetElement instead of leaving the direction open causes the query to be two orders of magnitude slower.
I'll be honest, I don't fully understand why this is the case. Something about the execution plan with the nested subquery.
In any case, I think the query builder should be able to generate a more efficient query here. In theory, looking just at the targetId instead of both targetId and sourceId should take less work. But something about the way the queries are converted to SQL causes an issue with the execution plan here.
Happy to be told if I'm doing something wrong!
Craft CMS version
5.8.18
PHP version
8.3
Operating system and version
No response
Database type and version
MySQL 8.0.40
Image driver and version
No response
Installed plugins and versions
I would expect the second query to be faster than the first, since it’s checking for relationships going both directions instead of just one.
The main issue is that you’re joining with and though, which results in a much more complex query than or, since each individual ID you pass will result in a new join, and a new condition on the query.
@brandonkelly
I would expect the second query to be faster than the first, since it’s checking for relationships going both directions instead of just one.
I would have expected that too! But I'm seeing the opposite. I don't think it's an inherent result of the query's complexity. Rather, a side effect of how the queries are generated by the query builder, which results in suboptimal query execution plans in some cases.
The main issue is that you’re joining with and though, which results in a much more complex query than or, since each individual ID you pass will result in a new join, and a new condition on the query.
Yes, unfortunately that is the business requirement in this case. We need to find matching entries which share all the given entries' relations across several fields. Using OR would be way too unspecific.
Searching for relations with AND shouldn't be a performance trap IMO. I think the problem is that the generated query uses a separate join for every individual element ID. Couldn't it use a single join and then add multiple WHERE conditions on that? I think that would solve the performance issue here.
@MoritzLost Did you try running db/repair? Total shot in the dark, but I've ran into a lot of weird query performance issues over the last few years that were solved by recreating indexes.
@mmikkel Thanks for the suggestion! Just gave that a try, unfortunately it didn't help. The query is slightly faster, but not meaningfully (something like 30s to 25s). I really think it's caused by the redundant joins created by Craft for the related to query.