loopback-next
loopback-next copied to clipboard
Query performance issue in inclusion resolver for hasManyThrough
Describe the bug
Similar to #8074 , the hasManyThrough inclusion resolver suffers from a heavily degraded performance. It will fetch all through models for a given set of source models using a single query, but afterwards it will perform a query for each source model (or set of through models). For example, when querying 100 Item
s and including its Category
has-many-through relation, it will perform 100 Category
queries instead of just 1.
Adding a layer of has-many relations, e.g. also including all Item
s for all Category
s will have an exponential effect: 100 Item
s in 10 Category
s will result in 1000 queries, instead of 2.
In the linked branch hasManyThroughPerformanceIssue
, the TodoList example has been modified with an added Category
model, set up with a bi-directional hasManyThrough relation Todo
<-> Category
<-> Todo
. There's a test added which includes todos with categories with todos: it should execute 3 queries, but fails (see npm test
). After applying the patch from the PR, it succeeds.
Logs
No response
Additional information
My proposal (PR included) is to:
- collect all target model keys from the through models
- perform a single
findByForeignKeys
call - do some magic to adhere to scope limit and order
- "rearrange" the result in the proper format and return
Possible issues:
- scope magic (see PR)
Notes:
- Doesn't fix the "issue" of re-querying an
Item
in an inclusion of depth 3 with bi-directional hasManyThrough (e.g., when findingItem
with relationCategory
with relationItem
, allItems
will be queried twice)
Anecdotal result: a single .find
with a bunch of included relations went from 295 queries in 421 ms to 7 queries in 101 ms.
Reproduction
https://github.com/0x0aNL/loopback-next/tree/hasManyThroughPerformanceIssue
Following this one, same issue for me. For comparison, the lb3 executing the same query takes about 150ms, while lb4 takes an average of 2s.
My case is then more complex than this because involves datasource providers, and for each relation resolved, it create a datasource instance. This grow exponentially with nested relations.