Separate preloading joins from filtering joins
When doing something like:
em.find(Author, { books: { title: "b1" } }, { populate: "comments" });
The generated query has two joins:
- a JOIN into
booksfor evaling the title=b1, and - a JOIN into
commentsfor doing the preload
However, this means that (unless the postgres query planner is smart enough to avoid it) we're calculating the preload/rollup of "every author's list of comments" for all authors rows, even the ones that won't end up matching the WHERE title=b1 predicate.
Instead, we should only do the preloading for matched rows, i.e. something like:
select
a.*, ...preload columns...
from (
select a.* from ...filtering joins...
)
...preload joins...
This current query structure isn't supported as-in in the ParsedFindQuery AST, b/c we'll need an "inner table" type, unless maybe we use a CTE? 🤔
I've benchmarked a "current query" vs. "proposed query" in our production database and found similar execution times & EXPLAINs, so maybe this is not nece
-- current query, both joins in one query
select * from item_template_items iti
cross join lateral (
select json_agg(json_build_object("id", pi.id)) from project_items pi
where pi.item_template_item_id = iti.id
)
where iti.id > 10000 limit 2000;
-- proposed query, move predicate to an inner query
select * from (
select * from item_template_items iti where iti.id > 10000 limit 2000
) iti
cross join lateral (
select json_agg(json_build_object("id", pi.id)) from project_items pi
where pi.item_template_item_id = iti.id
);
Currently our preloading looks like:
-- select authors & preload a.books and a.comments
select b._ as b, c._ as c, a.*
from authors a
-- join into books > reviews
cross join lateral (
select json_agg(_._) as _
from (
select json_build_array(b.id, b.title, b."order", b.created_at, b.updated_at, br._) as _
from books b
-- join into reviews
cross join lateral (
select json_agg(_._) as _
from (
select json_build_array(br.id, br.rating, br.is_public, br.created_at, br.updated_at) as _
from book_reviews br where br.book_id = b.id
) _
) as br
where a.id = b.author_id
) as _
) b
-- join into comments
cross join lateral (
select json_agg(_._) as _
from (
select json_build_array(c.id, c.parent_author_id, c.parent_book_id, c.parent_book_review_id, c.text, c.updated_at, c.created_at) as _
from comments c
where a.id = c.parent_author_id
) as _
) c
where
a.id = 1;
We should try and do this instead:
select
json_build_object('a', row_to_json(a), 'c', c.c, 'b', b.b)
from authors a
cross join lateral (
select json_agg(json_build_object('br', br.br, 'b', row_to_json(b))) as b
from books b
cross join lateral (
select json_agg(json_build_object('br', row_to_json(br))) as br from book_reviews br where br.book_id = b.id
) as br
where a.id = b.author_id
) b
cross join lateral (
select json_agg(row_to_json(c)) as c from comments c where a.id = c.parent_author_id
) c
where
a.id = 1;
Because we could pass the row_to_jsons directly as rows without any extra params mixed in.
It's also just a simpler query, b/c we don't repeating all the columns in json_build_arrays.
Granted, it means more JSON keys over the wire. 🤔