joist-orm icon indicating copy to clipboard operation
joist-orm copied to clipboard

Separate preloading joins from filtering joins

Open stephenh opened this issue 8 months ago • 2 comments

When doing something like:

em.find(Author, { books: { title: "b1" } }, { populate: "comments" });

The generated query has two joins:

  • a JOIN into books for evaling the title=b1, and
  • a JOIN into comments for doing the preload

However, this means that (unless the postgres query planner is smart enough to avoid it) we're calculating the preload/rollup of "every author's list of comments" for all authors rows, even the ones that won't end up matching the WHERE title=b1 predicate.

Instead, we should only do the preloading for matched rows, i.e. something like:

select
  a.*, ...preload columns...
  from (
    select a.* from ...filtering joins...
  )
  ...preload joins...

This current query structure isn't supported as-in in the ParsedFindQuery AST, b/c we'll need an "inner table" type, unless maybe we use a CTE? 🤔


I've benchmarked a "current query" vs. "proposed query" in our production database and found similar execution times & EXPLAINs, so maybe this is not nece

-- current query, both joins in one query
select * from item_template_items iti
cross join lateral (
  select json_agg(json_build_object("id", pi.id)) from project_items pi
  where pi.item_template_item_id = iti.id
)
where iti.id > 10000 limit 2000;

-- proposed query, move predicate to an inner query
select * from (
  select * from item_template_items iti where iti.id > 10000 limit 2000
) iti
cross join lateral (
  select json_agg(json_build_object("id", pi.id)) from project_items pi
  where pi.item_template_item_id = iti.id
);

stephenh avatar Apr 27 '25 19:04 stephenh

Currently our preloading looks like:

-- select authors & preload a.books and a.comments
select b._ as b, c._ as c, a.*
from authors a
  -- join into books > reviews
  cross join lateral (
    select json_agg(_._) as _
    from (
      select json_build_array(b.id, b.title, b."order", b.created_at, b.updated_at, br._) as _
      from books b
      -- join into reviews
      cross join lateral (
        select json_agg(_._) as _
        from (
          select json_build_array(br.id, br.rating, br.is_public, br.created_at, br.updated_at) as _
          from book_reviews br where br.book_id = b.id
        ) _
      ) as br
      where a.id = b.author_id
    ) as _
  ) b
  -- join into comments
  cross join lateral (
    select json_agg(_._) as _
    from (
      select json_build_array(c.id, c.parent_author_id, c.parent_book_id, c.parent_book_review_id, c.text, c.updated_at, c.created_at) as _
      from comments c
      where a.id = c.parent_author_id
    ) as _
  ) c
where
  a.id = 1;

stephenh avatar Apr 27 '25 22:04 stephenh

We should try and do this instead:


select
  json_build_object('a', row_to_json(a), 'c', c.c, 'b', b.b)
from authors a
  cross join lateral (
    select json_agg(json_build_object('br', br.br, 'b', row_to_json(b))) as b
    from books b
    cross join lateral (
      select json_agg(json_build_object('br', row_to_json(br))) as br from book_reviews br where br.book_id = b.id
    ) as br
    where a.id = b.author_id
  ) b
  cross join lateral (
    select json_agg(row_to_json(c)) as c from comments c where a.id = c.parent_author_id
  ) c
where
  a.id = 1;

Because we could pass the row_to_jsons directly as rows without any extra params mixed in.

It's also just a simpler query, b/c we don't repeating all the columns in json_build_arrays.

Granted, it means more JSON keys over the wire. 🤔

stephenh avatar Apr 27 '25 22:04 stephenh