Generated left-join is too complicated for Postgres to plan well
Hi folks,
My colleague @tstat and I were playing around with the library for the first time and we ran into a performance issue with a generated query that may or may not be totally expected. I understand that translating monadic syntax to SQL queries is perhaps going to be unavoidably "for-loopy" at times, so please feel free to close this issue as a #wontfix; although if that's the case, I think we could put together some documentation that mentions some known performance gotchas, so that users won't necessarily have to examine plan output themselves.
Anyways, the query we were interested in writing with rel8 was a simple left-join. The schema is unimportant here, but let me know if it would be helpful to provide a repo that reproduces this issue exactly. (I think it will be clear enough from the following high-level description.)
We have some students table, and students have first and last names, and we're interested in pairing students whose first name maybe matches the last name of another student:
select *
from students s1
left join students s2 on s1.first = s2.last;
This generates the following plan:
Hash Left Join (cost=2.35..4.78 rows=60 width=114)
  Hash Cond: (s1.first = s2.last)
  -> Seq Scan on students s1 (cost=0.00..1.60 rows=60 width=57)
  -> Hash (cost=1.60..1.60 rows=60 width=57)
        -> Seq Scan on students s2 (cost=0.00..1.60 rows=60 width=57)
In rel8, we translated this as:
query = do
  s1 <- each studentsSchema
  s2 <- optional do
    s2 <- each studentsSchema
    where_ (first s1 ==. last s2)
    pure s2
  pure (s1, s2)
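For anyone following along, here is a minimal sketch of the sort of schema definition we're assuming (the record and names below are purely illustrative, written against rel8's Rel8able/TableSchema style of definition; our real schema differs):

{-# LANGUAGE DeriveAnyClass, DeriveGeneric, DerivingStrategies, OverloadedStrings #-}
import Prelude hiding (last)
import Data.Text (Text)
import GHC.Generics (Generic)
import Rel8

-- Hypothetical higher-kinded student record; column names match the SQL above.
data Student f = Student
  { first :: Column f Text
  , last  :: Column f Text
  }
  deriving stock (Generic)
  deriving anyclass (Rel8able)

studentsSchema :: TableSchema (Student Name)
studentsSchema = TableSchema
  { name = "students"
  , schema = Nothing
  , columns = Student { first = "first", last = "last" }
  }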
which generated a messier version of the following query:
select *
from students s1
cross join lateral (
select *
from (select 0) t1
left outer join (
select *
from students s2
where s1.first = s2.last
) t2 on true
) t2;
Unfortunately, Postgres' optimizer does not seem to consider a hash join in this case, and falls back to a much more expensive nested loop join:
Nested Loop (cost=0.00..109.00 rows=60 width=118)
  -> Seq Scan on students s1 (cost=0.00..1.60 rows=60 width=57)
  -> Nested Loop Left Join (cost=0.00..1.77 rows=1 width=61)
        -> Result (cost=0.00..0.01 rows=1 width=0)
        -> Seq Scan on students s2 (cost=0.00..1.75 rows=1 width=57)
              Filter: (s1.first = last)
Curiously, this is not merely due to the existence of a lateral join with a pushed-in where-clause. We tried rewriting the original SQL in this style:
select *
from students s1
left join lateral (
select *
from students s2
where s1.first = s2.last
) t2 on true;
and in this case Postgres did again consider the hash join:
Hash Left Join (cost=2.35..4.78 rows=60 width=114)
  Hash Cond: (s1.first = s2.last)
  -> Seq Scan on students s1 (cost=0.00..1.60 rows=60 width=57)
  -> Hash (cost=1.60..1.60 rows=60 width=57)
        -> Seq Scan on students s2 (cost=0.00..1.60 rows=60 width=57)
Very interesting! I imagine there are no indices here. In practice we tend to index these join conditions, at which point we usually see a nested loop and index scan - but I don't think we ever looked at an un-indexed join. Will look into this!
Could you share the output of select version(), too?
Ok, we think this is a slightly deeper problem in Opaleye itself. At work, we use Opaleye with https://github.com/tomjaguarpaw/haskell-opaleye/pull/480 applied, which produces the plan that you're expecting. There are a few solutions (one is to just use that PR, another is to write smarter code). We'll see where this PR goes first, but we at least do have a handle on the problem now.
Thanks for reporting this!
I believe that the original query was an example of the following problem: a lateral subquery that does not get pulled up constrains the planner to perform the join with a nested loop, and a nested loop isn't the most efficient way to do every join. This problem is described a bit here.
While the changes to Opaleye simplified the original example enough to allow it to be pulled up, it is probably worth noting that there remain many examples where the subquery would not be pulled up and the original problem described above returns. For example, I think this query would be natural to write with rel8's API:
s1 <- each studentsSchema
s2 <- many $ do
  s2 <- each studentsSchema
  where_ $ stlastname s1 ==. stlastname s2
  pure (stfirstname s2)
pure (stlastname s1, s2)
The array_agg in the lateral subquery prevents it from getting pulled up, and the lateral forces a nested loop, resulting in the following plan:
Nested Loop Left Join (cost=0.00..6643.83 rows=430 width=64)
  -> Seq Scan on students "T1" (cost=0.00..14.30 rows=430 width=32)
  -> Subquery Scan on "T1_1" (cost=0.00..15.41 rows=1 width=33)
        -> GroupAggregate (cost=0.00..15.40 rows=1 width=36)
              Group Key: 0
              -> Seq Scan on students "T1_2" (cost=0.00..15.38 rows=2 width=36)
                    Filter: ("T1".stlastname = stlastname)
compared to a non-lateral variant:
explain
select
s1.stlastname,
s2.first_names
from students s1
left outer join (
select array_agg(s2.stfirstname) as first_names,
s2.stlastname
from students s2
group by s2.stlastname
) s2 using (stlastname)
which allows postgres to consider a hash join:
Hash Left Join (cost=23.45..38.90 rows=430 width=64)
  Hash Cond: (s1.stlastname = s2.stlastname)
  -> Seq Scan on students s1 (cost=0.00..14.30 rows=430 width=32)
  -> Hash (cost=20.95..20.95 rows=200 width=64)
        -> HashAggregate (cost=16.45..18.95 rows=200 width=64)
              Group Key: s2.stlastname
              -> Seq Scan on students s2 (cost=0.00..14.30 rows=430 width=64)
Thanks @tstat, this is a great point. I'm not entirely sure what to do, but two options come to mind:
- Do nothing. This really amounts to hoping that PostgreSQL gets smarter and can optimize what we give it into something better.
- Try and do the "pulling up" ourselves, if possible. This is basically getting into writing an optimizer, which we've been trying very hard to avoid as it's really hard to do right.
Introducing a special non-lateral leftJoin function is hard. The obvious leftJoin :: Query a -> (a -> Expr Bool) -> Query (MaybeTable a) isn't enough - that Query a could still refer to other bound variables in the Query. It's a specific goal of Rel8 to not need ST-like tricks to deal with scoping. That said, maybe doing a LATERAL LEFT JOIN would be fine, and if it's not actually LATERAL, PostgreSQL will be smart enough to figure out what's going on. Either way, I'm not very excited about this because it doesn't play well with the "join functions" that I advocate for in my Rel8 talk.
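To make that scoping concern concrete, here is a sketch using the hypothetical leftJoin signature above (it is not a real Rel8 function): nothing in the types stops the first argument from closing over s1, at which point the generated join would have to be LATERAL anyway.

-- 'leftJoin' is hypothetical; the signature alone can't rule out a lateral
-- reference, because the inner Query can still mention s1:
stillLateral = do
  s1 <- each studentsSchema
  s2 <- leftJoin
    (do s2 <- each studentsSchema; where_ (first s1 ==. last s2); pure s2)
    (\_ -> litExpr True)
  pure (s1, s2)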
I dunno if @shane-circuithub has any ideas here.
I appreciate all the input from @tstat and @mitchellwrosen here. I still don't have a good intuition for this but hopefully that will come!
Having said that, I don't think of the examples in @tstat's comment as being the same query (the Rel8 version and the Postgres version). If I were to translate the Postgres version into Rel8, I would use Rel8.Tabulate (which unfortunately we haven't released yet).
studentsByLastname :: Tabulation (Expr Text) (Student Expr)
studentsByLastname = fromQuery $ do
  student@Student{ lastname } <- each studentsSchema
  pure (lastname, student)

example :: Tabulation (Expr Text) (ListTable Expr (Expr Text))
example = do
  _ <- studentsByLastname
  manyTabulation $ firstname <$> studentsByLastname
I haven't tested this (if you could post the SQL to create this students table and any indices, that would be helpful), but I'm almost certain that would produce the same query plan as the handwritten SQL.
@shane-circuithub has asked me to elaborate on the above.
First, I can reproduce the bug/poor query plan using just values:
students :: Query (Expr Text, Expr Text)
students = values [(lit firstName, lit lastName) | firstName <- ["A", "B", "C"], lastName <- ["D", "E"]]

query1 = do
  s1 <- students
  s2 <- many $ do
    s2 <- students
    where_ $ snd s1 ==. snd s2
    pure (fst s2)
  pure (snd s1, s2)
query1 produces:
SELECT
CAST("values1_1" AS text) as "_1",
CAST(CASE WHEN ("rebind0_7") IS NULL THEN CAST(ARRAY[] AS text[]) ELSE "result0_6" END AS text[]) as "_2"
FROM (SELECT *
FROM
(SELECT
*
FROM (SELECT "column1" as "values0_1",
"column2" as "values1_1"
FROM
(VALUES
(CAST(E'A' AS text),CAST(E'D' AS text)),
(CAST(E'A' AS text),CAST(E'E' AS text)),
(CAST(E'B' AS text),CAST(E'D' AS text)),
(CAST(E'B' AS text),CAST(E'E' AS text)),
(CAST(E'C' AS text),CAST(E'D' AS text)),
(CAST(E'C' AS text),CAST(E'E' AS text))) as "V") as "T1") as "T1"
LEFT OUTER JOIN
LATERAL
(SELECT
TRUE as "rebind0_7",
*
FROM (SELECT
*
FROM (SELECT
ARRAY_AGG("inner0_6") as "result0_6"
FROM (SELECT
"values0_3" as "inner0_6",
*
FROM (SELECT
*
FROM (SELECT "column1" as "values0_3",
"column2" as "values1_3"
FROM
(VALUES
(CAST(E'A' AS text),CAST(E'D' AS text)),
(CAST(E'A' AS text),CAST(E'E' AS text)),
(CAST(E'B' AS text),CAST(E'D' AS text)),
(CAST(E'B' AS text),CAST(E'E' AS text)),
(CAST(E'C' AS text),CAST(E'D' AS text)),
(CAST(E'C' AS text),CAST(E'E' AS text))) as "V") as "T1"
WHERE (("values1_1") = ("values1_3"))) as "T1") as "T1"
GROUP BY COALESCE(0)) as "T1") as "T1") as "T2"
ON
TRUE) as "T1"
which has the plan
Nested Loop Left Join (cost=0.00..0.84 rows=6 width=64)
  -> Values Scan on "*VALUES*" (cost=0.00..0.08 rows=6 width=32)
  -> Subquery Scan on "T1" (cost=0.00..0.12 rows=1 width=33)
        -> GroupAggregate (cost=0.00..0.11 rows=1 width=36)
              Group Key: 0
              -> Values Scan on "*VALUES*_1" (cost=0.00..0.09 rows=1 width=36)
                    Filter: ("*VALUES*".column2 = column2)
(7 rows)
If we use the upcoming Rel8.Tabulate interface, we can write:
query2 = toQuery do
  _ <- studentsByLastName
  manyTabulation (fst <$> studentsByLastName)
  where
    studentsByLastName :: Tabulation (Expr Text) (Expr Text, Expr Text)
    studentsByLastName = fromQuery do
      s <- students
      return (snd s, s)
which produces
SELECT
CAST("values1_1" AS text) as "_1",
CAST(CASE WHEN ("rebind0_13") IS NULL THEN CAST(ARRAY[] AS text[]) ELSE "result1_6" END AS text[]) as "_2"
FROM (SELECT *
FROM
(SELECT
*
FROM (SELECT "column1" as "values0_1",
"column2" as "values1_1"
FROM
(VALUES
(CAST(E'A' AS text),CAST(E'D' AS text)),
(CAST(E'A' AS text),CAST(E'E' AS text)),
(CAST(E'B' AS text),CAST(E'D' AS text)),
(CAST(E'B' AS text),CAST(E'E' AS text)),
(CAST(E'C' AS text),CAST(E'D' AS text)),
(CAST(E'C' AS text),CAST(E'E' AS text))) as "V") as "T1") as "T1"
LEFT OUTER JOIN
LATERAL
(SELECT
TRUE as "rebind0_13",
*
FROM (SELECT
*
FROM (SELECT
"inner0_6" as "result0_6",
ARRAY_AGG("inner1_6") as "result1_6"
FROM (SELECT
"values1_4" as "inner0_6",
"values0_4" as "inner1_6",
*
FROM (SELECT
*
FROM (SELECT "column1" as "values0_4",
"column2" as "values1_4"
FROM
(VALUES
(CAST(E'A' AS text),CAST(E'D' AS text)),
(CAST(E'A' AS text),CAST(E'E' AS text)),
(CAST(E'B' AS text),CAST(E'D' AS text)),
(CAST(E'B' AS text),CAST(E'E' AS text)),
(CAST(E'C' AS text),CAST(E'D' AS text)),
(CAST(E'C' AS text),CAST(E'E' AS text))) as "V") as "T1") as "T1") as "T1"
GROUP BY "inner0_6") as "T1"
WHERE (("values1_1") = ("result0_6"))) as "T1") as "T2"
ON
TRUE) as "T1"
which has the plan
Hash Right Join (cost=0.26..0.47 rows=6 width=64)
  Hash Cond: ("T1".result0_6 = "*VALUES*".column2)
  -> Subquery Scan on "T1" (cost=0.11..0.24 rows=6 width=65)
        -> HashAggregate (cost=0.11..0.18 rows=6 width=64)
              Group Key: "*VALUES*_1".column2
              -> Values Scan on "*VALUES*_1" (cost=0.00..0.08 rows=6 width=64)
  -> Hash (cost=0.08..0.08 rows=6 width=32)
        -> Values Scan on "*VALUES*" (cost=0.00..0.08 rows=6 width=32)
There's nothing really special about Rel8.Tabulate, it's just a convenient way to write these types of queries. We think of a Tabulation k v as being like a Map k v. The monad instance for a Tabulation is intersection.
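As a rough plain-Haskell analogy for that last sentence (not Rel8 code): if a Tabulation k v is like a Map k v, then its bind keeps only the keys present on both sides, much as the hypothetical helper below does for Data.Map.

import Data.Map (Map)
import qualified Data.Map as Map

-- Analogy only: bind intersects on keys, like a natural join.
bindLikeTabulation :: Ord k => Map k a -> (a -> Map k b) -> Map k b
bindLikeTabulation m f = Map.mapMaybeWithKey (\k a -> Map.lookup k (f a)) m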
Interesting, it looks like in the Tabulate plan the left outer join lateral doesn't contain an array_agg, because that is pushed into another subquery within the lateral subquery, and this inner subquery doesn't contain any lateral references. This results in the inner subquery not being pulled up (due to the aggregate), but the lateral subquery is pulled up and the lateral is eliminated.
As far as what to do about the problem: I think some documentation about performance considerations would be helpful to the users of rel8.
We have already discussed how lateral queries force a particular join method, but another constraint to be mindful of is that they force a join order. Consider the following example:
create table s (
x int8 not null primary key,
y int8 not null
);
insert into s
select n, mod(n, 2)
from generate_series(1, 5) as s(n)
order by random();
create table t (
x int8 not null primary key,
y int8 not null
);
insert into t
select n, mod(n, 5)
from generate_series(1, 1000000) as t(n)
order by random();
create table u (
x int8 not null primary key,
y int8 not null
);
insert into u
select n, mod(n, 5)
from generate_series(1, 1000000) as t(n)
order by random();
analyze s, t, u;
So s has 5 rows while t and u both have 1,000,000. If we want to join them all by their x columns then we could write:
select *
from t
inner join u on t.x = u.x
inner join s on s.x = u.x
and postgres will consider different join methods and join orders (because inner join is commutative and associative).
Here is what it planned on my machine:
QUERY PLAN
Nested Loop (cost=1.96..4.27 rows=5 width=48) (actual time=0.025..0.071 rows=5 loops=1)
  -> Merge Join (cost=1.53..1.88 rows=5 width=32) (actual time=0.021..0.044 rows=5 loops=1)
        Merge Cond: (u.x = s.x)
        -> Index Scan using u_pkey on u (cost=0.42..51341.77 rows=1000000 width=16) (actual time=0.005..0.026 rows=6 loops=1)
        -> Sort (cost=1.11..1.12 rows=5 width=16) (actual time=0.013..0.013 rows=5 loops=1)
              Sort Key: s.x
              Sort Method: quicksort Memory: 25kB
              -> Seq Scan on s (cost=0.00..1.05 rows=5 width=16) (actual time=0.001..0.002 rows=5 loops=1)
  -> Index Scan using t_pkey on t (cost=0.42..0.48 rows=1 width=16) (actual time=0.005..0.005 rows=1 loops=5)
        Index Cond: (x = u.x)
Planning Time: 0.515 ms
Execution Time: 0.103 ms
The statistics postgres keeps on every table help inform it that it can use a merge join that reads only 6 rows of u's 1,000,000, since there are only 5 rows in s and the primary key btree on u is in u.x order. Postgres expects only 5 rows to result from this join (the statistics indicate that both s.x and u.x are unique). A nested loop with the 5-row outer and an index scan inner is seen as the cheapest way to join to t, as index scans are cheap and we are only doing 5 loops.
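For the Rel8-minded, the plain three-way join above would translate to something like the sketch below, assuming sSchema, tSchema and uSchema are TableSchemas for the tables above that select (x, y) pairs. Since every subquery here stays simple, it should get pulled up, leaving Postgres the same freedom over join order and method.

-- Sketch only; the schema names are assumptions made for illustration.
threeWayJoin = do
  t <- each tSchema
  u <- each uSchema
  where_ (fst t ==. fst u)
  s <- each sSchema
  where_ (fst s ==. fst u)
  pure (t, u, s)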
If I rewrite this to use lateral joins like this (using offset 0 to prevent subquery pull-up):
select *
from t
, lateral (
select *
from u
where u.x = t.x
offset 0
) as u,
lateral (
select *
from s
where u.x = s.x
offset 0
) as s
then I force the join order (t <> u) <> s and the join method to nested loop and get a terrible plan:
QUERY PLAN
Nested Loop (cost=0.42..9560406.00 rows=1000000 width=48) (actual time=386.602..2168.792 rows=5 loops=1)
  -> Nested Loop (cost=0.42..8477906.00 rows=1000000 width=32) (actual time=0.050..1595.613 rows=1000000 loops=1)
        -> Seq Scan on t (cost=0.00..15406.00 rows=1000000 width=16) (actual time=0.006..50.673 rows=1000000 loops=1)
        -> Index Scan using u_pkey on u (cost=0.42..8.44 rows=1 width=16) (actual time=0.001..0.001 rows=1 loops=1000000)
              Index Cond: (x = t.x)
  -> Seq Scan on s (cost=0.00..1.06 rows=1 width=16) (actual time=0.000..0.000 rows=0 loops=1000000)
        Filter: (u.x = x)
        Rows Removed by Filter: 5
Planning Time: 0.288 ms
Execution Time: 2168.823 ms
Although all of the joined columns are indexed, doing 2,000,000 index scans is expensive and not necessary.
If I reorder my lateral query to read from s first:
select *
from s
, lateral (
select *
from t
where s.x = t.x
offset 0
) as t,
lateral (
select *
from u
where u.x = t.x
offset 0
) as u
Then I will read the 5 rows from s, do 5 index scans to join to t, and 5 index scans to join to u, resulting in a much nicer plan:
QUERY PLAN
Nested Loop (cost=0.85..85.67 rows=5 width=48) (actual time=0.053..0.100 rows=5 loops=1)
  -> Nested Loop (cost=0.42..43.36 rows=5 width=32) (actual time=0.039..0.068 rows=5 loops=1)
        -> Seq Scan on s (cost=0.00..1.05 rows=5 width=16) (actual time=0.005..0.006 rows=5 loops=1)
        -> Index Scan using t_pkey on t (cost=0.42..8.44 rows=1 width=16) (actual time=0.012..0.012 rows=1 loops=5)
              Index Cond: (x = s.x)
  -> Index Scan using u_pkey on u (cost=0.42..8.44 rows=1 width=16) (actual time=0.006..0.006 rows=1 loops=5)
        Index Cond: (x = t.x)
Planning Time: 0.295 ms
Execution Time: 0.124 ms
So, when writing queries with rel8, if the subquery is not simple (and thus cannot be pulled up) then one should be mindful of the bind order.
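In Rel8 terms, that advice might look roughly like the following sketch, again assuming tuple-valued sSchema, tSchema and uSchema, and using offset 0 to stand in for any subquery that can't be pulled up.

-- Sketch only: binding the 5-row table first keeps the forced nested loops cheap.
bindSmallTableFirst = do
  s <- each sSchema
  t <- offset 0 $ do
    t <- each tSchema
    where_ (fst s ==. fst t)
    pure t
  u <- offset 0 $ do
    u <- each uSchema
    where_ (fst u ==. fst t)
    pure u
  pure (s, t, u)
-- Binding tSchema first instead would correspond to the expensive plan shown earlier.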
Edit: I was mistaken, you can ignore my original post - but it's here if you want to see it!
I was ready to say "yea, we should just write documentation", but I think you might have found another query that's not currently expressible by Rel8. I wanted to express
select *
from s
, lateral (
select *
from t
where s.x = t.x
offset 0
) as t,
lateral (
select *
from u
where u.x = t.x
offset 0
) as u
in Rel8, but without LATERAL references, and wanted SQL like:
select * from t, (select * from u offset 0) u, (select * from s offset 0) as s where t.x = u.x and s.x = u.x;
This has a good plan:
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=1.59..29159.65 rows=5 width=48) (actual time=7.064..88.236 rows=5 loops=1)
  -> Hash Join (cost=1.16..29157.21 rows=5 width=32) (actual time=7.059..88.215 rows=5 loops=1)
        Hash Cond: (u.x = s.x)
        -> Seq Scan on u (cost=0.00..15406.00 rows=1000000 width=16) (actual time=0.019..44.101 rows=1000000 loops=1)
        -> Hash (cost=1.10..1.10 rows=5 width=16) (actual time=0.004..0.004 rows=5 loops=1)
              Buckets: 1024 Batches: 1 Memory Usage: 9kB
              -> Seq Scan on s (cost=0.00..1.05 rows=5 width=16) (actual time=0.002..0.002 rows=5 loops=1)
  -> Index Scan using t_pkey on t (cost=0.42..0.48 rows=1 width=16) (actual time=0.002..0.003 rows=1 loops=5)
        Index Cond: (x = u.x)
Planning time: 0.188 ms
Execution time: 88.255 ms
(11 rows)
However, I don't seem to be able to express this. The obvious
putStrLn $ showQuery $
  liftA3 (,,) (offset 0 (each tSchema)) (offset 0 (each sSchema)) (offset 0 (each uSchema)) >>=
    \(t, s, u) -> where_ (fst u ==. fst t &&. fst u ==. fst s) >> return (t, s, u)
produces
explain analyze SELECT
CAST("x0_1" AS int8) as "_1/_1",
CAST("y1_1" AS int8) as "_1/_2",
CAST("x0_3" AS int8) as "_2/_1",
CAST("y1_3" AS int8) as "_2/_2",
CAST("x0_6" AS int8) as "_3/_1",
CAST("y1_6" AS int8) as "_3/_2"
FROM (SELECT
*
FROM (SELECT
*
FROM (SELECT
*
FROM (SELECT
"x" as "x0_1",
"y" as "y1_1"
FROM "t" as "T1") as "T1") as "T1"
OFFSET 0) as "T1",
LATERAL
(SELECT
*
FROM (SELECT
*
FROM (SELECT
"x" as "x0_3",
"y" as "y1_3"
FROM "s" as "T1") as "T1") as "T1"
OFFSET 0) as "T2",
LATERAL
(SELECT
*
FROM (SELECT
*
FROM (SELECT
"x" as "x0_6",
"y" as "y1_6"
FROM "u" as "T1") as "T1") as "T1"
OFFSET 0) as "T3"
WHERE ((("x0_6") = ("x0_1")) AND (("x0_6") = ("x0_3")))) as "T1"
But this doesn't plan as well:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=29157.27..58313.32 rows=5 width=48) (actual time=96.610..177.602 rows=5 loops=1)
  Hash Cond: ("T1".x = "T1_1".x)
  -> Seq Scan on u "T1" (cost=0.00..15406.00 rows=1000000 width=16) (actual time=0.023..44.073 rows=1000000 loops=1)
  -> Hash (cost=29157.21..29157.21 rows=5 width=32) (actual time=89.632..89.632 rows=5 loops=1)
        Buckets: 1024 Batches: 1 Memory Usage: 9kB
        -> Hash Join (cost=1.16..29157.21 rows=5 width=32) (actual time=6.532..89.631 rows=5 loops=1)
              Hash Cond: ("T1_1".x = "T1_2".x)
              -> Seq Scan on t "T1_1" (cost=0.00..15406.00 rows=1000000 width=16) (actual time=0.007..45.434 rows=1000000 loops=1)
              -> Hash (cost=1.10..1.10 rows=5 width=16) (actual time=0.005..0.005 rows=5 loops=1)
                    Buckets: 1024 Batches: 1 Memory Usage: 9kB
                    -> Seq Scan on s "T1_2" (cost=0.00..1.05 rows=5 width=16) (actual time=0.002..0.002 rows=5 loops=1)
Planning time: 0.174 ms
Execution time: 177.623 ms
I wonder if a custom Applicative instance that doesn't use LATERAL is in order?
I made a mistake in my Rel8 query. I wrote
liftA3 (,,) (offset 0 (each tSchema)) (offset 0 (each sSchema)) (offset 0 (each uSchema))
But the SQL I wanted is actually produced by
liftA3 (,,) (each tSchema) (offset 0 (each sSchema)) (offset 0 (each uSchema))
This does have the correct query plan.
So at this point, I think it's just a case of taking care when writing Rel8 queries, and we should write documentation as to what this care is. Fortunately there's a goldmine of information in this issue, so we can be a lot more precise than before!