opteryx
opteryx copied to clipboard
🪲 Predicate pushdowns into JOINs don't appear to be working
trafficstars
TPCH query 08
select
nation,
o_year,
sum(amount) as sum_profit
from
(
select
n_name as nation,
year(o_orderdate) as o_year,
l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
from
data.tpch.part,
data.tpch.supplier,
data.tpch.lineitem,
data.tpch.partsupp,
data.tpch.orders,
data.tpch.nation
where
s_suppkey = l_suppkey
and ps_suppkey = l_suppkey
and ps_partkey = l_partkey
and p_partkey = l_partkey
and o_orderkey = l_orderkey
and s_nationkey = n_nationkey
and p_name like '%plum%'
) as profit
group by
nation,
o_year
order by
nation,
o_year desc;
pre-optimization
└─ EXIT
└─ ORDER BY (nation, o_year DESC)
└─ PROJECT (nation, o_year, SUM(amount))
└─ AGGREGATE (SUM(amount)) GROUP BY (nation, o_year)
└─ SUBQUERY AS profit
└─ PROJECT (nation, YEAR(o_orderdate), l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity)
└─ FILTER (s_suppkey = l_suppkey AND ps_suppkey = l_suppkey AND ps_partkey = l_partkey AND p_partkey = l_partkey AND o_orderkey = l_orderkey AND s_nationkey = n_nationkey AND p_name LIKE '%plum%')
└─ CROSS JOIN
├─ SCAN (data.tpch.part)
├─ SCAN (data.tpch.supplier)
├─ SCAN (data.tpch.lineitem)
├─ SCAN (data.tpch.partsupp)
├─ SCAN (data.tpch.orders)
└─ SCAN (data.tpch.nation)
post optimization
└─ EXIT
└─ ORDER BY (nation, o_year)
└─ PROJECT (SUM(amount), nation, o_year)
└─ AGGREGATE (SUM(amount)) GROUP BY (nation, o_year)
└─ SUBQUERY AS profit
└─ PROJECT (YEAR(o_orderdate), l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity, nation)
└─ FILTER (p_name LIKE '%plum%')
└─ FILTER (s_nationkey = n_nationkey)
└─ FILTER (o_orderkey = l_orderkey)
└─ FILTER (p_partkey = l_partkey)
└─ FILTER (ps_partkey = l_partkey)
└─ FILTER (ps_suppkey = l_suppkey)
└─ FILTER (s_suppkey = l_suppkey)
└─ CROSS JOIN
├─ SCAN (data.tpch.part) [p_partkey, p_name]
├─ SCAN (data.tpch.supplier) [s_suppkey, s_nationkey]
├─ SCAN (data.tpch.lineitem) [l_orderkey, l_partkey, l_suppkey, l_quantity, l_extendedprice, l_discount]
├─ SCAN (data.tpch.partsupp) [ps_partkey, ps_suppkey, ps_supplycost]
├─ SCAN (data.tpch.orders) [o_orderkey, o_orderdate]
└─ SCAN (data.tpch.nation) [n_nationkey, n_name]
the optimizer has pushed the projections to the SCANs, but has not pushed the predicates into the JOIN, maybe because it's a 6-way cross join.
Note this query ultimately fails after running for a few minutes because of the YEAR function call not handling null entries.