pg_hint_plan
pg_hint_plan copied to clipboard
Wrong join order and final plan output when a specific hint is given
Hello all,
I am using the TPC-H Benchmark and I am trying to specify a query plan for the query template number 3. More specifically, the query with the hints provided is the following:
/*+SeqScan(customer)
SeqScan(orders)
SeqScan(lineitem)
NestLoop(orders lineitem)
NestLoop(orders lineitem customer)
Leading ((customer (lineitem orders)))*/
EXPLAIN SELECT l_orderkey, sum(l_extendedprice * (1 - l_discount)) as revenue, o_orderdate, o_shippriority
FROM
customer,
orders,
lineitem
WHERE
c_mktsegment = 'BUILDING'
and c_custkey = o_custkey
and l_orderkey = o_orderkey
and o_orderdate < date '1995-03-13'
and l_shipdate > date '1995-03-13'
GROUP BY
l_orderkey,
o_orderdate,
o_shippriority
order by
revenue desc,
o_orderdate;
As seen from the hints block I want to force two Nested Loops with the pair lineitem and orders is executed first. The problem is that the output of the of above hints is:
Sort (cost=29746039187.49..29746039976.89 rows=315760 width=48)
Sort Key: (sum((lineitem.l_extendedprice * ('1'::numeric - lineitem.l_discount)))) DESC, orders.o_orderdate
-> GroupAggregate (cost=29745991156.43..29746000629.23 rows=315760 width=48)
Group Key: lineitem.l_orderkey, orders.o_orderdate, orders.o_shippriority
-> Sort (cost=29745991156.43..29745991945.83 rows=315760 width=28)
Sort Key: lineitem.l_orderkey, orders.o_orderdate, orders.o_shippriority
-> Nested Loop (cost=20000000000.00..29745954757.67 rows=315760 width=28)
Join Filter: (orders.o_orderkey = lineitem.l_orderkey)
-> Seq Scan on lineitem (cost=0.00..196558.01 rows=3248676 width=20)
Filter: (l_shipdate > '1995-03-13'::date)
-> Materialize (cost=10000000000.00..10328091202.85 rows=145795 width=12)
-> Nested Loop (cost=10000000000.00..10328089761.88 rows=145795 width=12)
Join Filter: (customer.c_custkey = orders.o_custkey)
-> Seq Scan on orders (cost=0.00..46375.00 rows=725346 width=20)
Filter: (o_orderdate < '1995-03-13'::date)
-> Materialize (cost=0.00..5733.75 rows=30150 width=4)
-> Seq Scan on customer (cost=0.00..5583.00 rows=30150 width=4)
Filter: (c_mktsegment = 'BUILDING'::bpchar)
The above output is wrong based on the given hints. If I specify a different join operator on the orders join pair the produced plan is correct. Generally, the hint NestLoop(orders lineitem) results a wrong plan. Also removing the specific hint the output seems to be correct. Is this a bug or am I using the hints in a wrong way?
Thank you very much in advance.
pg_hint_plan applies leading hints by setting a large number (the same value with when enable_* is set to off) to all join types of unwanted join combinations. That cuases the upper levels put lower priority to the unwanted join combinations. So there's a case where they are finally choosed if no sensible join combinations are found other than them.
Nested Loop (cost=10000000000.00..10328089761.88 rows=145795 width=12)
This means that the join between orders and customer based on orderkey may somehow gets larger cost than 10000000000.00.