pg_hint_plan icon indicating copy to clipboard operation
pg_hint_plan copied to clipboard

Wrong join order and final plan output when a specific hint is given

Open Ctsap opened this issue 3 years ago • 1 comments

Hello all,

I am using the TPC-H Benchmark and I am trying to specify a query plan for the query template number 3. More specifically, the query with the hints provided is the following:

/*+SeqScan(customer) 
    SeqScan(orders) 
    SeqScan(lineitem) 
    NestLoop(orders lineitem) 
    NestLoop(orders lineitem customer)
    Leading ((customer (lineitem orders)))*/

EXPLAIN  SELECT l_orderkey, sum(l_extendedprice * (1 - l_discount)) as revenue, o_orderdate, o_shippriority
FROM
  customer,
  orders,
  lineitem
WHERE
  c_mktsegment = 'BUILDING'
  and c_custkey = o_custkey
  and l_orderkey = o_orderkey
  and o_orderdate < date '1995-03-13'
  and l_shipdate > date '1995-03-13'
GROUP BY
  l_orderkey,
  o_orderdate,
  o_shippriority
  order by
  revenue desc,
  o_orderdate;

As seen from the hints block I want to force two Nested Loops with the pair lineitem and orders is executed first. The problem is that the output of the of above hints is:

Sort  (cost=29746039187.49..29746039976.89 rows=315760 width=48)
   Sort Key: (sum((lineitem.l_extendedprice * ('1'::numeric - lineitem.l_discount)))) DESC, orders.o_orderdate
   ->  GroupAggregate  (cost=29745991156.43..29746000629.23 rows=315760 width=48)
         Group Key: lineitem.l_orderkey, orders.o_orderdate, orders.o_shippriority
         ->  Sort  (cost=29745991156.43..29745991945.83 rows=315760 width=28)
               Sort Key: lineitem.l_orderkey, orders.o_orderdate, orders.o_shippriority
               ->  Nested Loop  (cost=20000000000.00..29745954757.67 rows=315760 width=28)
                     Join Filter: (orders.o_orderkey = lineitem.l_orderkey)
                     ->  Seq Scan on lineitem  (cost=0.00..196558.01 rows=3248676 width=20)
                           Filter: (l_shipdate > '1995-03-13'::date)
                     ->  Materialize  (cost=10000000000.00..10328091202.85 rows=145795 width=12)
                           ->  Nested Loop  (cost=10000000000.00..10328089761.88 rows=145795 width=12)
                                 Join Filter: (customer.c_custkey = orders.o_custkey)
                                 ->  Seq Scan on orders  (cost=0.00..46375.00 rows=725346 width=20)
                                       Filter: (o_orderdate < '1995-03-13'::date)
                                 ->  Materialize  (cost=0.00..5733.75 rows=30150 width=4)
                                       ->  Seq Scan on customer  (cost=0.00..5583.00 rows=30150 width=4)
                                             Filter: (c_mktsegment = 'BUILDING'::bpchar)

The above output is wrong based on the given hints. If I specify a different join operator on the orders join pair the produced plan is correct. Generally, the hint NestLoop(orders lineitem) results a wrong plan. Also removing the specific hint the output seems to be correct. Is this a bug or am I using the hints in a wrong way?

Thank you very much in advance.

Ctsap avatar Oct 15 '21 15:10 Ctsap

pg_hint_plan applies leading hints by setting a large number (the same value with when enable_* is set to off) to all join types of unwanted join combinations. That cuases the upper levels put lower priority to the unwanted join combinations. So there's a case where they are finally choosed if no sensible join combinations are found other than them.

Nested Loop (cost=10000000000.00..10328089761.88 rows=145795 width=12)

This means that the join between orders and customer based on orderkey may somehow gets larger cost than 10000000000.00.

horiguti avatar Jan 19 '22 05:01 horiguti