pg_hint_plan icon indicating copy to clipboard operation
pg_hint_plan copied to clipboard

leading hint + join methods hint cannot totally force the join order.

Open ZhengtongYan opened this issue 1 year ago • 4 comments

I tested Q19a query of JOB benchmark in the latest PG17beta1 version. I found the following problems which indicate that hints cannot totally force the decisions of optimizer and optimizer will not follow the hints in some cases.

  1. Leading hint only: work correctly!
/*+ Leading((rt (it ((n (chn (mc (mi (t (ci an)))))) cn)))) */
EXPLAIN (FORMAT TEXT)
SELECT MIN(n.name) AS voicing_actress,
       MIN(t.title) AS voiced_movie
FROM aka_name AS an,
     char_name AS chn,
     cast_info AS ci,
     company_name AS cn,
     info_type AS it,
     movie_companies AS mc,
     movie_info AS mi,
     name AS n,
     role_type AS rt,
     title AS t
WHERE ci.note IN ('(voice)',
                  '(voice: Japanese version)',
                  '(voice) (uncredited)',
                  '(voice: English version)')
  AND cn.country_code ='[us]'
  AND it.info = 'release dates'
  AND mc.note IS NOT NULL
  AND (mc.note LIKE '%(USA)%'
       OR mc.note LIKE '%(worldwide)%')
  AND mi.info IS NOT NULL
  AND (mi.info LIKE 'Japan:%200%'
       OR mi.info LIKE 'USA:%200%')
  AND n.gender ='f'
  AND n.name LIKE '%Ang%'
  AND rt.role ='actress'
  AND t.production_year BETWEEN 2005 AND 2009
  AND t.id = mi.movie_id
  AND t.id = mc.movie_id
  AND t.id = ci.movie_id
  AND mc.movie_id = ci.movie_id
  AND mc.movie_id = mi.movie_id
  AND mi.movie_id = ci.movie_id
  AND cn.id = mc.company_id
  AND it.id = mi.info_type_id
  AND n.id = ci.person_id
  AND rt.id = ci.role_id
  AND n.id = an.person_id
  AND ci.person_id = an.person_id
  AND chn.id = ci.person_role_id;

Execution plan is:

 Aggregate  (cost=1129471.94..1129471.95 rows=1 width=64)
   ->  Nested Loop  (cost=1031727.69..1129471.94 rows=1 width=32)
         Join Filter: (rt.id = ci.role_id)
         ->  Seq Scan on role_type rt  (cost=0.00..18.88 rows=4 width=4)
               Filter: ((role)::text = 'actress'::text)
         ->  Materialize  (cost=1031727.69..1129453.01 rows=1 width=36)
               ->  Nested Loop  (cost=1031727.69..1129453.00 rows=1 width=36)
                     Join Filter: (it.id = mi.info_type_id)
                     ->  Seq Scan on info_type it  (cost=0.00..2.41 rows=1 width=4)
                           Filter: ((info)::text = 'release dates'::text)
                     ->  Nested Loop  (cost=1031727.69..1129449.85 rows=59 width=40)
                           ->  Hash Join  (cost=1031727.27..1129253.56 rows=163 width=44)
                                 Hash Cond: (n.id = ci.person_id)
                                 ->  Gather  (cost=1000.00..83655.92 rows=9516 width=19)
                                       Workers Planned: 2
                                       ->  Parallel Seq Scan on name n  (cost=0.00..81704.32 rows=3965 width=19)
                                             Filter: ((name ~~ '%Ang%'::text) AND ((gender)::text = 'f'::text))
                                 ->  Hash  (cost=1029835.88..1029835.88 rows=71311 width=37)
                                       ->  Merge Join  (cost=928724.12..1029835.88 rows=71311 width=37)
                                             Merge Cond: (chn.id = ci.person_role_id)
                                             ->  Index Only Scan using char_name_pkey on char_name chn  (cost=0.43..92048.75 rows=3140288 width=4)
                                             ->  Materialize  (cost=928688.51..929418.41 rows=145979 width=41)
                                                   ->  Sort  (cost=928688.51..929053.46 rows=145979 width=41)
                                                         Sort Key: ci.person_role_id
                                                         ->  Gather  (cost=861411.88..911672.87 rows=145979 width=41)
                                                               Workers Planned: 2
                                                               ->  Parallel Hash Join  (cost=860411.88..896074.97 rows=60825 width=41)
                                                                     Hash Cond: (mc.movie_id = t.id)
                                                                     ->  Parallel Seq Scan on movie_companies mc  (cost=0.00..35131.06 rows=128227 width=8)
                                                                           Filter: ((note IS NOT NULL) AND ((note ~~ '%(USA)%'::text) OR (note ~~ '%(worldwide)%'::text))
)
                                                                     ->  Parallel Hash  (cost=859150.63..859150.63 rows=100900 width=49)
                                                                           ->  Parallel Hash Join  (cost=601038.73..859150.63 rows=100900 width=49)
                                                                                 Hash Cond: (mi.movie_id = t.id)
                                                                                 ->  Parallel Seq Scan on movie_info mi  (cost=0.00..253676.43 rows=226580 width=8)
                                                                                       Filter: ((info ~~ 'Japan:%200%'::text) OR (info ~~ 'USA:%200%'::text))
                                                                                 ->  Parallel Hash  (cost=597052.41..597052.41 rows=187225 width=41)
                                                                                       ->  Parallel Hash Join  (cost=525612.09..597052.41 rows=187225 width=41)
                                                                                             Hash Cond: (t.id = ci.movie_id)
                                                                                             ->  Parallel Seq Scan on title t  (cost=0.00..51808.72 rows=235421 width=21)
                                                                                                   Filter: ((production_year >= 2005) AND (production_year <= 2009))
                                                                                             ->  Parallel Hash  (cost=510231.45..510231.45 rows=837731 width=20)
                                                                                                   ->  Parallel Hash Join  (cost=21334.10..510231.45 rows=837731 width=20
)
                                                                                                         Hash Cond: (ci.person_id = an.person_id)
                                                                                                         ->  Parallel Seq Scan on cast_info ci  (cost=0.00..479269.70 row
s=345362 width=16)
                                                                                                               Filter: (note = ANY ('{(voice),"(voice: Japanese version)"
,"(voice) (uncredited)","(voice: English version)"}'::text[]))
                                                                                                         ->  Parallel Hash  (cost=15171.60..15171.60 rows=375560 width=4)
                                                                                                               ->  Parallel Seq Scan on aka_name an  (cost=0.00..15171.60
 rows=375560 width=4)
                           ->  Index Scan using company_name_pkey on company_name cn  (cost=0.42..1.20 rows=1 width=4)
                                 Index Cond: (id = mc.company_id)
                                 Filter: ((country_code)::text = '[us]'::text)
  1. Leading hint + join methods hint: wrong results.
/*+
Leading((rt (it ((n (chn (mc (mi (t (ci an)))))) cn))))
HashJoin(ci an)
NestLoop(ci an t)
NestLoop(ci an t mi)
NestLoop(ci an t mi mc)
NestLoop(ci an t mi mc chn)
NestLoop(ci an t mi mc chn n)
NestLoop(ci an t mi mc chn n cn)
NestLoop(ci an t mi mc chn n cn it)
NestLoop(ci an t mi mc chn n cn it rt)
*/
EXPLAIN (FORMAT TEXT)
SELECT MIN(n.name) AS voicing_actress,
       MIN(t.title) AS voiced_movie
FROM aka_name AS an,
     char_name AS chn,
     cast_info AS ci,
     company_name AS cn,
     info_type AS it,
     movie_companies AS mc,
     movie_info AS mi,
     name AS n,
     role_type AS rt,
     title AS t
WHERE ci.note IN ('(voice)',
                  '(voice: Japanese version)',
                  '(voice) (uncredited)',
                  '(voice: English version)')
  AND cn.country_code ='[us]'
  AND it.info = 'release dates'
  AND mc.note IS NOT NULL
  AND (mc.note LIKE '%(USA)%'
       OR mc.note LIKE '%(worldwide)%')
  AND mi.info IS NOT NULL
  AND (mi.info LIKE 'Japan:%200%'
       OR mi.info LIKE 'USA:%200%')
  AND n.gender ='f'
  AND n.name LIKE '%Ang%'
  AND rt.role ='actress'
  AND t.production_year BETWEEN 2005 AND 2009
  AND t.id = mi.movie_id
  AND t.id = mc.movie_id
  AND t.id = ci.movie_id
  AND mc.movie_id = ci.movie_id
  AND mc.movie_id = mi.movie_id
  AND mi.movie_id = ci.movie_id
  AND cn.id = mc.company_id
  AND it.id = mi.info_type_id
  AND n.id = ci.person_id
  AND rt.id = ci.role_id
  AND n.id = an.person_id
  AND ci.person_id = an.person_id
  AND chn.id = ci.person_role_id;

Execution plan is:

 Aggregate  (cost=27678801700.75..27678801700.76 rows=1 width=64)
   ->  Nested Loop  (cost=20000031697.50..27678801700.75 rows=1 width=32)
         Join Filter: (rt.id = ci.role_id)
         ->  Seq Scan on role_type rt  (cost=0.00..18.88 rows=4 width=4)
               Filter: ((role)::text = 'actress'::text)
         ->  Materialize  (cost=20000031697.50..27678801681.81 rows=1 width=36)
               ->  Nested Loop  (cost=20000031697.50..27678801681.81 rows=1 width=36)
                     Join Filter: (it.id = mi.info_type_id)
                     ->  Seq Scan on info_type it  (cost=0.00..2.41 rows=1 width=4)
                           Filter: ((info)::text = 'release dates'::text)
                     ->  Nested Loop  (cost=20000031697.50..27678801678.66 rows=59 width=40)
                           ->  Nested Loop  (cost=20000031697.08..27678801482.37 rows=163 width=44)
                                 Join Filter: (n.id = ci.person_id)
                                 ->  Seq Scan on name n  (cost=0.00..118171.96 rows=9516 width=19)
                                       Filter: ((name ~~ '%Ang%'::text) AND ((gender)::text = 'f'::text))
                                 ->  Materialize  (cost=20000031697.08..27663195186.54 rows=71311 width=37)
                                       ->  Nested Loop  (cost=20000031697.08..27663194271.99 rows=71311 width=37)
                                             ->  Nested Loop  (cost=10000031696.65..17663060878.15 rows=145979 width=41)
                                                   Join Filter: (t.id = mc.movie_id)
                                                   ->  Seq Scan on movie_companies mc  (cost=0.00..57960.93 rows=307745 width=8)
                                                         Filter: ((note IS NOT NULL) AND ((note ~~ '%(USA)%'::text) OR (note ~~ '%(worldwide)%'::text)))
                                                   ->  Materialize  (cost=10000031696.65..15817340640.79 rows=242159 width=49)
                                                         ->  Nested Loop  (cost=10000031696.65..15817337065.00 rows=242159 width=49)
                                                               Join Filter: (t.id = mi.movie_id)
                                                               ->  Seq Scan on movie_info mi  (cost=0.00..382516.23 rows=543793 width=8)
                                                                     Filter: ((info ~~ 'Japan:%200%'::text) OR (info ~~ 'USA:%200%'::text))
                                                               ->  Materialize  (cost=10000031696.65..10003749915.92 rows=449341 width=41)
                                                                     ->  Nested Loop  (cost=10000031696.65..10003743719.21 rows=449341 width=41)
                                                                           ->  Hash Join  (cost=31696.22..852385.40 rows=2010555 width=20)
                                                                                 Hash Cond: (ci.person_id = an.person_id)
                                                                                 ->  Seq Scan on cast_info ci  (cost=0.00..796439.28 rows=828870 width=16)
                                                                                       Filter: (note = ANY ('{(voice),"(voice: Japanese version)","(voice) (uncredited)",
"(voice: English version)"}'::text[]))
                                                                                 ->  Hash  (cost=20429.43..20429.43 rows=901343 width=4)
                                                                                       ->  Seq Scan on aka_name an  (cost=0.00..20429.43 rows=901343 width=4)
                                                                           ->  Index Scan using title_pkey on title t  (cost=0.43..1.44 rows=1 width=21)
                                                                                 Index Cond: (id = ci.movie_id)
                                                                                 Filter: ((production_year >= 2005) AND (production_year <= 2009))
                                             ->  Index Only Scan using char_name_pkey on char_name chn  (cost=0.43..0.91 rows=1 width=4)
                                                   Index Cond: (id = ci.person_role_id)
                           ->  Index Scan using company_name_pkey on company_name cn  (cost=0.42..1.20 rows=1 width=4)
                                 Index Cond: (id = mc.company_id)
                                 Filter: ((country_code)::text = '[us]'::text)

Actual join order and join algorithms:

(rt (it ((n ((mc (mi ((ci an) t))) chn)) cn)))
HashJoin(ci an)
NestLoop(ci an t)
NestLoop(ci an t mi)
NestLoop(ci an t mi mc)
NestLoop(ci an t mi mc chn)
NestLoop(ci an t mi mc chn n)
NestLoop(ci an t mi mc chn n cn)

We could see that:

  • both table t and chn are not following the leading hint ((rt (it ((n (chn (mc (mi (t (ci an)))))) cn)))).
  • both table t and chn use index scan, so they are used as the inner tables.
  1. Leading hint + join methods hint + access methods hint: work correctly!. If we add hints SeqScan(t) SeqScan(chn) to force the access methods, the join order will become right.
/*+
SeqScan(t) SeqScan(chn)
Leading((rt (it ((n (chn (mc (mi (t (ci an)))))) cn))))
HashJoin(ci an)
NestLoop(ci an t)
NestLoop(ci an t mi)
NestLoop(ci an t mi mc)
NestLoop(ci an t mi mc chn)
NestLoop(ci an t mi mc chn n)
NestLoop(ci an t mi mc chn n cn)
NestLoop(ci an t mi mc chn n cn it)
NestLoop(ci an t mi mc chn n cn it rt)
*/
EXPLAIN (FORMAT TEXT)
SELECT MIN(n.name) AS voicing_actress,
       MIN(t.title) AS voiced_movie
FROM aka_name AS an,
     char_name AS chn,
     cast_info AS ci,
     company_name AS cn,
     info_type AS it,
     movie_companies AS mc,
     movie_info AS mi,
     name AS n,
     role_type AS rt,
     title AS t
WHERE ci.note IN ('(voice)',
                  '(voice: Japanese version)',
                  '(voice) (uncredited)',
                  '(voice: English version)')
  AND cn.country_code ='[us]'
  AND it.info = 'release dates'
  AND mc.note IS NOT NULL
  AND (mc.note LIKE '%(USA)%'
       OR mc.note LIKE '%(worldwide)%')
  AND mi.info IS NOT NULL
  AND (mi.info LIKE 'Japan:%200%'
       OR mi.info LIKE 'USA:%200%')
  AND n.gender ='f'
  AND n.name LIKE '%Ang%'
  AND rt.role ='actress'
  AND t.production_year BETWEEN 2005 AND 2009
  AND t.id = mi.movie_id
  AND t.id = mc.movie_id
  AND t.id = ci.movie_id
  AND mc.movie_id = ci.movie_id
  AND mc.movie_id = mi.movie_id
  AND mi.movie_id = ci.movie_id
  AND cn.id = mc.company_id
  AND it.id = mi.info_type_id
  AND n.id = ci.person_id
  AND rt.id = ci.role_id
  AND n.id = an.person_id
  AND ci.person_id = an.person_id
  AND chn.id = ci.person_role_id;

Execution plan is:

 Aggregate  (cost=42280242141.33..42280242141.34 rows=1 width=64)
   ->  Nested Loop  (cost=32696.64..42280242141.32 rows=1 width=32)
         Join Filter: (rt.id = ci.role_id)
         ->  Seq Scan on role_type rt  (cost=0.00..18.88 rows=4 width=4)
               Filter: ((role)::text = 'actress'::text)
         ->  Materialize  (cost=32696.64..42280242122.39 rows=1 width=36)
               ->  Nested Loop  (cost=32696.64..42280242122.39 rows=1 width=36)
                     Join Filter: (it.id = mi.info_type_id)
                     ->  Seq Scan on info_type it  (cost=0.00..2.41 rows=1 width=4)
                           Filter: ((info)::text = 'release dates'::text)
                     ->  Nested Loop  (cost=32696.64..42280242119.24 rows=59 width=40)
                           ->  Nested Loop  (cost=32696.22..42280241922.95 rows=163 width=44)
                                 Join Filter: (n.id = ci.person_id)
                                 ->  Seq Scan on name n  (cost=0.00..118171.96 rows=9516 width=19)
                                       Filter: ((name ~~ '%Ang%'::text) AND ((gender)::text = 'f'::text))
                                 ->  Materialize  (cost=32696.22..42264635627.12 rows=71311 width=37)
                                       ->  Nested Loop  (cost=32696.22..42264634712.57 rows=71311 width=37)
                                             Join Filter: (chn.id = ci.person_role_id)
                                             ->  Seq Scan on char_name chn  (cost=0.00..67850.88 rows=3140288 width=4)
                                             ->  Materialize  (cost=32696.22..31356197189.35 rows=145979 width=41)
                                                   ->  Nested Loop  (cost=32696.22..31356195175.46 rows=145979 width=41)
                                                         Join Filter: (t.id = mc.movie_id)
                                                         ->  Seq Scan on movie_companies mc  (cost=0.00..57960.93 rows=307745 width=8)
                                                               Filter: ((note IS NOT NULL) AND ((note ~~ '%(USA)%'::text) OR (note ~~ '%(worldwide)%'::text)))
                                                         ->  Materialize  (cost=32696.22..29510474938.10 rows=242159 width=49)
                                                               ->  Nested Loop  (cost=32696.22..29510471362.30 rows=242159 width=49)
                                                                     Join Filter: (t.id = mi.movie_id)
                                                                     ->  Seq Scan on movie_info mi  (cost=0.00..382516.23 rows=543793 width=8)
                                                                           Filter: ((info ~~ 'Japan:%200%'::text) OR (info ~~ 'USA:%200%'::text))
                                                                     ->  Materialize  (cost=32696.22..23696884213.22 rows=449341 width=41)
                                                                           ->  Nested Loop  (cost=32696.22..23696878016.52 rows=449341 width=41)
                                                                                 Join Filter: (t.id = ci.movie_id)
                                                                                 ->  Seq Scan on title t  (cost=0.00..73929.74 rows=565011 width=21)
                                                                                       Filter: ((production_year >= 2005) AND (production_year <= 2009))
                                                                                 ->  Materialize  (cost=32696.22..640936.59 rows=2010555 width=20)
                                                                                       ->  Hash Join  (cost=32696.22..619102.82 rows=2010555 width=20)
                                                                                             Hash Cond: (ci.person_id = an.person_id)
                                                                                             ->  Gather  (cost=1000.00..563156.70 rows=828870 width=16)
                                                                                                   Workers Planned: 2
                                                                                                   ->  Parallel Seq Scan on cast_info ci  (cost=0.00..479269.70 rows=3453
62 width=16)
                                                                                                         Filter: (note = ANY ('{(voice),"(voice: Japanese version)","(voi
ce) (uncredited)","(voice: English version)"}'::text[]))
                                                                                             ->  Hash  (cost=20429.43..20429.43 rows=901343 width=4)
                                                                                                   ->  Seq Scan on aka_name an  (cost=0.00..20429.43 rows=901343 width=4)
                           ->  Index Scan using company_name_pkey on company_name cn  (cost=0.42..1.20 rows=1 width=4)
                                 Index Cond: (id = mc.company_id)
                                 Filter: ((country_code)::text = '[us]'::text)

Based on the above results, we can get the following conclusions:

  • If we only use the leading hint, the optimizer will choose the appropriate join methods to follow the specified join order.
  • If we use both leading hint and join methods hint, the join order will not strictly follow the join order hint in some cases. For example, in our example the optimizer choose to use the index nested loop join rather than the normal nested loop join. So it put t and chn as the inner table in the nest loop join.
  • Use only leading hint means the optimizer has some freedom to choose the join methods and access methods.
  • The pg_hint_plan has bugs that optimizer's decisions cannot be totally forced and controlled by hints when some hints combinations are used. This is a bug of the hint and need to be fixed. There is a similar issue on github about this problem.

Recommendation: fix this bug by completely forcing the optimizer following the hints such as a combination hints with join order and join algorithms!

ZhengtongYan avatar Jun 26 '24 01:06 ZhengtongYan

I have the same question. I also wonder why the planning time is so long, even though I have restricted the join order and join method with pg_hint_plan.

zk9907 avatar Jul 05 '24 05:07 zk9907

Could you produce a self-contained test case that could be used for investigation, please? I am not familar with the JOB benchmark nor any of its queries, hence it would be a serious time gain for me.

michaelpq avatar Aug 20 '24 04:08 michaelpq

Could you produce a self-contained test case that could be used for investigation, please? I am not familar with the JOB benchmark nor any of its queries, hence it would be a serious time gain for me.

You can follow these steps to set up the JOB benchmark

1. Download imdb data:

wget http://homepages.cwi.nl/~boncz/job/imdb.tgz
tar -zxvf imdb.tar.gz

in postgresql

CREATE DATABASE imdb;
\c imdb
\i /PATH/TO/imdb/schematext.sql
\copy aka_name from '~/Downloads/imdb_data/aka_name.csv' csv delimiter ',' quote '"' escape '\';
\copy aka_title from '~/Downloads/imdb_data/aka_title.csv' csv delimiter ',' quote '"' escape '\';
\copy cast_info from '~/Downloads/imdb_data/cast_info.csv' csv delimiter ',' quote '"' escape '\';
\copy char_name from '~/Downloads/imdb_data/char_name.csv' csv delimiter ',' quote '"' escape '\';
\copy comp_cast_type from '~/Downloads/imdb_data/comp_cast_type.csv' csv delimiter ',' quote '"' escape '\';
\copy company_name from '~/Downloads/imdb_data/company_name.csv' csv delimiter ',' quote '"' escape '\';
\copy company_type from '~/Downloads/imdb_data/company_type.csv' csv delimiter ',' quote '"' escape '\';
\copy complete_cast from '~/Downloads/imdb_data/complete_cast.csv' csv delimiter ',' quote '"' escape '\';
\copy info_type from '~/Downloads/imdb_data/info_type.csv' csv delimiter ',' quote '"' escape '\';
\copy keyword from '~/Downloads/imdb_data/keyword.csv' csv delimiter ',' quote '"' escape '\';
\copy kind_type from '~/Downloads/imdb_data/kind_type.csv' csv delimiter ',' quote '"' escape '\';
\copy link_type from '~/Downloads/imdb_data/link_type.csv' csv delimiter ',' quote '"' escape '\';
\copy movie_companies from '~/Downloads/imdb_data/movie_companies.csv' csv delimiter ',' quote '"' escape '\';
\copy movie_info from '~/Downloads/imdb_data/movie_info.csv' csv delimiter ',' quote '"' escape '\';
\copy movie_info_idx from '~/Downloads/imdb_data/movie_info_idx.csv' csv delimiter ',' quote '"' escape '\';
\copy movie_keyword from '~/Downloads/imdb_data/movie_keyword.csv' csv delimiter ',' quote '"' escape '\';
\copy movie_link from '~/Downloads/imdb_data/movie_link.csv' csv delimiter ',' quote '"' escape '\';
\copy name from '~/Downloads/imdb_data/name.csv' csv delimiter ',' quote '"' escape '\';
\copy person_info from '~/Downloads/imdb_data/person_info.csv' csv delimiter ',' quote '"' escape '\';
\copy role_type from '~/Downloads/imdb_data/role_type.csv' csv delimiter ',' quote '"' escape '\';
\copy title from '~/Downloads/imdb_data/title.csv' csv delimiter ',' quote '"' escape '\';
\i /PATH/TO/imdb/fkindexes.sql

Download JOB Benchmark:

git clone https://github.com/gregrahn/join-order-benchmark

zk9907 avatar Sep 20 '24 04:09 zk9907

Hi Micahel-san,

Regarding to How to set up JOB, I hope this info helps you:

https://github.com/yamatattsu/pg_plan_advsr/blob/master/JOB/how_to_setup.md

On Fri, Sep 20, 2024 at 1:56 PM KaiZhong @.***> wrote:

Could you produce a self-contained test case that could be used for investigation, please? I am not familar with the JOB benchmark nor any of its queries, hence it would be a serious time gain for me.

You can follow these steps to set up the JOB benchmark download JOB benchmark (VLDB version)

  1. Download imdb data: wget http://homepages.cwi.nl/~boncz/job/imdb.tgz tar -zxvf imdb.tar.gz

in postgresql

CREATE DATABASE imdb; \c imdb \i /PATH/TO/imdb/schematext.sql

\copy aka_name from '/Downloads/imdb_data/aka_name.csv' csv delimiter ',' quote '"' escape ''; \copy aka_title from '/Downloads/imdb_data/aka_title.csv' csv delimiter ',' quote '"' escape ''; \copy cast_info from '/Downloads/imdb_data/cast_info.csv' csv delimiter ',' quote '"' escape ''; \copy char_name from '/Downloads/imdb_data/char_name.csv' csv delimiter ',' quote '"' escape ''; \copy comp_cast_type from '/Downloads/imdb_data/comp_cast_type.csv' csv delimiter ',' quote '"' escape ''; \copy company_name from '/Downloads/imdb_data/company_name.csv' csv delimiter ',' quote '"' escape ''; \copy company_type from '/Downloads/imdb_data/company_type.csv' csv delimiter ',' quote '"' escape ''; \copy complete_cast from '/Downloads/imdb_data/complete_cast.csv' csv delimiter ',' quote '"' escape ''; \copy info_type from '/Downloads/imdb_data/info_type.csv' csv delimiter ',' quote '"' escape ''; \copy keyword from '/Downloads/imdb_data/keyword.csv' csv delimiter ',' quote '"' escape ''; \copy kind_type from '/Downloads/imdb_data/kind_type.csv' csv delimiter ',' quote '"' escape ''; \copy link_type from '/Downloads/imdb_data/link_type.csv' csv delimiter ',' quote '"' escape ''; \copy movie_companies from '/Downloads/imdb_data/movie_companies.csv' csv delimiter ',' quote '"' escape ''; \copy movie_info from '/Downloads/imdb_data/movie_info.csv' csv delimiter ',' quote '"' escape ''; \copy movie_info_idx from '/Downloads/imdb_data/movie_info_idx.csv' csv delimiter ',' quote '"' escape ''; \copy movie_keyword from '/Downloads/imdb_data/movie_keyword.csv' csv delimiter ',' quote '"' escape ''; \copy movie_link from '/Downloads/imdb_data/movie_link.csv' csv delimiter ',' quote '"' escape ''; \copy name from '/Downloads/imdb_data/name.csv' csv delimiter ',' quote '"' escape ''; \copy person_info from '/Downloads/imdb_data/person_info.csv' csv delimiter ',' quote '"' escape ''; \copy role_type from '/Downloads/imdb_data/role_type.csv' csv delimiter ',' quote '"' escape ''; \copy title from '~/Downloads/imdb_data/title.csv' csv delimiter ',' quote '"' escape '';

\i /PATH/TO/imdb/fkindexes.sql

  1. Download JOB Benchmark: Git clone https://github.com/gregrahn/join-order-benchmark

— Reply to this email directly, view it on GitHub https://github.com/ossc-db/pg_hint_plan/issues/195#issuecomment-2362822525, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADSHQGYUMSHQ4B6P65VUWPLZXOTHTAVCNFSM6AAAAABJ43O25CVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGNRSHAZDENJSGU . You are receiving this because you are subscribed to this thread.Message ID: @.***>

yamatattsu avatar Sep 26 '24 00:09 yamatattsu

Hi, all. I am working on this issue. I found that the join order inconsistent with the input hints is caused by the implementation of PG and is not a bug of pg_hint_plan.

In particular, PG disables certain operators by adding a disable_cost(1.0e10) to its estimated cost; then, the planner would not select this operator due to its large cost. To assign a join order, pg_hint_plan disables all join operators when considering inconsistent join orders (by adding a disable cost to the cost of join operators). This is how pg_hint_plan works.

In this issue, the bug in the second example of @ZhengtongYan is it generates sub-join order ((ci an) t) for assigned sub-join order (t (ci an)) and ((mc (mi ((ci an) t))) chn) for (chn (mc (mi ((ci an) t)))). This is because PG estimates the cost of operators in two phases. Moreover, PG will filter some obviously inferior based on the estimated cost in the first phase. However, PG does not include disable_cost for disabled operators and only considers it in the second phase. While (t (ci an)) will utilize a normal nest loop join and access relation t with a seq scan, ((ci an) t) will utilize an index-based nest loop join and access relation t with an index scan, which is obviously faster. Thus, (t (ci an)) will be filtered after the first phase of cost estimation. Same reason to (chn (mc (mi ((ci an) t)))).

The reason why the third example works is that forces PG to select seqscan for relation t and chn, thus PG would not consider index-based nest loop joins for them. I have addressed the problem in the second example by considering disable_cost for the disabled operators in the first phase of cost estimation.

I want to create a pull request to Postgres and make PG consider disable_cost for the disabled operators in the first phase of cost estimation. Would you like to help me? @michaelpq

HennyNile avatar Oct 28 '24 07:10 HennyNile

I want to create a pull request to Postgres and make PG consider disable_cost for the disabled operators in the first phase of cost estimation. Would you like to help me? @michaelpq

Contributions to PostgreSQL are not accepted through github. Upstream community has its own process, see: https://wiki.postgresql.org/wiki/Submitting_a_Patch

michaelpq avatar Oct 29 '24 02:10 michaelpq

I want to create a pull request to Postgres and make PG consider disable_cost for the disabled operators in the first phase of cost estimation. Would you like to help me? @michaelpq

Contributions to PostgreSQL are not accepted through github. Upstream community has its own process, see: https://wiki.postgresql.org/wiki/Submitting_a_Patch

I see. Thanks for the reminder.

HennyNile avatar Oct 29 '24 12:10 HennyNile

@michaelpq Hi, I found a way to solve this problem we could just write a wrapper for the function of cost estimation of PostgreSQL. How do you like this idea?

HennyNile avatar Nov 01 '24 14:11 HennyNile

Hi, all! I have created a pull-request in #207.

HennyNile avatar Nov 12 '24 11:11 HennyNile

@ZhengtongYan This problem is solved in the PG18, you can check my discussion in this #207 and the solution of PostgreSQL team in https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=e22253467.

HennyNile avatar Jun 27 '25 07:06 HennyNile

Ahh, so the increment of disabled_nodes for the disabled operators is what comes into help here. Yes, let's close this ticket if it has no reason to be. Please note that I had also to slightly patch pg_hint_plan for the parallel nodes to disable them, to stick with the new logic of v18.

michaelpq avatar Jun 29 '25 23:06 michaelpq