pg_hint_plan
Parallel hint behaves strangely with UNION ALL
The parallel hint behaves strangely, as the following examples show:
create table t1(id int,val int);
create table t2(id int,val int);
create table t3(id int,val int);
case 1: no hint
lightdb@postgres=# explain analyze (select * from t1,t3 where t1.id=t3.id) union all (select * from t1,t2 where t1.id=t2.id);
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Append  (cost=317.01..2188.90 rows=51076 width=16) (actual time=0.234..0.311 rows=0 loops=1)
   ->  Merge Join  (cost=317.01..711.38 rows=25538 width=16) (actual time=0.058..0.088 rows=0 loops=1)
         Merge Cond: (t1.id = t3.id)
         ->  Sort  (cost=158.51..164.16 rows=2260 width=8) (actual time=0.045..0.063 rows=0 loops=1)
               Sort Key: t1.id
               Sort Method: quicksort  Memory: 33kB
               ->  Seq Scan on t1  (cost=0.00..32.60 rows=2260 width=8) (actual time=0.024..0.030 rows=0 loops=1)
         ->  Sort  (cost=158.51..164.16 rows=2260 width=8) (never executed)
               Sort Key: t3.id
               ->  Seq Scan on t3  (cost=0.00..32.60 rows=2260 width=8) (never executed)
   ->  Merge Join  (cost=317.01..711.38 rows=25538 width=16) (actual time=0.039..0.068 rows=0 loops=1)
         Merge Cond: (t1_1.id = t2.id)
         ->  Sort  (cost=158.51..164.16 rows=2260 width=8) (actual time=0.026..0.044 rows=0 loops=1)
               Sort Key: t1_1.id
               Sort Method: quicksort  Memory: 33kB
               ->  Seq Scan on t1 t1_1  (cost=0.00..32.60 rows=2260 width=8) (actual time=0.008..0.014 rows=0 loops=1)
         ->  Sort  (cost=158.51..164.16 rows=2260 width=8) (never executed)
               Sort Key: t2.id
               ->  Seq Scan on t2  (cost=0.00..32.60 rows=2260 width=8) (never executed)
 Planning Time: 0.252 ms
 Execution Time: 0.409 ms
(21 rows)
case 2: with parallel hints, is the UNION ALL itself parallelized?
lightdb@postgres=# explain analyze (select /*+parallel(t1 5) parallel(t3 5)*/* from t1,t3 where t1.id=t3.id) union all (select /*+parallel(t1 5) parallel(t2 5)*/* from t1,t2 where t1.id=t2.id);
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Gather  (cost=10009.11..16081.79 rows=51076 width=16) (actual time=6.514..8.059 rows=0 loops=1)
   Workers Planned: 1
   Workers Launched: 1
   Single Copy: true
   ->  Parallel Append  (cost=9.11..974.19 rows=51076 width=16) (actual time=0.777..0.861 rows=0 loops=1)
         ->  Parallel Hash Join  (cost=9.11..104.03 rows=5108 width=16) (actual time=0.689..0.721 rows=0 loops=1)
               Hash Cond: (t1.id = t2.id)
               ->  Parallel Seq Scan on t1 @"lt#1"  (cost=0.00..0.00 rows=729 width=8) (never executed)
               ->  Parallel Hash  (cost=0.00..0.00 rows=729 width=8) (actual time=0.025..0.044 rows=0 loops=1)
                     Buckets: 4096  Batches: 1  Memory Usage: 0kB
                     ->  Parallel Seq Scan on t2 @"lt#1"  (cost=0.00..0.00 rows=729 width=8) (actual time=0.009..0.014 rows=0 loops=1)
         ->  Parallel Hash Join  (cost=9.11..104.03 rows=5108 width=16) (actual time=0.068..0.098 rows=0 loops=1)
               Hash Cond: (t1_1.id = t3.id)
               ->  Parallel Seq Scan on t1 t1_1 @"lt#0"  (cost=0.00..0.00 rows=729 width=8) (never executed)
               ->  Parallel Hash  (cost=0.00..0.00 rows=729 width=8) (actual time=0.019..0.037 rows=0 loops=1)
                     Buckets: 4096  Batches: 1  Memory Usage: 0kB
                     ->  Parallel Seq Scan on t3 @"lt#0"  (cost=0.00..0.00 rows=729
case 3: with parallel hints, only the tables inside the subquery are parallelized
lightdb@postgres=# explain analyze (select /*+parallel(t1 5) parallel(t3 5)*/* from t1,t3 where t1.id=t3.id) union all (select /*+parallel(t1 5) parallel(t2 5)*/* from t1,t2 where 1=1);
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Append  (cost=251.81..128719.25 rows=5133138 width=16) (actual time=21.114..22.092 rows=0 loops=1)
   ->  Merge Join  (cost=251.81..646.18 rows=25538 width=16) (actual time=11.813..11.952 rows=0 loops=1)
         Merge Cond: (t1.id = t3.id)
         ->  Sort  (cost=125.91..131.56 rows=2260 width=8) (actual time=11.797..11.924 rows=0 loops=1)
               Sort Key: t1.id
               Sort Method: quicksort  Memory: 33kB
               ->  Gather  (cost=0.00..0.00 rows=2260 width=8) (actual time=11.745..11.859 rows=0 loops=1)
                     Workers Planned: 5
                     Workers Launched: 5
                     ->  Parallel Seq Scan on t1 @"lt#0"  (cost=0.00..0.00 rows=729 width=8) (actual time=0.009..0.015 rows=0 loops=6)
         ->  Sort  (cost=125.91..131.56 rows=2260 width=8) (never executed)
               Sort Key: t3.id
               ->  Gather  (cost=0.00..0.00 rows=2260 width=8) (never executed)
                     Workers Planned: 5
                     Workers Launched: 0
                     ->  Parallel Seq Scan on t3 @"lt#0"  (cost=0.00..0.00 rows=729 width=8) (never executed)
   ->  Nested Loop  (cost=0.00..51076.00 rows=5107600 width=16) (actual time=9.279..10.099 rows=0 loops=1)
         ->  Gather  (cost=0.00..0.00 rows=2260 width=8) (actual time=9.265..10.073 rows=0 loops=1)
case 4: the parallel behavior changes even when only a table-scan hint is used
lightdb@postgres=# explain analyze (select * from t1,t3 where t1.id=t3.id) union all (select/*+seqscan(t1)*/ * from t1,t2 where t1.id=t2.id);
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Gather  (cost=10317.01..16180.77 rows=51076 width=16) (actual time=6.246..7.522 rows=0 loops=1)
   Workers Planned: 1
   Workers Launched: 1
   Single Copy: true
   ->  Parallel Append  (cost=317.01..1073.17 rows=21282 width=16) (actual time=0.141..0.218 rows=0 loops=1)
         ->  Merge Join  (cost=317.01..711.38 rows=25538 width=16) (actual time=0.086..0.115 rows=0 loops=1)
               Merge Cond: (t1.id = t2.id)
               ->  Sort  (cost=158.51..164.16 rows=2260 width=8) (actual time=0.072..0.089 rows=0 loops=1)
                     Sort Key: t1.id
                     Worker 0:  Sort Method: quicksort  Memory: 33kB
                     ->  Seq Scan on t1 @"lt#0"  (cost=0.00..32.60 rows=2260 width=8) (actual time=0.016..0.022 rows=0 loops=1)
               ->  Sort  (cost=158.51..164.16 rows=2260 width=8) (never executed)
                     Sort Key: t2.id
                     ->  Seq Scan on t2 @"lt#0"  (cost=0.00..32.60 rows=2260 width=8) (never executed)
         ->  Merge Join  (cost=317.01..711.38 rows=25538 width=16) (actual time=0.036..0.065 rows=0 loops=1)
               Merge Cond: (t1_1.id = t3.id)
               ->  Sort  (cost=158.51..164.16 rows=2260 width=8) (actual time=0.024..0.042 rows=0 loops=1)
                     Sort Key: t1_1.id
                     Worker 0:  Sort Method: quicksort  Memory: 33kB
                     ->  Seq Scan on t1 t1_1  (cost=0.00..32.60 rows=2260 width=8) (actual time=0.009..0.015 rows=0 loops=1)
               ->  Sort  (cost=158.51..164.16 rows=2260 width=8) (never executed)
                     Sort Key: t3.id
                     ->  Seq Scan on t3  (cost=0.00..32.60 rows=2260 width=8) (never executed)
 Planning Time: 0.326 ms
 Execution Time: 7.614 ms
(25 rows)
Cases 2 and 4 behave this way because of the following code, which disables the non-parallel paths. I don't quite understand what this code is meant to do. Is it for a parallel UNION ALL? It disables parallelism inside the subqueries, so there is no way to choose between parallel subqueries and a parallel parent query.
    /*
     * Even though UNION ALL node doesn't have particular name so usually it is
     * unhintable, turn on parallel when it contains parallel nodes.
     */
    if (rel->rtekind == RTE_SUBQUERY)
    {
        ListCell   *lc;
        bool        inhibit_nonparallel = false;

        if (rel->partial_pathlist == NIL)
            return;

        foreach(lc, rel->partial_pathlist)
        {
            ListCell   *lcp;
            AppendPath *apath = (AppendPath *) lfirst(lc);
            int         parallel_workers = 0;

            if (!IsA(apath, AppendPath))
                continue;

            foreach(lcp, apath->subpaths)
            {
                Path       *spath = (Path *) lfirst(lcp);

                if (spath->parallel_aware &&
                    parallel_workers < spath->parallel_workers)
                    parallel_workers = spath->parallel_workers;
            }

            apath->path.parallel_workers = parallel_workers;
            inhibit_nonparallel = true;
        }

        if (inhibit_nonparallel)
        {
            ListCell   *lc;

            foreach(lc, rel->pathlist)
            {
                Path       *path = (Path *) lfirst(lc);

                if (path->startup_cost < disable_cost)
                {
                    path->startup_cost += disable_cost;
                    path->total_cost += disable_cost;
                }
            }
        }

        return;
    }
I think spath->parallel_aware can be removed, because there is no equivalent parallel_aware check when core builds a parallel append in the add_paths_to_append_rel function:
    if (partial_subpaths_valid && partial_subpaths != NIL)
    {
        AppendPath *appendpath;
        ListCell   *lc;
        int         parallel_workers = 0;

        /* Find the highest number of workers requested for any subpath. */
        foreach(lc, partial_subpaths)
        {
            Path       *path = lfirst(lc);

            parallel_workers = Max(parallel_workers, path->parallel_workers);
        }
After removing spath->parallel_aware, it works as follows:
chuhx@postgres=# explain (analyze) (select /*+parallel(t1 5) parallel(t3 5)*/* from t1,t3 where t1.id=t3.id) union all (select /*+parallel(t1 5) parallel(t2 5)*/* from t1,t2 where t1.id=t2.id);
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Gather  (cost=10009.11..16081.79 rows=51076 width=16) (actual time=1.522..10.924 rows=2 loops=1)
   Workers Planned: 5
   Workers Launched: 5
   ->  Parallel Append  (cost=9.11..974.19 rows=51076 width=16) (actual time=0.145..0.203 rows=0 loops=6)
         ->  Parallel Hash Join  (cost=9.11..104.03 rows=5108 width=16) (actual time=0.303..0.314 rows=1 loops=1)
               Hash Cond: (t1.id = t2.id)
               ->  Parallel Seq Scan on t1 @"lt#1"  (cost=0.00..0.00 rows=729 width=8) (actual time=0.003..0.004 rows=1 loops=1)
               ->  Parallel Hash  (cost=0.00..0.00 rows=729 width=8) (actual time=0.225..0.226 rows=1 loops=1)
                     Buckets: 4096  Batches: 1  Memory Usage: 64kB
                     ->  Parallel Seq Scan on t2 @"lt#1"  (cost=0.00..0.00 rows=729 width=8) (actual time=0.213..0.220 rows=1 loops=1)
         ->  Parallel Hash Join  (cost=9.11..104.03 rows=5108 width=16) (actual time=0.867..0.892 rows=1 loops=1)
               Hash Cond: (t1_1.id = t3.id)
               ->  Parallel Seq Scan on t1 t1_1 @"lt#0"  (cost=0.00..0.00 rows=729 width=8) (actual time=0.262..0.263 rows=1 loops=1)
               ->  Parallel Hash  (cost=0.00..0.00 rows=729 width=8) (actual time=0.386..0.386 rows=1 loops=1)
                     Buckets: 4096  Batches: 1  Memory Usage: 64kB
                     ->  Parallel Seq Scan on t3 @"lt#0"  (cost=0.00..0.00 rows=729 width=8) (actual time=0.367..0.368 rows=1 loops=1)
 Planning Time: 2.472 ms
 Execution Time: 11.057 ms
(18 rows)
I have spent some time studying the various patterns as well as the influence of parallel_aware here, and it is unclear to me what you claim is a bug. Could you send a patch for what you think would be right?
I am aware that the parallel path generation is currently rather broken in its dependency on GUCs, but it seems like what you are doing here makes things worse, and it is rather unclear why it would be better.
I am very sorry for troubling you. My understanding of this part is limited at present; I was just sharing my current thoughts.
The change is just to remove the spath->parallel_aware condition:
-                if (spath->parallel_aware &&
-                    parallel_workers < spath->parallel_workers)
+                if (parallel_workers < spath->parallel_workers)
                     parallel_workers = spath->parallel_workers;
I am not sure that removing spath->parallel_aware is the right way, but after removing it the subqueries can actually be parallelized (Workers Planned is no longer 1).
I found that spath->parallel_aware is false in this situation (explain (analyze) (select /*+parallel(t1 5) parallel(t3 5)*/ * from t1,t3 where t1.id=t3.id) union all (select /*+parallel(t1 5) parallel(t2 5)*/ * from t1,t2 where t1.id=t2.id);), and then parallel_workers will always be 1.
I'm not yet particularly clear on the purpose of parallel_aware, but I've discovered that parallel_aware is not consulted in add_paths_to_append_rel when finding the highest number of workers requested for any partial subpath:
        /*
         * Find the highest number of workers requested for any partial
         * subpath.
         */
        foreach(lc, pa_partial_subpaths)
        {
            Path       *path = lfirst(lc);

            parallel_workers = Max(parallel_workers, path->parallel_workers);
        }
I also encountered a similar problem. The issue here is that using any arbitrary hint forces the UNION ALL plan to run in parallel, which may cause serious performance regressions (for example, a non-parallel Nested Loop over a parameterized Index Scan can be the better plan). Perhaps we need to check whether a parallel hint actually exists, with logic similar to the phint->force_parallel handling.
+1 on the parallel append being unnecessarily enforced when an arbitrary hint is present.
Also, that line from the snippet above,
int parallel_workers = 0;
may cause Workers Planned to be zero, like here:
postgres=# explain analyze SELECT 'l' UNION ALL (SELECT 'g') ORDER BY 1;
QUERY PLAN
-------------------------------------------------------------------------------------------------
Sort (cost=0.04..0.05 rows=2 width=32) (actual time=0.023..0.026 rows=2 loops=1)
Sort Key: ('l'::text)
Sort Method: quicksort Memory: 25kB
-> Append (cost=0.00..0.03 rows=2 width=32) (actual time=0.004..0.008 rows=2 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=32) (actual time=0.003..0.004 rows=1 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=32) (actual time=0.000..0.001 rows=1 loops=1)
Planning Time: 0.155 ms
Execution Time: 0.068 ms
(8 rows)
postgres=# /*+ Set(enable_seqscan false) */ explain analyze SELECT 'l' UNION ALL (SELECT 'g') ORDER BY 1;
QUERY PLAN
----------------------------------------------------------------------------------------------------------
Gather Merge (cost=1000.02..1000.03 rows=0 width=32) (actual time=0.016..0.019 rows=2 loops=1)
Workers Planned: 0
Workers Launched: 0
-> Sort (cost=0.03..0.03 rows=1 width=32) (actual time=0.015..0.016 rows=2 loops=1)
Sort Key: ('l'::text)
Sort Method: quicksort Memory: 25kB
-> Parallel Append (cost=0.00..0.01 rows=1 width=32) (actual time=0.004..0.007 rows=2 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=32) (actual time=0.003..0.003 rows=1 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=32) (actual time=0.000..0.001 rows=1 loops=1)
Planning Time: 0.149 ms
Execution Time: 0.065 ms
(11 rows)
It works OK in a release build, but it trips an assertion in an assert-enabled build:
https://github.com/postgres/postgres/blob/ec07d0d7fae9ac990061bef1aa8995f842839a40/src/backend/optimizer/util/pathnode.c#L1254
I think it would be safer to initialize it like this:
int parallel_workers = path->parallel_workers;