gpdb
gpdb copied to clipboard
[6X] Fix cross-slice interaction detection for subplans
GPDB has Shared Scan node to share the output of a subplan. Each Shared Scan node can act like producer or consumer. Each one may be a part of different slice. One of essential parts of Shared Scan processing is cross-slice interaction detection. It helps to find the difference in producer's and consumer's slices and mark underlying Material or Sort nodes as cross-slice. Cross-slice producer reads all the tuples from child plan before the consumer Shared Scan can read it.
Before this patch, cross-slice interaction detection of underlying subplan nodes was broken. We walked through subplans separately from main plan. This, in case subplan has no motion, caused shareinput_mutator_xslice_2()
to think we don't need to mark node as cross-slice - motion stack (motStack
) used for cross-slice interaction detection contained only default motion id (0) which equals to producer's slice id.
Here is the simplified part of new regression test. It shows Shared Scan consumer under Subplan, which has different (compared to producer upper) slice id.
Sequence
-> Shared Scan (share slice:id 0:0)
-> Seq Scan on t1 t1_1
-> Gather Motion 3:1 (slice3; segments: 3)
-> Redistribute Motion 1:3 (slice1)
-> Function Scan on generate_series
SubPlan 1 (slice1)
-> Shared Scan (share slice:id 1:0)
Before this patch, we didn't have upper motions in motStack
. From now, we dive into subplans as parts of main plan according to the plan tree hierarchy. Cross-slice detector understands that our slice(1) is different from producer's slice(0) and marks node as cross-slice.
Existing logic and comments around rtables left almost unchanged.
Existing tricky logic around joins execution order changed to another tricky logic which is compatible with plan_tree_walker()
.
gporca_optimizer.out
test output changed as we now start shareinput_mutator_dag_to_tree()
from the main plan. Before, we started from subplans and global share input context contained only 3 producers at the time set_plan_share_id()
called. Now, at the same place producers count is equal to 4, so RangeTblEntry's name changed.
This parts of code was heavily reworked in master branch and there no similar error detected. Porting the same for 6X will require a lot of changes and possibly moving and porting of dependencies related to master branch only.
Thanks for your contribution, InnerLife0. I am reading your PR and will try to understand and review it. 👍
@InnerLife0 I have two naive questions, just to confirm:
- I have run the two test cases in shared_scan.sql on master code, runs good.
So, does it mean that the master have fixed the issue? Or just "circumvent" the issue?
- you mentioned:
This parts of code was heavily reworked in master branch and there no similar error detected. Porting the same for 6X will require a lot of changes and possibly moving and porting of dependencies related to master branch only.
If the answer 1 is "master fixed this issue", do you mean port its implement to 6x is very hard, so you opened this (simpler) PR to fix it on 6x?
Thanks.
The issue is fixed in master and it's hard to rip off the fix from it as it depends on other code changes. More, it seems to be over-complicated and should be simpler.
@Stolb27 @InnerLife0 I have finished my review, please add more comments to code (see my previous review suggestions). Thanks for your contribution!
And just hold on a few days, other guys may take a look, thanks.
And this PR may be a suitable solution for https://github.com/greenplum-db/gpdb/pull/12730, I will verify it later.
I have finished my review, please add more comments to code (see my previous review suggestions).
@interma, thank you, applied.
Thanks so much for the great work!
I will start to look into this, please give me some time. I'll try my best to go through each detail.
And this PR may be a suitable solution for #12730, I will verify it later.
Apply the PR and revert #12730, run the repro query in https://github.com/greenplum-db/gpdb/issues/12701
No Assert error of panic, but still get an error:
ERROR: could not find shared input node with id 0 (cdbmutate.c:2467)
Still need to improve it ( or make gp_cte_sharing
to take effect to let user skip SISC for this query).
I have rebased this PR with latest 6X and run a full local pipeline, it is green. I'll push this now.
Thanks a lot for this contribution!