gpdb icon indicating copy to clipboard operation
gpdb copied to clipboard

[6X] Fix cross-slice interaction detection for subplans

Open InnerLife0 opened this issue 2 years ago • 8 comments

GPDB has Shared Scan node to share the output of a subplan. Each Shared Scan node can act like producer or consumer. Each one may be a part of different slice. One of essential parts of Shared Scan processing is cross-slice interaction detection. It helps to find the difference in producer's and consumer's slices and mark underlying Material or Sort nodes as cross-slice. Cross-slice producer reads all the tuples from child plan before the consumer Shared Scan can read it.

Before this patch, cross-slice interaction detection of underlying subplan nodes was broken. We walked through subplans separately from main plan. This, in case subplan has no motion, caused shareinput_mutator_xslice_2() to think we don't need to mark node as cross-slice - motion stack (motStack) used for cross-slice interaction detection contained only default motion id (0) which equals to producer's slice id.

Here is the simplified part of new regression test. It shows Shared Scan consumer under Subplan, which has different (compared to producer upper) slice id.

 Sequence
   ->  Shared Scan (share slice:id 0:0)
     ->  Seq Scan on t1 t1_1
   ->  Gather Motion 3:1  (slice3; segments: 3)
     ->  Redistribute Motion 1:3  (slice1)
       ->  Function Scan on generate_series
         SubPlan 1  (slice1)
           ->  Shared Scan (share slice:id 1:0)

Before this patch, we didn't have upper motions in motStack. From now, we dive into subplans as parts of main plan according to the plan tree hierarchy. Cross-slice detector understands that our slice(1) is different from producer's slice(0) and marks node as cross-slice.

Existing logic and comments around rtables left almost unchanged. Existing tricky logic around joins execution order changed to another tricky logic which is compatible with plan_tree_walker(). gporca_optimizer.out test output changed as we now start shareinput_mutator_dag_to_tree() from the main plan. Before, we started from subplans and global share input context contained only 3 producers at the time set_plan_share_id() called. Now, at the same place producers count is equal to 4, so RangeTblEntry's name changed.


This parts of code was heavily reworked in master branch and there no similar error detected. Porting the same for 6X will require a lot of changes and possibly moving and porting of dependencies related to master branch only.

InnerLife0 avatar Jun 27 '22 18:06 InnerLife0

Thanks for your contribution, InnerLife0. I am reading your PR and will try to understand and review it. 👍

interma avatar Jun 29 '22 02:06 interma

@InnerLife0 I have two naive questions, just to confirm:

  1. I have run the two test cases in shared_scan.sql on master code, runs good.

So, does it mean that the master have fixed the issue? Or just "circumvent" the issue?

  1. you mentioned:

This parts of code was heavily reworked in master branch and there no similar error detected. Porting the same for 6X will require a lot of changes and possibly moving and porting of dependencies related to master branch only.

If the answer 1 is "master fixed this issue", do you mean port its implement to 6x is very hard, so you opened this (simpler) PR to fix it on 6x?

Thanks.

interma avatar Jul 07 '22 03:07 interma

The issue is fixed in master and it's hard to rip off the fix from it as it depends on other code changes. More, it seems to be over-complicated and should be simpler.

InnerLife0 avatar Jul 07 '22 09:07 InnerLife0

@Stolb27 @InnerLife0 I have finished my review, please add more comments to code (see my previous review suggestions). Thanks for your contribution!

And just hold on a few days, other guys may take a look, thanks.

interma avatar Aug 09 '22 10:08 interma

And this PR may be a suitable solution for https://github.com/greenplum-db/gpdb/pull/12730, I will verify it later.

interma avatar Aug 09 '22 10:08 interma

I have finished my review, please add more comments to code (see my previous review suggestions).

@interma, thank you, applied.

Stolb27 avatar Aug 09 '22 11:08 Stolb27

Thanks so much for the great work!

I will start to look into this, please give me some time. I'll try my best to go through each detail.

kainwen avatar Aug 09 '22 12:08 kainwen

And this PR may be a suitable solution for #12730, I will verify it later.

Apply the PR and revert #12730, run the repro query in https://github.com/greenplum-db/gpdb/issues/12701

No Assert error of panic, but still get an error: ERROR: could not find shared input node with id 0 (cdbmutate.c:2467)

Still need to improve it ( or make gp_cte_sharing to take effect to let user skip SISC for this query).

interma avatar Aug 10 '22 02:08 interma

I have rebased this PR with latest 6X and run a full local pipeline, it is green. I'll push this now.

Thanks a lot for this contribution!

kainwen avatar Sep 20 '22 06:09 kainwen