cloudberry Cherry pick NodeSummary in EXPLAIN ANALYZE

Cherry pick of https://github.com/robozmey/gpdb/commit/4b05cbf796eac2d40f6f8233c3a6e6f1144458ee

Dumps content of CdbExplain_NodeSummary into Node descriprion in EXPLAIN ANALYZE

drop table if exists tt; create table tt (a int, b int) with(parallel_workers=2) distributed by (a);

insert into tt select * from generate_series(1,1000)a,generate_series(1,1000)b;

set enable_parallel = on;

set max_parallel_workers_per_gather = 2;

set gp_enable_explain_node_summary=on;

explain (analyze) select * from tt where a > b;

                                                                                                             QUERY PLAN                                           
                                                                   
------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------
 Gather Motion 6:1  (slice1; segments: 6)  (cost=0.00..548.21 rows=28700 width=8) (actual time=1.000..53.000 rows=499500 loops=1)
   Node Summary:                  vmax       vsum       vcnt       imax
     ntuples:                   499500     499500          1         -1
     nloops:                         1          1          1         -1
     execmemused:                    0          0          0          0
     workmemused:                    0          0          0          0
     workmemwanted:                  0          0          0          0
     totalWorkfileCreated:           0          0          0          0
     totalPartTableScanned:          0          0          0          0
     segindex0: -1
     ninst: 1
     StatInsts:
       (segN) pstype starttime counter firsttuple startup total ntuples ntuples2 nloops nfiltered1 nfiltered2 execmemused workmemused workmemwanted workfileCreate
d firststart numPartScanned bnotes enotes, nworkers_launched
       (seg-1) 138 0 0 0.00 0.00 0.05 499500 0 1 0 0 0 0 0 0 1798033 0 0 0 0
   ->  Parallel Seq Scan on tt  (cost=0.00..213.38 rows=4783 width=8) (actual time=0.000..15.000 rows=85736 loops=1)
         Filter: (a < b)
         Rows Removed by Filter: 82281
         Node Summary:                  vmax       vsum       vcnt       imax
           ntuples:                    85736     499500          6          0
           nloops:                         1          6          6          0
           execmemused:                    0          0          0          0
           workmemused:                    0          0          0          0
           workmemwanted:                  0          0          0          0
           totalWorkfileCreated:           0          0          0          0
           totalPartTableScanned:          0          0          0          0
           segindex0: 0
           ninst: 3
           StatInsts:
             (segN) pstype starttime counter firsttuple startup total ntuples ntuples2 nloops nfiltered1 nfiltered2 execmemused workmemused workmemwanted workfile
Created firststart numPartScanned bnotes enotes, nworkers_launched
             (seg0) 95 0 0 0.00 0.00 0.01 85736 0 1 82281 0 0 0 0 0 1798033 0 0 0 0
             (seg1) 95 0 0 0.00 0.00 0.01 80303 0 1 79895 0 0 0 0 0 1798033 0 0 0 0
             (seg2) 95 0 0 0.00 0.00 0.02 83655 0 1 86362 0 0 0 0 0 1798033 0 0 0 0
 Planning Time: 0.313 ms
   (slice0)    Executor memory: 111K bytes.
   (slice1)    Executor memory: 111K bytes avg x 6x(0) workers, 112K bytes max (seg0).
 Memory used:  128000kB
 Optimizer: Postgres query optimizer
 Execution Time: 62.953 ms
(38 rows)

Fixes #ISSUE_Number

What does this PR do?

Type of Change

[ ] Bug fix (non-breaking change)
[ ] New feature (non-breaking change)
[ ] Breaking change (fix or feature with breaking changes)
[ ] Documentation update

Breaking Changes

Test Plan

[ ] Unit tests added/updated
[ ] Integration tests added/updated
[ ] Passed make installcheck
[ ] Passed make -C src/test installcheck-cbdb-parallel

Impact

Performance:

User-facing changes:

Dependencies:

Checklist

[ ] Followed contribution guide
[ ] Added/updated documentation
[ ] Reviewed code for security implications
[ ] Requested review from cloudberry committers

Additional Context

CI Skip Instructions

May 20 '25 08:05 robozmey

Hi, thanks for test parallel plan. Could you add the parallel cases into test too?

Gather Motion 6:1 (slice1; segments: 6) (cost=0.00..548.21 rows=28700 width=8) (actual time=1.000..53.000 rows=499500 loops=1) Node Summary: vmax vsum vcnt imax ntuples: 499500 499500 1 -1 nloops: 1 1 1 -1 execmemused: 0 0 0 0 workmemused: 0 0 0 0 workmemwanted: 0 0 0 0 totalWorkfileCreated: 0 0 0 0 totalPartTableScanned: 0 0 0 0 segindex0: -1 ninst: 1 StatInsts: (segN) pstype starttime counter firsttuple startup total ntuples ntuples2 nloops nfiltered1 nfiltered2 execmemused workmemused workmemwanted workfileCreate d firststart numPartScanned bnotes enotes, nworkers_launched (seg-1) 138 0 0 0.00 0.00 0.05 499500 0 1 0 0 0 0 0 0 1798033 0 0 0 0 -> Parallel Seq Scan

I guess the plan is 2-parallel workers on a 3-segment cluster, is it? Then the nworkers_launched at the last field of (seg-1) 138 0 0 0.00 0.00 0.05 499500 0 1 0 0 0 0 0 0 1798033 0 0 0 0 should be 2 but not 0?

May 23 '25 06:05 avamingli

And a quick question: In what cases is NodeSummary primarily used?

It useful to debug plans and queries, with this feature you can just see stats of every Node without need to use GDB to look inside CdbExplain_NodeSummary structure

Jun 11 '25 08:06 robozmey

And a quick question: In what cases is NodeSummary primarily used?

It useful to debug plans and queries, with this feature you can just see stats of every Node without need to use GDB to look inside CdbExplain_NodeSummary structure

Sounds good, thank you for the explanation.

Jun 11 '25 09:06 avamingli

Hi, thanks for test parallel plan. Could you add the parallel cases into test too?

Gather Motion 6:1 (slice1; segments: 6) (cost=0.00..548.21 rows=28700 width=8) (actual time=1.000..53.000 rows=499500 loops=1) Node Summary: vmax vsum vcnt imax ntuples: 499500 499500 1 -1 nloops: 1 1 1 -1 execmemused: 0 0 0 0 workmemused: 0 0 0 0 workmemwanted: 0 0 0 0 totalWorkfileCreated: 0 0 0 0 totalPartTableScanned: 0 0 0 0 segindex0: -1 ninst: 1 StatInsts: (segN) pstype starttime counter firsttuple startup total ntuples ntuples2 nloops nfiltered1 nfiltered2 execmemused workmemused workmemwanted workfileCreate d firststart numPartScanned bnotes enotes, nworkers_launched (seg-1) 138 0 0 0.00 0.00 0.05 499500 0 1 0 0 0 0 0 0 1798033 0 0 0 0 -> Parallel Seq Scan

I guess the plan is 2-parallel workers on a 3-segment cluster, is it? Then the nworkers_launched at the last field of (seg-1) 138 0 0 0.00 0.00 0.05 499500 0 1 0 0 0 0 0 0 1798033 0 0 0 0 should be 2 but not 0?

LGTM except this one.

Jun 18 '25 13:06 avamingli