orca: implement intra-segment parallel table scan support

Open yjhjstz opened this issue 2 months ago • 4 comments

Add parallel table scan support to the GPORCA optimizer, enabling worker-level parallelism within each segment to speed up queries that scan large tables.

Key components:

  • New CPhysicalParallelTableScan operator and CDistributionSpecWorkerRandom distribution specification for worker-level data distribution
  • CXformGet2ParallelTableScan transformation with parallel safety checks (excludes CTEs, dynamic scans, foreign tables, replicated tables, etc.)
  • Cost model integration: parallel_setup_cost plus an efficiency-degradation factor that scales logarithmically with worker count
  • DXL serialization/deserialization for CDXLPhysicalParallelTableScan
  • Plan translation to PostgreSQL SeqScan nodes with parallel_aware=true
  • Rewindability constraints (parallel scans are non-rewindable)
  • GUC integration: max_parallel_workers_per_gather controls worker count
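
The cost-model bullet above can be illustrated with a small sketch. The PR description does not give the actual formula, so everything below is an assumption made for illustration: the function name, the `degradation` coefficient, and the exact shape of the logarithmic penalty are hypothetical. Only the idea (a fixed setup charge plus per-worker efficiency that falls off logarithmically with worker count) comes from the description, and 1000 is the stock PostgreSQL default for parallel_setup_cost.

```python
import math


def parallel_scan_cost(serial_scan_cost, workers,
                       parallel_setup_cost=1000.0, degradation=0.1):
    """Hypothetical cost of a parallel scan; NOT the formula used in GPORCA.

    serial_scan_cost:    cost of scanning the table with a single process
    workers:             number of parallel workers (>= 1)
    parallel_setup_cost: fixed charge for launching workers
    degradation:         assumed coefficient of the logarithmic penalty
    """
    if workers < 1:
        raise ValueError("workers must be >= 1")
    # Each extra doubling of workers makes every worker a bit less efficient.
    efficiency = 1.0 / (1.0 + degradation * math.log2(workers))
    effective_workers = workers * efficiency
    return parallel_setup_cost + serial_scan_cost / effective_workers


if __name__ == "__main__":
    for table_cost in (500.0, 10000.0):
        for w in (1, 2, 4, 8):
            print(table_cost, w, round(parallel_scan_cost(table_cost, w), 2))
```

Under these assumptions a small scan stays cheaper when run serially, because the setup charge dominates. A large scan benefits from more workers, but with diminishing returns.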

Implements https://github.com/apache/cloudberry/discussions/1316

Benchmark: TPC-H, 10 GB

Query  Parallel (s)  Non-parallel (s)  Time saved (s)  Improvement
01     16            30                14.00           46.67%
02     3             4                 1.00            25.00%
03     11            20                9.00            45.00%
04     7             15                8.00            53.33%
05     8             8                 0.00            0.00%
06     3             5                 2.00            40.00%
07     5             8                 3.00            37.50%
08     6             8                 2.00            25.00%
09     10            15                5.00            33.33%
10     6             7                 1.00            14.29%
11     1             2                 1.00            50.00%
12     5             7                 2.00            28.57%
13     4             6                 2.00            33.33%
14     3             5                 2.00            40.00%
15     5             5                 0.00            0.00%
16     2             1                 -1.00           --------
17     34            62                28.00           45.16%
18     22            28                6.00            21.43%
19     3             5                 2.00            40.00%
20     6             11                5.00            45.45%
21     22            25                3.00            12.00%
22     4             6                 2.00            33.33%

Conclusion: With parallel execution, total TPC-H execution time fell from 284 seconds to 186 seconds, a saving of 98 seconds (34.51%).
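
For readers checking the figures, the two derived columns appear to be the non-parallel duration minus the parallel duration, and that difference as a percentage of the non-parallel duration. The sketch below reproduces this arithmetic for two rows and for the totals quoted in the conclusion; the function name is chosen here for illustration only.

```python
def improvement(non_parallel_s, parallel_s):
    """Return (seconds saved, improvement in percent of the non-parallel time)."""
    saved = non_parallel_s - parallel_s
    rate = 100.0 * saved / non_parallel_s
    return saved, rate


if __name__ == "__main__":
    # (label, parallel seconds, non-parallel seconds) taken from the figures above
    samples = [("Q01", 16, 30), ("Q17", 34, 62), ("total", 186, 284)]
    for label, parallel_s, non_parallel_s in samples:
        saved, rate = improvement(non_parallel_s, parallel_s)
        print(f"{label}: saved {saved} s ({rate:.2f}%)")
```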

yjhjstz, Oct 16 '25 16:10

Please add more test cases

my-ship-it, Oct 21 '25 08:10

> Please add more test cases

See src/test/regress:installcheck-orca-parallel.

yjhjstz, Oct 21 '25 14:10

Add some cases to test the plan?

avamingli, Oct 24 '25 02:10

> Add some cases to test the plan?

Maybe after parallel hash join is implemented.

yjhjstz, Oct 24 '25 03:10