[Enhancement] Read predicate columns by late materialization and order predicate columns by predicate selectivity

Open before-Sunrise opened this issue 2 months ago • 28 comments

Why I'm doing:

  1. Optimize the late-materialization implementation of our internal table scan, so that selected rows in the same page can be handled in one function call.
  2. Support reading predicate columns by late materialization. When there are many predicate columns and the leading predicates filter out a lot of data, this reduces I/O and memory copies.
  3. Because the predicate columns no longer need to be read all at once, we can reorder them so that predicates with lower selectivity (those that pass fewer rows) execute first. Selectivity is estimated by sampling data and checking the selectivity of every predicate.
  4. If the first predicate column is a string column, push the string predicate down to the page level, so that a large string is not read into a column only to be filtered out when it fails the predicate. This is one implementation of zero-copy, since our current zero-copy doesn't support the string type.
  5. Speed up predicate evaluation for string_col != "": only the offset column needs to be checked.
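
Items 2 and 3 can be illustrated with a minimal Python sketch. It is not the StarRocks implementation (the real code lives in the C++ SegmentIterator); the function names, the in-memory column layout, and the sample size are all assumptions made for illustration. It samples each predicate's pass rate, runs the most filtering predicate first, and reads later predicate columns only for rows that are still selected.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical representation: (column name, per-value test).
Predicate = Tuple[str, Callable[[int], bool]]


def order_by_sampled_pass_rate(
    columns: Dict[str, Sequence[int]],
    predicates: List[Predicate],
    sample_size: int = 64,  # assumed sample size, not a StarRocks default
) -> List[Predicate]:
    """Sort predicates so the one passing the fewest sampled rows runs first."""

    def pass_rate(pred: Predicate) -> float:
        name, test = pred
        sample = columns[name][:sample_size]
        if not sample:
            return 1.0
        return sum(1 for v in sample if test(v)) / len(sample)

    return sorted(predicates, key=pass_rate)


def scan_with_late_materialization(
    columns: Dict[str, Sequence[int]],
    predicates: List[Predicate],
    num_rows: int,
) -> Tuple[List[int], int]:
    """Evaluate predicates column by column, reading only surviving rows.

    Returns the selected row ids and the number of column values read.
    """
    selected = list(range(num_rows))
    values_read = 0
    for name, test in order_by_sampled_pass_rate(columns, predicates):
        col = columns[name]
        values_read += len(selected)  # only still-selected rows are fetched
        selected = [r for r in selected if test(col[r])]
        if not selected:
            break  # nothing left to filter, skip remaining columns
    return selected, values_read


columns = {"a": list(range(100)), "b": [i % 2 for i in range(100)]}
predicates = [("b", lambda v: v == 0), ("a", lambda v: v < 10)]
ordered = [name for name, _ in order_by_sampled_pass_rate(columns, predicates)]
rows, values_read = scan_with_late_materialization(columns, predicates, 100)
```

In this toy data set the predicate on `a` passes fewer sampled rows, so it runs first, and column `b` is then read for only the 10 surviving rows instead of all 100.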

What I'm doing:

Fixes #issue

What type of PR is this:

  • [ ] BugFix
  • [ ] Feature
  • [x] Enhancement
  • [ ] Refactor
  • [ ] UT
  • [ ] Doc
  • [ ] Tool

Does this PR entail a change in behavior?

  • [ ] Yes, this PR will result in a change in behavior.
  • [x] No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • [ ] Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • [ ] Parameter changes: default values, similar parameters but with different default values
  • [ ] Policy changes: use new policy to replace old one, functionality automatically enabled
  • [ ] Feature removed
  • [ ] Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • [ ] I have added test cases for my bug fix or my new feature
  • [ ] This PR needs user documentation (for new or modified features or behaviors)
  • [ ] I have added documentation for my new feature or new function
  • [ ] This is a backport PR

Bugfix cherry-pick branch check:

  • [x] I have checked the version labels that determine which target branches the PR will be auto-backported to
    • [x] 4.0
    • [ ] 3.5
    • [ ] 3.4
    • [ ] 3.3

[!NOTE] Implements late materialization for predicate columns with selectivity-based ordering and page-level string predicate pushdown, backed by zero-copy binary support and end-to-end plumbing across readers/iterators/config.

  • Storage/Scan Pipeline:
    • Implement late materialization for predicate columns in SegmentIterator with dynamic predicate ordering via sampling/selectivity and per-column runtime filters.
    • Push down string predicates to page decoders; filter at page level and read only selected rows.
    • Add support to read by rowids and filtered batches: ScalarColumnIterator, ParsedPage, PageDecoder (next_batch_with_filter, read_by_rowids), BitShufflePageDecoder, BinaryPlainPageDecoder, BinaryDictPageDecoder.
    • Wire option enable_predicate_col_late_materialize through SegmentReadOptions, tablet/lake readers, pipeline scan.
  • Predicate Engine:
    • Add compound_and_predicates_evaluate(...) to evaluate AND-predicates vectorized/branchless.
    • Optimize binary != '' predicate path.
    • PredicateTree: expose has_or_predicate().
  • Binary/Column Infrastructure:
    • Introduce zero-copy BinaryColumn view via ContainerResource, get_immutable_bytes(), get_string_begin/end(), and on-demand materialization; adjust usages across codebase.
    • Add append_with_mask utility and use in aggregators/decoders for masked appends.
    • Extend Chunk to append by ColumnId.
  • Config/FE API:
    • Add BE config tigger_sample_selectivity; change late_materialization_ratio to mutable.
    • Add FE session var enable_predicate_col_late_materialize and plumb to thrift (TQueryOptions).
  • Misc/Perf:
    • Reserve/estimate buffer sizes in decoders; pruning bookkeeping by ColumnId.
  • Tests:
    • Add SQL test test_scan_predicate_late_materialization covering v1/v2 pages, nullability, and correctness.
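
The binary `!= ''` optimization, the AND-combination of predicate results, and the masked append listed above can be sketched as follows. This is only an illustrative Python model under stated assumptions: a binary column is modeled as one byte buffer plus an offsets array, and `build_binary_column`, `non_empty_mask`, `and_masks`, and `append_selected` are hypothetical helpers, not the signatures of the C++ functions named in the summary.

```python
from typing import List, Tuple


def build_binary_column(values: List[bytes]) -> Tuple[bytes, List[int]]:
    """Pack strings into one buffer; row i spans offsets[i]..offsets[i + 1]."""
    offsets = [0]
    for v in values:
        offsets.append(offsets[-1] + len(v))
    return b"".join(values), offsets


def non_empty_mask(offsets: List[int]) -> List[int]:
    """Evaluate `col != ''` from offsets alone; string bytes are never touched."""
    return [int(offsets[i + 1] != offsets[i]) for i in range(len(offsets) - 1)]


def and_masks(*masks: List[int]) -> List[int]:
    """Combine 0/1 masks with bitwise AND, with no per-row branching on values."""
    out = list(masks[0])
    for m in masks[1:]:
        out = [a & b for a, b in zip(out, m)]
    return out


def append_selected(
    dst: List[bytes], data: bytes, offsets: List[int], mask: List[int]
) -> None:
    """Append only the rows whose mask bit is set (a masked append)."""
    for i, keep in enumerate(mask):
        if keep:
            dst.append(data[offsets[i]:offsets[i + 1]])


values = [b"ab", b"", b"cde", b""]
data, offsets = build_binary_column(values)
non_empty = non_empty_mask(offsets)
other = [1, 1, 0, 1]  # result of some other predicate, assumed for the example
combined = and_masks(non_empty, other)
out: List[bytes] = []
append_selected(out, data, offsets, combined)
```

The point of the offsets-only check is that a row is empty exactly when its two neighbouring offsets are equal, so the comparison cost does not depend on string length.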

Written by Cursor Bugbot for commit f1fbc7671c02c0f30e0baf37aced1b60fd05eafd. This will update automatically on new commits.

before-Sunrise avatar Oct 27 '25 06:10 before-Sunrise

@cursor review

alvin-celerdata avatar Oct 27 '25 16:10 alvin-celerdata

🧪 CI Insights

Here's what we observed from your CI run for d8089ab0.

🟢 All jobs passed!

But CI Insights is watching 👀

mergify[bot] avatar Oct 29 '25 09:10 mergify[bot]

@cursor review

alvin-celerdata avatar Nov 06 '25 21:11 alvin-celerdata

@cursor review

alvin-celerdata avatar Nov 07 '25 17:11 alvin-celerdata

@cursor review

alvin-celerdata avatar Nov 10 '25 18:11 alvin-celerdata

@cursor review

alvin-celerdata avatar Nov 11 '25 17:11 alvin-celerdata

@cursor review

alvin-celerdata avatar Nov 17 '25 17:11 alvin-celerdata

@cursor review

alvin-celerdata avatar Nov 18 '25 04:11 alvin-celerdata

@cursor review

alvin-celerdata avatar Nov 18 '25 17:11 alvin-celerdata

@cursor review

alvin-celerdata avatar Nov 19 '25 17:11 alvin-celerdata

@cursor review

alvin-celerdata avatar Nov 25 '25 17:11 alvin-celerdata

@cursor review

alvin-celerdata avatar Nov 26 '25 17:11 alvin-celerdata

@cursor review

alvin-celerdata avatar Nov 27 '25 15:11 alvin-celerdata

@cursor review

alvin-celerdata avatar Nov 28 '25 17:11 alvin-celerdata

@cursor review

alvin-celerdata avatar Dec 01 '25 19:12 alvin-celerdata

@cursor review

alvin-celerdata avatar Dec 02 '25 17:12 alvin-celerdata

@cursor review

alvin-celerdata avatar Dec 04 '25 17:12 alvin-celerdata

@cursor review

alvin-celerdata avatar Dec 05 '25 17:12 alvin-celerdata

@cursor review

alvin-celerdata avatar Dec 09 '25 17:12 alvin-celerdata

@cursor review

alvin-celerdata avatar Dec 10 '25 04:12 alvin-celerdata

@cursor review

alvin-celerdata avatar Dec 10 '25 06:12 alvin-celerdata

@cursor review

alvin-celerdata avatar Dec 10 '25 17:12 alvin-celerdata

@cursor review

alvin-celerdata avatar Dec 11 '25 15:12 alvin-celerdata

@cursor review

alvin-celerdata avatar Dec 11 '25 17:12 alvin-celerdata

@cursor review

alvin-celerdata avatar Dec 12 '25 17:12 alvin-celerdata

@cursor review

alvin-celerdata avatar Dec 15 '25 17:12 alvin-celerdata

@cursor review

alvin-celerdata avatar Dec 16 '25 05:12 alvin-celerdata

[Java-Extensions Incremental Coverage Report]

:white_check_mark: pass : 0 / 0 (0%)

github-actions[bot] avatar Dec 16 '25 13:12 github-actions[bot]

[FE Incremental Coverage Report]

:x: fail : 1532 / 1953 (78.44%)

file detail

path covered_line new_line coverage not_covered_line_detail
:large_blue_circle: com/starrocks/server/SharedDataStorageVolumeMgr.java 0 60 00.00% [67, 111, 113, 115, 116, 117, 118, 119, 122, 123, 124, 126, 127, 129, 130, 131, 132, 133, 135, 161, 162, 165, 166, 167, 168, 681, 682, 683, 684, 686, 687, 688, 689, 691, 695, 696, 697, 703, 704, 705, 706, 709, 710, 711, 714, 716, 717, 719, 721, 722, 724, 726, 728, 729, 732, 733, 734, 736, 737, 739]
:large_blue_circle: com/starrocks/sql/optimizer/operator/OperatorVisitor.java 0 2 00.00% [447, 451]
:large_blue_circle: com/starrocks/server/GlobalStateMgr.java 0 1 00.00% [1466]
:large_blue_circle: com/starrocks/sql/common/DebugOperatorTracer.java 0 24 00.00% [575, 576, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603]
:large_blue_circle: com/starrocks/server/SharedNothingStorageVolumeMgr.java 0 2 00.00% [177, 182]
:large_blue_circle: com/starrocks/meta/SqlBlackList.java 0 1 00.00% [67]
:large_blue_circle: com/starrocks/rpc/LakeServiceWithMetrics.java 0 2 00.00% [193, 194]
:large_blue_circle: com/starrocks/server/StorageVolumeMgr.java 0 5 00.00% [267, 268, 271, 272, 288]
:large_blue_circle: com/starrocks/lake/StarMgrMetaSyncer.java 1 4 25.00% [262, 263, 264]
:large_blue_circle: com/starrocks/qe/scheduler/assignment/LocalFragmentAssignmentStrategy.java 2 7 28.57% [87, 88, 89, 90, 91]
:large_blue_circle: com/starrocks/planner/NoopSink.java 2 6 33.33% [30, 35, 40, 45]
:large_blue_circle: com/starrocks/sql/optimizer/MvRewritePreprocessor.java 5 13 38.46% [203, 204, 205, 206, 207, 209, 210, 211]
:large_blue_circle: com/starrocks/sql/optimizer/operator/physical/PhysicalFetchOperator.java 15 27 55.56% [54, 58, 78, 88, 89, 91, 92, 94, 95, 96, 97, 98]
:large_blue_circle: com/starrocks/planner/LookUpNode.java 9 16 56.25% [55, 56, 57, 58, 59, 60, 61]
:large_blue_circle: com/starrocks/planner/FetchNode.java 30 51 58.82% [59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 72, 88, 89, 90, 91, 92, 93, 106, 107, 116, 121]
:large_blue_circle: com/starrocks/replication/ReplicationMgr.java 3 5 60.00% [69, 70]
:large_blue_circle: com/starrocks/system/SystemInfoService.java 5 8 62.50% [1540, 1541, 1542]
:large_blue_circle: com/starrocks/sql/optimizer/operator/physical/PhysicalLookUpOperator.java 18 29 62.07% [91, 101, 102, 104, 105, 107, 108, 109, 110, 111, 112]
:large_blue_circle: com/starrocks/replication/LakeReplicationJob.java 47 72 65.28% [52, 53, 54, 55, 56, 57, 58, 69, 71, 72, 73, 79, 80, 81, 82, 83, 97, 98, 123, 124, 125, 126, 127, 128, 129]
:large_blue_circle: com/starrocks/sql/optimizer/LateMaterializationRewriter.java 356 503 70.78% [162, 241, 244, 247, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 371, 390, 391, 401, 424, 425, 426, 427, 428, 430, 440, 441, 474, 475, 502, 503, 505, 506, 507, 508, 510, 511, 543, 544, 545, 547, 550, 552, 553, 554, 555, 556, 558, 559, 564, 565, 566, 568, 570, 571, 572, 573, 574, 576, 578, 583, 584, 585, 586, 591, 592, 593, 594, 596, 597, 598, 600, 601, 606, 607, 608, 610, 612, 614, 615, 616, 617, 618, 620, 621, 622, 627, 628, 629, 630, 631, 632, 633, 635, 636, 687, 688, 689, 690, 691, 692, 694, 696, 699, 700, 702, 704, 706, 707, 708, 709, 711, 713, 715, 716, 717, 718, 719, 720, 721, 723, 814, 815, 816, 817, 818, 819, 821, 822, 867, 868, 882]
:large_blue_circle: com/starrocks/connector/hive/HiveUtils.java 61 84 72.62% [42, 59, 60, 62, 86, 87, 88, 89, 90, 124, 133, 134, 137, 150, 151, 163, 166, 167, 168, 187, 188, 189, 190]
:large_blue_circle: com/starrocks/replication/ReplicationJob.java 11 15 73.33% [682, 683, 729, 730]
:large_blue_circle: com/starrocks/qe/scheduler/Deployer.java 51 68 75.00% [334, 335, 336, 363, 368, 373, 378, 379, 382, 384, 385, 387, 389, 444, 449, 454, 467]
:large_blue_circle: com/starrocks/sql/optimizer/rule/transformation/PruneHDFSScanColumnRule.java 3 4 75.00% [175]
:large_blue_circle: com/starrocks/storagevolume/StorageVolume.java 31 38 81.58% [334, 335, 340, 341, 342, 348, 351]
:large_blue_circle: com/starrocks/sql/optimizer/operator/physical/PhysicalJoinOperator.java 5 6 83.33% [81]
:large_blue_circle: com/starrocks/qe/scheduler/assignment/RemoteFragmentAssignmentStrategy.java 21 24 87.50% [105, 132, 143]
:large_blue_circle: com/starrocks/catalog/mv/MVTimelinessArbiter.java 24 27 88.89% [110, 111, 120]
:large_blue_circle: com/starrocks/sql/optimizer/CachingMvPlanContextBuilder.java 15 17 88.24% [103, 104]
:large_blue_circle: com/starrocks/sql/common/PCellWithName.java 35 39 89.74% [93, 144, 153, 156]
:large_blue_circle: com/starrocks/rpc/BackendServiceClient.java 10 11 90.91% [103]
:large_blue_circle: com/starrocks/qe/SessionVariable.java 19 20 95.00% [5660]
:large_blue_circle: com/starrocks/sql/optimizer/rewrite/scalar/ImplicitCastRule.java 22 23 95.65% [122]
:large_blue_circle: com/starrocks/system/BackendResourceStat.java 146 151 96.69% [101, 102, 103, 104, 105]
:large_blue_circle: com/starrocks/sql/analyzer/ExpressionAnalyzer.java 42 43 97.67% [315]
:large_blue_circle: com/starrocks/sql/plan/PlanFragmentBuilder.java 88 90 97.78% [2549, 4455]
:large_blue_circle: com/starrocks/scheduler/mv/pct/MVPCTRefreshListPartitioner.java 1 1 100.00% []
:large_blue_circle: com/starrocks/lake/TabletRepairHelper.java 58 58 100.00% []
:large_blue_circle: com/starrocks/sql/optimizer/operator/scalar/ScalarOperator.java 1 1 100.00% []
:large_blue_circle: com/starrocks/sql/optimizer/QueryOptimizer.java 3 3 100.00% []
:large_blue_circle: com/starrocks/qe/scheduler/dag/FragmentInstanceExecState.java 4 4 100.00% []
:large_blue_circle: com/starrocks/connector/RemoteFileOperations.java 3 3 100.00% []
:large_blue_circle: com/starrocks/catalog/IcebergTable.java 2 2 100.00% []
:large_blue_circle: com/starrocks/task/ReplicateSnapshotTask.java 30 30 100.00% []
:large_blue_circle: com/starrocks/qe/scheduler/slot/QueryQueueOptions.java 8 8 100.00% []
:large_blue_circle: com/starrocks/sql/DeletePlanner.java 4 4 100.00% []
:large_blue_circle: com/starrocks/sql/optimizer/OptExpression.java 6 6 100.00% []
:large_blue_circle: com/starrocks/common/Config.java 3 3 100.00% []
:large_blue_circle: com/starrocks/sql/optimizer/rule/tree/JsonPathRewriteRule.java 5 5 100.00% []
:large_blue_circle: com/starrocks/connector/partitiontraits/HivePartitionTraits.java 8 8 100.00% []
:large_blue_circle: com/starrocks/sql/optimizer/rule/transformation/ListPartitionPruner.java 15 15 100.00% []
:large_blue_circle: com/starrocks/sql/common/RangePartitionDiffer.java 20 20 100.00% []
:large_blue_circle: com/starrocks/catalog/MvRefreshArbiter.java 16 16 100.00% []
:large_blue_circle: com/starrocks/qe/StmtExecutor.java 2 2 100.00% []
:large_blue_circle: com/starrocks/sql/UpdatePlanner.java 5 5 100.00% []
:large_blue_circle: com/starrocks/sql/optimizer/rewrite/scalar/FoldConstantsRule.java 1 1 100.00% []
:large_blue_circle: com/starrocks/catalog/ResourceGroupMgr.java 2 2 100.00% []
:large_blue_circle: com/starrocks/catalog/mv/MVTimelinessListPartitionArbiter.java 2 2 100.00% []
:large_blue_circle: com/starrocks/qe/scheduler/dag/JobSpec.java 2 2 100.00% []
:large_blue_circle: com/starrocks/qe/scheduler/slot/PipelineDriverAllocator.java 18 18 100.00% []
:large_blue_circle: com/starrocks/sql/optimizer/dump/QueryDumpSerializer.java 1 1 100.00% []
:large_blue_circle: com/starrocks/catalog/Column.java 1 1 100.00% []
:large_blue_circle: com/starrocks/qe/DefaultCoordinator.java 9 9 100.00% []
:large_blue_circle: com/starrocks/sql/optimizer/LogicalPlanPrinter.java 15 15 100.00% []
:large_blue_circle: com/starrocks/planner/PlanFragment.java 5 5 100.00% []
:large_blue_circle: com/starrocks/planner/HiveTableSink.java 1 1 100.00% []
:large_blue_circle: com/starrocks/sql/optimizer/rule/tree/AddIndexOnlyPredicateRule.java 1 1 100.00% []
:large_blue_circle: com/starrocks/sql/optimizer/operator/OperatorType.java 2 2 100.00% []
:large_blue_circle: com/starrocks/scheduler/mv/MVTimelinessMgr.java 2 2 100.00% []
:large_blue_circle: com/starrocks/sql/common/PartitionDiffer.java 2 2 100.00% []
:large_blue_circle: com/starrocks/sql/InsertPlanner.java 5 5 100.00% []
:large_blue_circle: com/starrocks/sql/optimizer/rule/tree/exprreuse/ScalarOperatorsReuse.java 8 8 100.00% []
:large_blue_circle: com/starrocks/sql/optimizer/dump/QueryDumpDeserializer.java 22 22 100.00% []
:large_blue_circle: com/starrocks/catalog/mv/MVTimelinessNonPartitionArbiter.java 2 2 100.00% []
:large_blue_circle: com/starrocks/scheduler/mv/pct/MVPCTRefreshNonPartitioner.java 1 1 100.00% []
:large_blue_circle: com/starrocks/qe/GlobalVariable.java 2 2 100.00% []
:large_blue_circle: com/starrocks/sql/optimizer/QueryMaterializationContext.java 1 1 100.00% []
:large_blue_circle: com/starrocks/scheduler/mv/pct/MVPCTRefreshPartitioner.java 3 3 100.00% []
:large_blue_circle: com/starrocks/sql/optimizer/rule/transformation/materialization/rule/TextMatchBasedRewriteRule.java 3 3 100.00% []
:large_blue_circle: com/starrocks/sql/analyzer/ResourceGroupAnalyzer.java 1 1 100.00% []
:large_blue_circle: com/starrocks/catalog/HiveTable.java 8 8 100.00% []
:large_blue_circle: com/starrocks/planner/expression/ExprToThrift.java 2 2 100.00% []
:large_blue_circle: com/starrocks/sql/optimizer/OptimizerContext.java 1 1 100.00% []
:large_blue_circle: com/starrocks/sql/optimizer/cost/HashJoinCostModel.java 8 8 100.00% []
:large_blue_circle: com/starrocks/sql/optimizer/operator/ScanOperatorPredicates.java 7 7 100.00% []
:large_blue_circle: com/starrocks/qe/scheduler/dag/ExecutionDAG.java 1 1 100.00% []
:large_blue_circle: com/starrocks/lake/vacuum/FullVacuumDaemon.java 2 2 100.00% []
:large_blue_circle: com/starrocks/sql/optimizer/rewrite/OptExternalPartitionPruner.java 1 1 100.00% []
:large_blue_circle: com/starrocks/rpc/PExecBatchPlanFragmentsRequest.java 2 2 100.00% []
:large_blue_circle: com/starrocks/system/ComputeNode.java 4 4 100.00% []
:large_blue_circle: com/starrocks/statistic/StatisticUtils.java 2 2 100.00% []
:large_blue_circle: com/starrocks/sql/LoadPlanner.java 7 7 100.00% []
:large_blue_circle: com/starrocks/connector/hive/PartitionUpdate.java 1 1 100.00% []
:large_blue_circle: com/starrocks/sql/optimizer/dump/QueryDumpInfo.java 22 22 100.00% []
:large_blue_circle: com/starrocks/qe/scheduler/dag/SingleNodeSchedule.java 10 10 100.00% []
:large_blue_circle: com/starrocks/persist/gson/GsonUtils.java 5 5 100.00% []
:large_blue_circle: com/starrocks/sql/optimizer/operator/physical/PhysicalTopNOperator.java 1 1 100.00% []
:large_blue_circle: com/starrocks/sql/optimizer/rule/transformation/MaterializedViewTransparentRewriteRule.java 3 3 100.00% []
:large_blue_circle: com/starrocks/catalog/mv/MVTimelinessRangePartitionArbiter.java 2 2 100.00% []
:large_blue_circle: com/starrocks/qe/scheduler/assignment/FragmentAssignmentStrategyFactory.java 1 1 100.00% []
:large_blue_circle: com/starrocks/sql/analyzer/PolymorphicFunctionAnalyzer.java 2 2 100.00% []
:large_blue_circle: com/starrocks/sql/optimizer/OptExpressionVisitor.java 2 2 100.00% []
:large_blue_circle: com/starrocks/scheduler/TaskRun.java 3 3 100.00% []
:large_blue_circle: com/starrocks/scheduler/mv/pct/MVPCTRefreshRangePartitioner.java 1 1 100.00% []
:large_blue_circle: com/starrocks/sql/common/ListPartitionDiffer.java 4 4 100.00% []
:large_blue_circle: com/starrocks/sql/optimizer/cost/feature/FeatureExtractor.java 7 7 100.00% []
:large_blue_circle: com/starrocks/lake/StarOSAgent.java 40 40 100.00% []

github-actions[bot] avatar Dec 16 '25 13:12 github-actions[bot]