[Enhancement] Read predicate columns via late materialization and order them by predicate selectivity
Why I'm doing:
- Optimize the late-materialization implementation of the internal table scan so that all selected rows in the same page are handled in a single function call.
- Support reading predicate columns via late materialization. When there are many predicate columns and the leading predicates filter out a large share of the rows, this reduces I/O and memory copies.
- Because predicate columns no longer have to be read all at once, they can be reordered so that the predicate with the lowest selectivity (the one that keeps the fewest rows) executes first. The selectivity of each predicate is estimated by sampling data.
- If the first predicate column is a string column, push the string predicate down to the page level, so large strings that fail the predicate are never copied into the column only to be filtered out. This is a form of zero-copy reading, since the current zero-copy path does not support string types.
- Optimize evaluation of `string_col != ""`: only the offset column needs to be checked.
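The first item above (handling all selected rows of a page in one call) can be illustrated with a minimal C++ sketch. `PageInfo`, `PageBatch`, and `group_rowids_by_page` are hypothetical names, not the StarRocks API. The sketch assumes pages are sorted, contiguous, and start at ordinal 0, and that the selected rowids are sorted; it only shows the grouping step that lets a reader decode each page once and pass it a whole span of rowids.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical types, not the StarRocks API. Pages are assumed sorted and
// contiguous: a page covers ordinals [first_ordinal, first_ordinal + num_rows).
struct PageInfo {
    uint32_t first_ordinal;
    uint32_t num_rows;
};

// All selected rowids in [begin, end) of the input vector fall into page_index.
struct PageBatch {
    size_t page_index;
    size_t begin;
    size_t end;
};

// Groups sorted rowids by the page that contains them, so each page can be
// decoded once and given all of its selected rows in a single call instead of
// one call per row. Rowids past the last page are ignored.
std::vector<PageBatch> group_rowids_by_page(const std::vector<PageInfo>& pages,
                                            const std::vector<uint32_t>& rowids) {
    std::vector<PageBatch> batches;
    size_t page = 0;
    size_t i = 0;
    while (i < rowids.size()) {
        // rowids are sorted, so the current page index only moves forward.
        while (page < pages.size() &&
               rowids[i] >= pages[page].first_ordinal + pages[page].num_rows) {
            ++page;
        }
        if (page == pages.size()) break;
        assert(rowids[i] >= pages[page].first_ordinal);  // contiguity assumption
        const uint32_t page_end = pages[page].first_ordinal + pages[page].num_rows;
        size_t j = i;
        while (j < rowids.size() && rowids[j] < page_end) ++j;
        batches.push_back({page, i, j});
        i = j;
    }
    return batches;
}
```

With pages of 100, 100, and 50 rows and rowids `{3, 7, 99, 150, 210, 220}`, this yields three batches, one per page, instead of six single-row reads.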
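A minimal sketch of the predicate-column late-materialization idea, assuming columns are plain in-memory vectors and that a counter can stand in for I/O: only the first predicate column is read in full, and every later column is read only at the rowids that survived the earlier predicates. `PredicateColumn` and `late_materialize_filter` are hypothetical; the real iterator reads from pages, not vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical model: each predicate column is an in-memory vector, and the
// counter `values_read` stands in for I/O and memory copies.
struct PredicateColumn {
    std::vector<int64_t> values;
    std::function<bool(int64_t)> predicate;
};

// Evaluates AND-ed predicates one column at a time. Only the first column is
// read in full; each later column is read only at the rowids that survived
// all previous predicates.
std::vector<uint32_t> late_materialize_filter(const std::vector<PredicateColumn>& cols,
                                              size_t num_rows, size_t* values_read) {
    std::vector<uint32_t> selected(num_rows);
    for (size_t i = 0; i < num_rows; ++i) selected[i] = static_cast<uint32_t>(i);
    *values_read = 0;
    for (const auto& col : cols) {
        assert(col.values.size() == num_rows);
        std::vector<uint32_t> next;
        next.reserve(selected.size());
        for (uint32_t rowid : selected) {
            ++*values_read;  // "read" one value of this column
            if (col.predicate(col.values[rowid])) next.push_back(rowid);
        }
        selected.swap(next);
        if (selected.empty()) break;  // nothing left: skip the remaining columns
    }
    return selected;
}
```

With two columns of 10 rows where the first predicate keeps 2 rows, the counter ends at 12 reads instead of the 20 an eager scan would perform.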
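The reordering step can be sketched as follows, assuming selectivity is measured as the fraction of sampled rows that pass each predicate. The sample source, the sample size, and how sampling is triggered (the Bugbot summary mentions a BE config `tigger_sample_selectivity`) are not modeled; `estimate_selectivity` and `order_by_selectivity` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical model: predicate i is evaluated against sample column i.
using Predicate = std::function<bool(int64_t)>;

// Estimated selectivity = fraction of sampled rows that pass the predicate.
double estimate_selectivity(const std::vector<int64_t>& sample, const Predicate& pred) {
    if (sample.empty()) return 1.0;  // no information: assume it filters nothing
    size_t passed = 0;
    for (int64_t v : sample) passed += pred(v) ? 1 : 0;
    return static_cast<double>(passed) / static_cast<double>(sample.size());
}

// Returns predicate indices in ascending order of estimated selectivity, so the
// predicate that keeps the fewest rows runs first. stable_sort keeps the
// original order for ties.
std::vector<size_t> order_by_selectivity(const std::vector<std::vector<int64_t>>& samples,
                                         const std::vector<Predicate>& preds) {
    assert(samples.size() == preds.size());
    std::vector<double> sel(preds.size());
    for (size_t i = 0; i < preds.size(); ++i) sel[i] = estimate_selectivity(samples[i], preds[i]);
    std::vector<size_t> order(preds.size());
    std::iota(order.begin(), order.end(), size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&](size_t a, size_t b) { return sel[a] < sel[b]; });
    return order;
}
```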
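A minimal sketch of the page-level string pushdown, assuming a decoded plain-encoded binary page laid out as a byte buffer plus an offsets array. The predicate runs on `std::string_view`s that point into the page buffer, so only matching values are ever copied into the output column. `BinaryPage`, `StringColumn`, and `filter_page_into` are hypothetical stand-ins, and dictionary-encoded pages are not covered.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical decoded binary page: values stored back to back in `bytes`,
// value i spans [offsets[i], offsets[i + 1]).
struct BinaryPage {
    std::string bytes;
    std::vector<uint32_t> offsets;  // num_values + 1 entries
    size_t num_values() const { return offsets.size() - 1; }
};

// Minimal stand-in for an output string column.
struct StringColumn {
    std::vector<std::string> values;
};

// Evaluates `pred` on views into the page buffer, so a value that fails the
// predicate is never copied. Returns the number of rows appended to `out`.
size_t filter_page_into(const BinaryPage& page,
                        const std::function<bool(std::string_view)>& pred,
                        StringColumn* out) {
    size_t appended = 0;
    for (size_t i = 0; i < page.num_values(); ++i) {
        std::string_view v(page.bytes.data() + page.offsets[i],
                           page.offsets[i + 1] - page.offsets[i]);
        if (pred(v)) {
            out->values.emplace_back(v);  // the only copy of string bytes
            ++appended;
        }
    }
    return appended;
}
```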
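Why checking offsets is enough: in an offsets-plus-bytes binary layout, value `i` is empty exactly when `offsets[i + 1] == offsets[i]`, so the string bytes never need to be touched. The sketch below assumes that layout (an offsets array with one more entry than there are values) and ignores NULL handling; `not_empty_string_filter` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Computes `col != ''` from the offsets alone: writes 1 into sel[i] when
// value i is non-empty and 0 otherwise. NULLs are not handled here.
void not_empty_string_filter(const std::vector<uint32_t>& offsets, std::vector<uint8_t>* sel) {
    assert(!offsets.empty());
    const size_t n = offsets.size() - 1;
    sel->resize(n);
    for (size_t i = 0; i < n; ++i) {
        (*sel)[i] = static_cast<uint8_t>(offsets[i + 1] != offsets[i]);
    }
}
```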
What I'm doing:
Fixes #issue
What type of PR is this:
- [ ] BugFix
- [ ] Feature
- [x] Enhancement
- [ ] Refactor
- [ ] UT
- [ ] Doc
- [ ] Tool
Does this PR entail a change in behavior?
- [ ] Yes, this PR will result in a change in behavior.
- [x] No, this PR will not result in a change in behavior.
If yes, please specify the type of change:
- [ ] Interface/UI changes: syntax, type conversion, expression evaluation, display information
- [ ] Parameter changes: default values, similar parameters but with different default values
- [ ] Policy changes: use new policy to replace old one, functionality automatically enabled
- [ ] Feature removed
- [ ] Miscellaneous: upgrade & downgrade compatibility, etc.
Checklist:
- [ ] I have added test cases for my bug fix or my new feature
- [ ] This PR needs user documentation (for new or modified features or behaviors)
- [ ] I have added documentation for my new feature or new function
- [ ] This is a backport PR
Bugfix cherry-pick branch check:
- [x] I have checked the version labels which the pr will be auto-backported to the target branch
- [x] 4.0
- [ ] 3.5
- [ ] 3.4
- [ ] 3.3
[!NOTE] Implements late materialization for predicate columns with selectivity-based ordering and page-level string predicate pushdown, backed by zero-copy binary support and end-to-end plumbing across readers/iterators/config.
- Storage/Scan Pipeline:
  - Implement late materialization for predicate columns in `SegmentIterator`, with dynamic predicate ordering via sampling/selectivity and per-column runtime filters.
  - Push down string predicates to page decoders; filter at page level and read only selected rows.
  - Add support for reading by rowids and filtered batches: `ScalarColumnIterator`, `ParsedPage`, `PageDecoder` (`next_batch_with_filter`, `read_by_rowids`), `BitShufflePageDecoder`, `BinaryPlainPageDecoder`, `BinaryDictPageDecoder`.
  - Wire the option `enable_predicate_col_late_materialize` through `SegmentReadOptions`, tablet/lake readers, and pipeline scan.
- Predicate Engine:
  - Add `compound_and_predicates_evaluate(...)` to evaluate AND predicates in a vectorized, branchless way.
  - Optimize the binary `!= ''` predicate path.
  - `PredicateTree`: expose `has_or_predicate()`.
- Binary/Column Infrastructure:
  - Introduce a zero-copy `BinaryColumn` view via `ContainerResource`, `get_immutable_bytes()`, and `get_string_begin/end()`, with on-demand materialization; adjust usages across the codebase.
  - Add an `append_with_mask` utility and use it in aggregators/decoders for masked appends.
  - Extend `Chunk` to append by `ColumnId`.
- Config/FE API:
  - Add BE config `tigger_sample_selectivity`; change `late_materialization_ratio` to mutable.
  - Add FE session variable `enable_predicate_col_late_materialize` and plumb it to thrift (`TQueryOptions`).
- Misc/Perf:
  - Reserve/estimate buffer sizes in decoders; pruning bookkeeping by `ColumnId`.
- Tests:
  - Add SQL test `test_scan_predicate_late_materialization` covering v1/v2 pages, nullability, and correctness.

Written by Cursor Bugbot for commit f1fbc7671c02c0f30e0baf37aced1b60fd05eafd.
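The `compound_and_predicates_evaluate(...)` item in the summary describes vectorized, branchless evaluation of AND-ed predicates. The sketch below illustrates that technique only; it is not the PR's function or signature. It assumes hypothetical closed-range predicates on `int32_t` columns (`RangePredicate`, `and_predicates_evaluate`) and combines each predicate's 0/1 result into a selection vector with a bitwise AND, a loop shape compilers can usually auto-vectorize.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical predicate: lo <= value <= hi.
struct RangePredicate {
    int32_t lo;
    int32_t hi;
};

// ANDs every predicate's result into `selection` (1 = row kept) without a
// per-row branch: comparisons yield 0/1 and are combined with bitwise AND.
void and_predicates_evaluate(const std::vector<std::vector<int32_t>>& columns,
                             const std::vector<RangePredicate>& preds,
                             std::vector<uint8_t>* selection) {
    assert(columns.size() == preds.size());
    const size_t n = columns.empty() ? 0 : columns[0].size();
    selection->assign(n, 1);  // start with every row selected
    uint8_t* sel = selection->data();
    for (size_t c = 0; c < columns.size(); ++c) {
        assert(columns[c].size() == n);
        const int32_t* v = columns[c].data();
        const int32_t lo = preds[c].lo;
        const int32_t hi = preds[c].hi;
        for (size_t i = 0; i < n; ++i) {
            // Both comparisons are always evaluated; `&` does not short-circuit.
            sel[i] &= static_cast<uint8_t>((v[i] >= lo) & (v[i] <= hi));
        }
    }
}
```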
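A possible shape for a masked append, in the spirit of the `append_with_mask` utility listed above; the name `append_with_mask_sketch` and its signature are hypothetical. It uses a common branchless filtering trick: always write the element, advance the output position only when the mask is set, then trim the unused tail. The sketch assumes a default-constructible, copy-assignable element type other than `bool` (`std::vector<bool>` has no `data()`).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Appends src[i] to *dst for every i where mask[i] != 0 and returns the
// number of appended elements.
template <typename T>
size_t append_with_mask_sketch(std::vector<T>* dst, const std::vector<T>& src,
                               const std::vector<uint8_t>& mask) {
    assert(src.size() == mask.size());
    const size_t base = dst->size();
    dst->resize(base + src.size());  // upper bound: every row selected
    T* out = dst->data() + base;
    size_t written = 0;
    for (size_t i = 0; i < src.size(); ++i) {
        out[written] = src[i];      // always write ...
        written += (mask[i] != 0);  // ... but only advance for selected rows
    }
    dst->resize(base + written);  // drop the unused tail
    return written;
}
```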
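The zero-copy `BinaryColumn` view with on-demand materialization can be illustrated as copy-on-write: the column borrows bytes owned elsewhere and copies them only when it is mutated. `BinaryColumnView` is a hypothetical class; it does not model `ContainerResource` or the PR's accessors, and a `std::shared_ptr` stands in for whatever keeps the borrowed page alive.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical copy-on-write binary column: borrows bytes until mutated.
class BinaryColumnView {
public:
    // Borrowing constructor: no string bytes are copied.
    BinaryColumnView(std::shared_ptr<const std::string> bytes, std::vector<uint32_t> offsets)
            : _borrowed(std::move(bytes)), _offsets(std::move(offsets)) {}

    size_t size() const { return _offsets.size() - 1; }
    bool is_borrowed() const { return _borrowed != nullptr; }

    // Returned views are invalidated by the next append().
    std::string_view get(size_t i) const {
        const std::string& b = is_borrowed() ? *_borrowed : _owned;
        return std::string_view(b.data() + _offsets[i], _offsets[i + 1] - _offsets[i]);
    }

    // Mutation forces materialization: copy the borrowed bytes once, then append.
    void append(std::string_view s) {
        if (is_borrowed()) {
            _owned = *_borrowed;
            _borrowed.reset();
        }
        _owned.append(s.data(), s.size());
        _offsets.push_back(static_cast<uint32_t>(_owned.size()));
    }

private:
    std::shared_ptr<const std::string> _borrowed;
    std::string _owned;
    std::vector<uint32_t> _offsets;  // size() + 1 entries, starting at 0
};
```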
@cursor review
🧪 CI Insights
Here's what we observed from your CI run for d8089ab0.
🟢 All jobs passed!
Quality Gate passed
- Issues: 0 new issues, 0 accepted issues
- Measures: 0 security hotspots, 0.0% coverage on new code, 0.0% duplication on new code
[Java-Extensions Incremental Coverage Report]
:white_check_mark: pass : 0 / 0 (0%)
[FE Incremental Coverage Report]
:x: fail : 1532 / 1953 (78.44%)
file detail
| | path | covered_line | new_line | coverage | not_covered_line_detail |
|---|---|---|---|---|---|
| :large_blue_circle: | com/starrocks/server/SharedDataStorageVolumeMgr.java | 0 | 60 | 00.00% | [67, 111, 113, 115, 116, 117, 118, 119, 122, 123, 124, 126, 127, 129, 130, 131, 132, 133, 135, 161, 162, 165, 166, 167, 168, 681, 682, 683, 684, 686, 687, 688, 689, 691, 695, 696, 697, 703, 704, 705, 706, 709, 710, 711, 714, 716, 717, 719, 721, 722, 724, 726, 728, 729, 732, 733, 734, 736, 737, 739] |
| :large_blue_circle: | com/starrocks/sql/optimizer/operator/OperatorVisitor.java | 0 | 2 | 00.00% | [447, 451] |
| :large_blue_circle: | com/starrocks/server/GlobalStateMgr.java | 0 | 1 | 00.00% | [1466] |
| :large_blue_circle: | com/starrocks/sql/common/DebugOperatorTracer.java | 0 | 24 | 00.00% | [575, 576, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603] |
| :large_blue_circle: | com/starrocks/server/SharedNothingStorageVolumeMgr.java | 0 | 2 | 00.00% | [177, 182] |
| :large_blue_circle: | com/starrocks/meta/SqlBlackList.java | 0 | 1 | 00.00% | [67] |
| :large_blue_circle: | com/starrocks/rpc/LakeServiceWithMetrics.java | 0 | 2 | 00.00% | [193, 194] |
| :large_blue_circle: | com/starrocks/server/StorageVolumeMgr.java | 0 | 5 | 00.00% | [267, 268, 271, 272, 288] |
| :large_blue_circle: | com/starrocks/lake/StarMgrMetaSyncer.java | 1 | 4 | 25.00% | [262, 263, 264] |
| :large_blue_circle: | com/starrocks/qe/scheduler/assignment/LocalFragmentAssignmentStrategy.java | 2 | 7 | 28.57% | [87, 88, 89, 90, 91] |
| :large_blue_circle: | com/starrocks/planner/NoopSink.java | 2 | 6 | 33.33% | [30, 35, 40, 45] |
| :large_blue_circle: | com/starrocks/sql/optimizer/MvRewritePreprocessor.java | 5 | 13 | 38.46% | [203, 204, 205, 206, 207, 209, 210, 211] |
| :large_blue_circle: | com/starrocks/sql/optimizer/operator/physical/PhysicalFetchOperator.java | 15 | 27 | 55.56% | [54, 58, 78, 88, 89, 91, 92, 94, 95, 96, 97, 98] |
| :large_blue_circle: | com/starrocks/planner/LookUpNode.java | 9 | 16 | 56.25% | [55, 56, 57, 58, 59, 60, 61] |
| :large_blue_circle: | com/starrocks/planner/FetchNode.java | 30 | 51 | 58.82% | [59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 72, 88, 89, 90, 91, 92, 93, 106, 107, 116, 121] |
| :large_blue_circle: | com/starrocks/replication/ReplicationMgr.java | 3 | 5 | 60.00% | [69, 70] |
| :large_blue_circle: | com/starrocks/system/SystemInfoService.java | 5 | 8 | 62.50% | [1540, 1541, 1542] |
| :large_blue_circle: | com/starrocks/sql/optimizer/operator/physical/PhysicalLookUpOperator.java | 18 | 29 | 62.07% | [91, 101, 102, 104, 105, 107, 108, 109, 110, 111, 112] |
| :large_blue_circle: | com/starrocks/replication/LakeReplicationJob.java | 47 | 72 | 65.28% | [52, 53, 54, 55, 56, 57, 58, 69, 71, 72, 73, 79, 80, 81, 82, 83, 97, 98, 123, 124, 125, 126, 127, 128, 129] |
| :large_blue_circle: | com/starrocks/sql/optimizer/LateMaterializationRewriter.java | 356 | 503 | 70.78% | [162, 241, 244, 247, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 371, 390, 391, 401, 424, 425, 426, 427, 428, 430, 440, 441, 474, 475, 502, 503, 505, 506, 507, 508, 510, 511, 543, 544, 545, 547, 550, 552, 553, 554, 555, 556, 558, 559, 564, 565, 566, 568, 570, 571, 572, 573, 574, 576, 578, 583, 584, 585, 586, 591, 592, 593, 594, 596, 597, 598, 600, 601, 606, 607, 608, 610, 612, 614, 615, 616, 617, 618, 620, 621, 622, 627, 628, 629, 630, 631, 632, 633, 635, 636, 687, 688, 689, 690, 691, 692, 694, 696, 699, 700, 702, 704, 706, 707, 708, 709, 711, 713, 715, 716, 717, 718, 719, 720, 721, 723, 814, 815, 816, 817, 818, 819, 821, 822, 867, 868, 882] |
| :large_blue_circle: | com/starrocks/connector/hive/HiveUtils.java | 61 | 84 | 72.62% | [42, 59, 60, 62, 86, 87, 88, 89, 90, 124, 133, 134, 137, 150, 151, 163, 166, 167, 168, 187, 188, 189, 190] |
| :large_blue_circle: | com/starrocks/replication/ReplicationJob.java | 11 | 15 | 73.33% | [682, 683, 729, 730] |
| :large_blue_circle: | com/starrocks/qe/scheduler/Deployer.java | 51 | 68 | 75.00% | [334, 335, 336, 363, 368, 373, 378, 379, 382, 384, 385, 387, 389, 444, 449, 454, 467] |
| :large_blue_circle: | com/starrocks/sql/optimizer/rule/transformation/PruneHDFSScanColumnRule.java | 3 | 4 | 75.00% | [175] |
| :large_blue_circle: | com/starrocks/storagevolume/StorageVolume.java | 31 | 38 | 81.58% | [334, 335, 340, 341, 342, 348, 351] |
| :large_blue_circle: | com/starrocks/sql/optimizer/operator/physical/PhysicalJoinOperator.java | 5 | 6 | 83.33% | [81] |
| :large_blue_circle: | com/starrocks/qe/scheduler/assignment/RemoteFragmentAssignmentStrategy.java | 21 | 24 | 87.50% | [105, 132, 143] |
| :large_blue_circle: | com/starrocks/catalog/mv/MVTimelinessArbiter.java | 24 | 27 | 88.89% | [110, 111, 120] |
| :large_blue_circle: | com/starrocks/sql/optimizer/CachingMvPlanContextBuilder.java | 15 | 17 | 88.24% | [103, 104] |
| :large_blue_circle: | com/starrocks/sql/common/PCellWithName.java | 35 | 39 | 89.74% | [93, 144, 153, 156] |
| :large_blue_circle: | com/starrocks/rpc/BackendServiceClient.java | 10 | 11 | 90.91% | [103] |
| :large_blue_circle: | com/starrocks/qe/SessionVariable.java | 19 | 20 | 95.00% | [5660] |
| :large_blue_circle: | com/starrocks/sql/optimizer/rewrite/scalar/ImplicitCastRule.java | 22 | 23 | 95.65% | [122] |
| :large_blue_circle: | com/starrocks/system/BackendResourceStat.java | 146 | 151 | 96.69% | [101, 102, 103, 104, 105] |
| :large_blue_circle: | com/starrocks/sql/analyzer/ExpressionAnalyzer.java | 42 | 43 | 97.67% | [315] |
| :large_blue_circle: | com/starrocks/sql/plan/PlanFragmentBuilder.java | 88 | 90 | 97.78% | [2549, 4455] |
| :large_blue_circle: | com/starrocks/scheduler/mv/pct/MVPCTRefreshListPartitioner.java | 1 | 1 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/lake/TabletRepairHelper.java | 58 | 58 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/sql/optimizer/operator/scalar/ScalarOperator.java | 1 | 1 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/sql/optimizer/QueryOptimizer.java | 3 | 3 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/qe/scheduler/dag/FragmentInstanceExecState.java | 4 | 4 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/connector/RemoteFileOperations.java | 3 | 3 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/catalog/IcebergTable.java | 2 | 2 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/task/ReplicateSnapshotTask.java | 30 | 30 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/qe/scheduler/slot/QueryQueueOptions.java | 8 | 8 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/sql/DeletePlanner.java | 4 | 4 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/sql/optimizer/OptExpression.java | 6 | 6 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/common/Config.java | 3 | 3 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/sql/optimizer/rule/tree/JsonPathRewriteRule.java | 5 | 5 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/connector/partitiontraits/HivePartitionTraits.java | 8 | 8 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/sql/optimizer/rule/transformation/ListPartitionPruner.java | 15 | 15 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/sql/common/RangePartitionDiffer.java | 20 | 20 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/catalog/MvRefreshArbiter.java | 16 | 16 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/qe/StmtExecutor.java | 2 | 2 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/sql/UpdatePlanner.java | 5 | 5 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/sql/optimizer/rewrite/scalar/FoldConstantsRule.java | 1 | 1 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/catalog/ResourceGroupMgr.java | 2 | 2 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/catalog/mv/MVTimelinessListPartitionArbiter.java | 2 | 2 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/qe/scheduler/dag/JobSpec.java | 2 | 2 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/qe/scheduler/slot/PipelineDriverAllocator.java | 18 | 18 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/sql/optimizer/dump/QueryDumpSerializer.java | 1 | 1 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/catalog/Column.java | 1 | 1 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/qe/DefaultCoordinator.java | 9 | 9 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/sql/optimizer/LogicalPlanPrinter.java | 15 | 15 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/planner/PlanFragment.java | 5 | 5 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/planner/HiveTableSink.java | 1 | 1 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/sql/optimizer/rule/tree/AddIndexOnlyPredicateRule.java | 1 | 1 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/sql/optimizer/operator/OperatorType.java | 2 | 2 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/scheduler/mv/MVTimelinessMgr.java | 2 | 2 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/sql/common/PartitionDiffer.java | 2 | 2 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/sql/InsertPlanner.java | 5 | 5 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/sql/optimizer/rule/tree/exprreuse/ScalarOperatorsReuse.java | 8 | 8 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/sql/optimizer/dump/QueryDumpDeserializer.java | 22 | 22 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/catalog/mv/MVTimelinessNonPartitionArbiter.java | 2 | 2 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/scheduler/mv/pct/MVPCTRefreshNonPartitioner.java | 1 | 1 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/qe/GlobalVariable.java | 2 | 2 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/sql/optimizer/QueryMaterializationContext.java | 1 | 1 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/scheduler/mv/pct/MVPCTRefreshPartitioner.java | 3 | 3 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/sql/optimizer/rule/transformation/materialization/rule/TextMatchBasedRewriteRule.java | 3 | 3 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/sql/analyzer/ResourceGroupAnalyzer.java | 1 | 1 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/catalog/HiveTable.java | 8 | 8 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/planner/expression/ExprToThrift.java | 2 | 2 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/sql/optimizer/OptimizerContext.java | 1 | 1 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/sql/optimizer/cost/HashJoinCostModel.java | 8 | 8 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/sql/optimizer/operator/ScanOperatorPredicates.java | 7 | 7 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/qe/scheduler/dag/ExecutionDAG.java | 1 | 1 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/lake/vacuum/FullVacuumDaemon.java | 2 | 2 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/sql/optimizer/rewrite/OptExternalPartitionPruner.java | 1 | 1 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/rpc/PExecBatchPlanFragmentsRequest.java | 2 | 2 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/system/ComputeNode.java | 4 | 4 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/statistic/StatisticUtils.java | 2 | 2 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/sql/LoadPlanner.java | 7 | 7 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/connector/hive/PartitionUpdate.java | 1 | 1 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/sql/optimizer/dump/QueryDumpInfo.java | 22 | 22 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/qe/scheduler/dag/SingleNodeSchedule.java | 10 | 10 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/persist/gson/GsonUtils.java | 5 | 5 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/sql/optimizer/operator/physical/PhysicalTopNOperator.java | 1 | 1 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/sql/optimizer/rule/transformation/MaterializedViewTransparentRewriteRule.java | 3 | 3 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/catalog/mv/MVTimelinessRangePartitionArbiter.java | 2 | 2 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/qe/scheduler/assignment/FragmentAssignmentStrategyFactory.java | 1 | 1 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/sql/analyzer/PolymorphicFunctionAnalyzer.java | 2 | 2 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/sql/optimizer/OptExpressionVisitor.java | 2 | 2 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/scheduler/TaskRun.java | 3 | 3 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/scheduler/mv/pct/MVPCTRefreshRangePartitioner.java | 1 | 1 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/sql/common/ListPartitionDiffer.java | 4 | 4 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/sql/optimizer/cost/feature/FeatureExtractor.java | 7 | 7 | 100.00% | [] |
| :large_blue_circle: | com/starrocks/lake/StarOSAgent.java | 40 | 40 | 100.00% | [] |