starrocks icon indicating copy to clipboard operation
starrocks copied to clipboard

[Enhancement] Add label-formatted statistics support to QueryDumpDeserializer

Open Copilot opened this issue 2 weeks ago โ€ข 8 comments

Why I'm doing:

PR #65899 introduced label-formatted statistics output ([MIN: 1.0, MAX: 100.0, ...]) controlled by enable_labeled_column_statistic_output, but QueryDumpDeserializer only parsed the old format ([1.0, 100.0, ...]), breaking deserialization of dumps generated with the new format.

What I'm doing:

Enhanced ColumnStatistic.buildFrom(String) to auto-detect and parse both formats:

Old format:

"[1.0, 100.0, 0.0, 8.0, 50.0] ESTIMATE"
// [minValue, maxValue, nullsFraction, averageRowSize, distinctValuesCount]

New labeled format:

"[MIN: 1.0, MAX: 100.0, NULLS: 0.0, ROS: 8.0, NDV: 50.0] ESTIMATE"

Key changes:

  • Auto-detects format by checking for MIN: label
  • Added parseLabeledValue() helper for extracting labeled values
  • Preserves all existing validation (range checks, NDV normalization)
  • Refactored tests to use parameterized tests with JUnit 5's @ParameterizedTest and @MethodSource, reducing test code from 307 to 247 lines (60 lines saved, ~20% reduction) while maintaining comprehensive coverage
  • Added comprehensive unit tests covering both formats, edge cases, and end-to-end deserialization
  • Added integration test in QueryDumperTest to verify labeled statistics work when session variable is enabled

What type of PR is this:

  • [ ] BugFix
  • [ ] Feature
  • [x] Enhancement
  • [ ] Refactor
  • [ ] UT
  • [ ] Doc
  • [ ] Tool

Does this PR entail a change in behavior?

  • [ ] Yes, this PR will result in a change in behavior.
  • [x] No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • [ ] Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • [ ] Parameter changes: default values, similar parameters but with different default values
  • [ ] Policy changes: use new policy to replace old one, functionality automatically enabled
  • [ ] Feature removed
  • [ ] Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • [x] I have added test cases for my bug fix or my new feature
  • [ ] This pr needs user documentation (for new or modified features or behaviors)
    • [ ] I have added documentation for my new feature or new function
  • [ ] This is a backport pr

Bugfix cherry-pick branch check:

  • [x] I have checked the version labels which the pr will be auto-backported to the target branch
    • [ ] 4.0
    • [ ] 3.5
    • [ ] 3.4
    • [ ] 3.3
Original prompt

This section details on the original issue you should resolve

<issue_title>Enahnce QueryDumpDeserializer to support label-formatted statistics</issue_title> <issue_description>## Enhancement

#https://github.com/StarRocks/starrocks/pull/65899 introduce a new statistics format, but current QueryDumpDeserializer doesn't support it yet, which will break the compatibility.

So the QueryDumpDeserializer also needs to be upgraded to support it.</issue_description>

<agent_instructions>1. implement the code 2. add ut</agent_instructions>

Comments on the Issue (you are @copilot in this section)

  • Fixes StarRocks/starrocks#66655

๐Ÿ’ก You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.


[!NOTE] Add auto-detection/parsing of labeled column statistics in ColumnStatistic.buildFrom with extensive tests and a QueryDumper labeled-format test.

  • Statistics/Parsing:
    • Update ColumnStatistic.buildFrom(String) to auto-detect and parse labeled format (MIN:/MAX:/NULLS:/ROS:/NDV:) in addition to the old numeric format; retain range/NDV validations.
    • Add parseLabeledValue(...) helper for labeled extraction.
  • Tests:
    • Add QueryDumpDeserializerTest with parameterized cases covering old/new formats, UNKNOWN/no type, invalid ranges, negative/zero NDV, scientific notation, and mixed-format deserialization.
    • Extend QueryDumperTest to verify labeled statistics output when the session variable is enabled.

Written by Cursor Bugbot for commit ff4d0ede253a3acaa80ba0405428591e9ae7c5c5. This will update automatically on new commits. Configure here.

Copilot avatar Dec 12 '25 02:12 Copilot

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

:white_check_mark: murphyatwork
:x: Copilot
You have signed the CLA already but the status is still pending? Let us recheck it.

CLAassistant avatar Dec 12 '25 02:12 CLAassistant

๐Ÿงช CI Insights

Here's what we observed from your CI run for ff4d0ede.

๐ŸŸข All jobs passed!

But CI Insights is watching ๐Ÿ‘€

mergify[bot] avatar Dec 12 '25 03:12 mergify[bot]

@copilot refactor the test to parameterized test to reduce redundant code

murphyatwork avatar Dec 12 '25 03:12 murphyatwork

@copilot refactor the test to parameterized test to reduce redundant code

Refactored the tests to use @ParameterizedTest with @MethodSource. This reduced redundant code by 60 lines (from 307 to 247 lines) while maintaining the same test coverage. The refactoring groups similar test cases together:

  • testParseBasicStatistics - tests both old and new formats with standard values
  • testParseUnknownTypeStatistics - tests UNKNOWN type handling with 4 variants
  • testParseStatisticsWithNullFraction - tests null fraction handling
  • testParseStatisticsWithInvalidRange - tests invalid range handling
  • testParseStatisticsWithNegativeDistinctValues - tests negative NDV handling
  • testParseStatisticsWithScientificNotation - tests scientific notation
  • testDeserializeQueryDumpWithStatistics - tests end-to-end deserialization

Commit: 0aaf3d9e

Copilot avatar Dec 12 '25 03:12 Copilot

@cursor review

alvin-celerdata avatar Dec 12 '25 04:12 alvin-celerdata

@cursor review

alvin-celerdata avatar Dec 15 '25 17:12 alvin-celerdata

@Mergifyio rebase

murphyatwork avatar Dec 16 '25 02:12 murphyatwork

rebase

โŒ Base branch update has failed

Git reported the following error:

Rebasing (1/5)
Rebasing (2/5)
Rebasing (3/5)
Auto-merging fe/fe-core/src/test/java/com/starrocks/sql/optimizer/QueryDumperTest.java
CONFLICT (content): Merge conflict in fe/fe-core/src/test/java/com/starrocks/sql/optimizer/QueryDumperTest.java
error: could not apply befcaca6dc... Add comprehensive unit and integration tests for label-formatted statistics
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
Could not apply befcaca6dc... Add comprehensive unit and integration tests for label-formatted statistics

mergify[bot] avatar Dec 16 '25 02:12 mergify[bot]

@copilot rebase latest main branch

murphyatwork avatar Dec 16 '25 03:12 murphyatwork

@copilot rebase latest main branch

I've rebased the branch on the latest main (commit 7369a80e). The branch now has 4 commits cleanly applied on top of the current main:

  1. 1148cafd - Enhance QueryDumpDeserializer to support label-formatted statistics
  2. a19b4b43 - Add comprehensive unit and integration tests for label-formatted statistics
  3. 473319d9 - Refactor tests to use parameterized tests to reduce redundant code
  4. 63229ada - fix fmt

The rebase resolved one merge conflict in QueryDumperTest.java where I kept both the new testDumpBeCoreStat() test from main and my testDumpQueryWithLabeledStatistics() test.

Note: The branch is ready to be force-pushed. Since I don't have direct git push permissions, you'll need to manually force-push this branch or configure the repository to allow me to force-push.

Copilot avatar Dec 16 '25 03:12 Copilot

@cursor review

alvin-celerdata avatar Dec 16 '25 05:12 alvin-celerdata

@mergifyio rebase

murphyatwork avatar Dec 16 '25 06:12 murphyatwork

rebase

โœ… Branch has been successfully rebased

mergify[bot] avatar Dec 16 '25 06:12 mergify[bot]

[FE Incremental Coverage Report]

:x: fail : 1195 / 1494 (79.99%)

file detail

path covered_line new_line coverage not_covered_line_detail
:large_blue_circle: com/starrocks/sql/optimizer/operator/OperatorVisitor.java 0 2 00.00% [447, 451]
:large_blue_circle: com/starrocks/sql/common/DebugOperatorTracer.java 0 24 00.00% [575, 576, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603]
:large_blue_circle: com/starrocks/meta/SqlBlackList.java 0 1 00.00% [67]
:large_blue_circle: com/starrocks/StarRocksFEServer.java 0 1 00.00% [126]
:large_blue_circle: com/starrocks/rpc/LakeServiceWithMetrics.java 0 2 00.00% [193, 194]
:large_blue_circle: com/starrocks/qe/scheduler/assignment/LocalFragmentAssignmentStrategy.java 2 7 28.57% [87, 88, 89, 90, 91]
:large_blue_circle: com/starrocks/planner/NoopSink.java 2 6 33.33% [30, 35, 40, 45]
:large_blue_circle: com/starrocks/sql/optimizer/operator/physical/PhysicalFetchOperator.java 15 27 55.56% [54, 58, 78, 88, 89, 91, 92, 94, 95, 96, 97, 98]
:large_blue_circle: com/starrocks/planner/LookUpNode.java 9 16 56.25% [55, 56, 57, 58, 59, 60, 61]
:large_blue_circle: com/starrocks/connector/hive/HiveUtils.java 48 84 57.14% [42, 59, 60, 62, 86, 87, 88, 89, 90, 111, 122, 124, 126, 127, 132, 133, 134, 137, 139, 140, 145, 146, 149, 150, 151, 161, 162, 163, 165, 166, 167, 168, 187, 188, 189, 190]
:large_blue_circle: com/starrocks/planner/FetchNode.java 30 51 58.82% [59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 72, 88, 89, 90, 91, 92, 93, 106, 107, 116, 121]
:large_blue_circle: com/starrocks/system/SystemInfoService.java 5 8 62.50% [1540, 1541, 1542]
:large_blue_circle: com/starrocks/sql/optimizer/operator/physical/PhysicalLookUpOperator.java 18 29 62.07% [91, 101, 102, 104, 105, 107, 108, 109, 110, 111, 112]
:large_blue_circle: com/starrocks/connector/RemoteFileOperations.java 2 3 66.67% [320]
:large_blue_circle: com/starrocks/sql/optimizer/LateMaterializationRewriter.java 356 503 70.78% [162, 241, 244, 247, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 371, 390, 391, 401, 424, 425, 426, 427, 428, 430, 440, 441, 474, 475, 502, 503, 505, 506, 507, 508, 510, 511, 543, 544, 545, 547, 550, 552, 553, 554, 555, 556, 558, 559, 564, 565, 566, 568, 570, 571, 572, 573, 574, 576, 578, 583, 584, 585, 586, 591, 592, 593, 594, 596, 597, 598, 600, 601, 606, 607, 608, 610, 612, 614, 615, 616, 617, 618, 620, 621, 622, 627, 628, 629, 630, 631, 632, 633, 635, 636, 687, 688, 689, 690, 691, 692, 694, 696, 699, 700, 702, 704, 706, 707, 708, 709, 711, 713, 715, 716, 717, 718, 719, 720, 721, 723, 814, 815, 816, 817, 818, 819, 821, 822, 867, 868, 882]
:large_blue_circle: com/starrocks/sql/optimizer/rule/transformation/PruneHDFSScanColumnRule.java 3 4 75.00% [175]
:large_blue_circle: com/starrocks/sql/optimizer/operator/physical/PhysicalJoinOperator.java 5 6 83.33% [81]
:large_blue_circle: com/starrocks/connector/partitiontraits/HivePartitionTraits.java 7 8 87.50% [73]
:large_blue_circle: com/starrocks/sql/optimizer/OptimizerTraceUtil.java 19 21 90.48% [115, 129]
:large_blue_circle: com/starrocks/qe/VariableMgr.java 11 12 91.67% [243]
:large_blue_circle: com/starrocks/qe/StmtExecutor.java 46 50 92.00% [1038, 1051, 3214, 3215]
:large_blue_circle: com/starrocks/sql/plan/PlanFragmentBuilder.java 86 90 95.56% [2549, 4145, 4146, 4455]
:large_blue_circle: com/starrocks/sql/optimizer/rewrite/scalar/ImplicitCastRule.java 22 23 95.65% [122]
:large_blue_circle: com/starrocks/system/BackendResourceStat.java 145 151 96.03% [101, 102, 103, 104, 105, 358]
:large_blue_circle: com/starrocks/sql/analyzer/ExpressionAnalyzer.java 42 43 97.67% [315]
:large_blue_circle: com/starrocks/catalog/Table.java 1 1 100.00% []
:large_blue_circle: com/starrocks/lake/TabletRepairHelper.java 58 58 100.00% []
:large_blue_circle: com/starrocks/sql/optimizer/MvRewritePreprocessor.java 1 1 100.00% []
:large_blue_circle: com/starrocks/qe/SessionVariable.java 9 9 100.00% []
:large_blue_circle: com/starrocks/sql/optimizer/operator/scalar/ScalarOperator.java 1 1 100.00% []
:large_blue_circle: com/starrocks/sql/optimizer/QueryOptimizer.java 3 3 100.00% []
:large_blue_circle: com/starrocks/catalog/IcebergTable.java 2 2 100.00% []
:large_blue_circle: com/starrocks/qe/scheduler/slot/QueryQueueOptions.java 8 8 100.00% []
:large_blue_circle: com/starrocks/sql/DeletePlanner.java 4 4 100.00% []
:large_blue_circle: com/starrocks/sql/optimizer/OptExpression.java 6 6 100.00% []
:large_blue_circle: com/starrocks/common/Config.java 2 2 100.00% []
:large_blue_circle: com/starrocks/sql/optimizer/rule/tree/JsonPathRewriteRule.java 5 5 100.00% []
:large_blue_circle: com/starrocks/sql/optimizer/rule/transformation/ListPartitionPruner.java 15 15 100.00% []
:large_blue_circle: com/starrocks/sql/UpdatePlanner.java 5 5 100.00% []
:large_blue_circle: com/starrocks/sql/optimizer/rewrite/scalar/FoldConstantsRule.java 1 1 100.00% []
:large_blue_circle: com/starrocks/catalog/ResourceGroupMgr.java 2 2 100.00% []
:large_blue_circle: com/starrocks/qe/scheduler/slot/PipelineDriverAllocator.java 18 18 100.00% []
:large_blue_circle: com/starrocks/sql/optimizer/dump/QueryDumpSerializer.java 1 1 100.00% []
:large_blue_circle: com/starrocks/catalog/Column.java 1 1 100.00% []
:large_blue_circle: com/starrocks/sql/optimizer/LogicalPlanPrinter.java 15 15 100.00% []
:large_blue_circle: com/starrocks/planner/PlanFragment.java 5 5 100.00% []
:large_blue_circle: com/starrocks/planner/HiveTableSink.java 1 1 100.00% []
:large_blue_circle: com/starrocks/sql/optimizer/operator/OperatorType.java 2 2 100.00% []
:large_blue_circle: com/starrocks/sql/InsertPlanner.java 5 5 100.00% []
:large_blue_circle: com/starrocks/sql/optimizer/rule/tree/exprreuse/ScalarOperatorsReuse.java 8 8 100.00% []
:large_blue_circle: com/starrocks/sql/optimizer/dump/QueryDumpDeserializer.java 22 22 100.00% []
:large_blue_circle: com/starrocks/qe/GlobalVariable.java 2 2 100.00% []
:large_blue_circle: com/starrocks/scheduler/mv/MVRefreshPartitionSelector.java 2 2 100.00% []
:large_blue_circle: com/starrocks/sql/analyzer/ResourceGroupAnalyzer.java 1 1 100.00% []
:large_blue_circle: com/starrocks/catalog/HiveTable.java 8 8 100.00% []
:large_blue_circle: com/starrocks/planner/expression/ExprToThrift.java 2 2 100.00% []
:large_blue_circle: com/starrocks/catalog/FunctionSet.java 2 2 100.00% []
:large_blue_circle: com/starrocks/sql/optimizer/cost/HashJoinCostModel.java 8 8 100.00% []
:large_blue_circle: com/starrocks/sql/optimizer/operator/ScanOperatorPredicates.java 7 7 100.00% []
:large_blue_circle: com/starrocks/lake/vacuum/FullVacuumDaemon.java 2 2 100.00% []
:large_blue_circle: com/starrocks/sql/optimizer/rewrite/OptExternalPartitionPruner.java 1 1 100.00% []
:large_blue_circle: com/starrocks/statistic/StatisticExecutor.java 6 6 100.00% []
:large_blue_circle: com/starrocks/sql/optimizer/rule/tree/lowcardinality/DecodeCollector.java 4 4 100.00% []
:large_blue_circle: com/starrocks/sql/optimizer/statistics/ColumnStatistic.java 20 20 100.00% []
:large_blue_circle: com/starrocks/system/ComputeNode.java 4 4 100.00% []
:large_blue_circle: com/starrocks/sql/LoadPlanner.java 7 7 100.00% []
:large_blue_circle: com/starrocks/connector/hive/PartitionUpdate.java 1 1 100.00% []
:large_blue_circle: com/starrocks/sql/optimizer/dump/QueryDumpInfo.java 22 22 100.00% []
:large_blue_circle: com/starrocks/sql/optimizer/operator/physical/PhysicalTopNOperator.java 1 1 100.00% []
:large_blue_circle: com/starrocks/catalog/TableProperty.java 7 7 100.00% []
:large_blue_circle: com/starrocks/common/util/PropertyAnalyzer.java 2 2 100.00% []
:large_blue_circle: com/starrocks/qe/scheduler/assignment/FragmentAssignmentStrategyFactory.java 1 1 100.00% []
:large_blue_circle: com/starrocks/sql/analyzer/PolymorphicFunctionAnalyzer.java 2 2 100.00% []
:large_blue_circle: com/starrocks/sql/optimizer/OptExpressionVisitor.java 2 2 100.00% []
:large_blue_circle: com/starrocks/sql/optimizer/cost/feature/FeatureExtractor.java 7 7 100.00% []

github-actions[bot] avatar Dec 16 '25 08:12 github-actions[bot]

[Java-Extensions Incremental Coverage Report]

:white_check_mark: pass : 0 / 0 (0%)

github-actions[bot] avatar Dec 16 '25 08:12 github-actions[bot]

[BE Incremental Coverage Report]

:white_check_mark: pass : 0 / 0 (0%)

github-actions[bot] avatar Dec 16 '25 08:12 github-actions[bot]

@cursor review

alvin-celerdata avatar Dec 17 '25 02:12 alvin-celerdata

@Mergifyio backport branch-4.0

github-actions[bot] avatar Dec 17 '25 07:12 github-actions[bot]

backport branch-4.0

โœ… Backports have been created

mergify[bot] avatar Dec 17 '25 07:12 mergify[bot]