[Enhancement] Optimize deltaRows with lazy evaluation for large partition tables (backport #66381)

Open · mergify[bot] opened this issue 2 weeks ago • 2 comments

Why I'm doing:

When a table has a large number of partitions (e.g., 36,000), the deltaRows method is called during statistics calculation for every DeriveStatsTask. Previously, deltaRows was invoked unconditionally in getPartitionRows, even when no partitions actually needed delta compensation. This caused severe performance issues:

  1. Unnecessary computation: deltaRows iterates through all partitions and queries statistics, even when its result is never used
  2. Expensive operations: Each deltaRows call involves:
    • Querying statistics for all 36,000 partitions
    • Iterating through all partitions
    • Multiple HashMap operations

The root cause is that deltaRows was computed eagerly at the beginning of getPartitionRows, but its result was only used when needDelta was true.
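To make the cost concrete, here is a minimal Java sketch of the eager pattern under a simplified model. `Partition`, `deltaRows`, `getPartitionRowsEager`, and the `rowsSinceStats` field are hypothetical names, not StarRocks' actual classes, and `needDelta` is assumed to be a precomputed per-partition flag. The counter shows that `deltaRows` runs on every call, even when no partition uses its result.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the eager pattern; names are illustrative,
// not StarRocks' real code.
class EagerPartitionRows {
    // A partition with its row count from collected stats, an assumed
    // precomputed needDelta flag, and rows loaded since stats were collected.
    record Partition(long id, long statsRows, boolean needDelta, long rowsSinceStats) {}

    // Counts how often the expensive step runs.
    static int deltaRowsCalls = 0;

    // Stand-in for the expensive step: walks every partition and fills a map.
    static Map<Long, Long> deltaRows(List<Partition> partitions) {
        deltaRowsCalls++;
        Map<Long, Long> delta = new HashMap<>();
        for (Partition p : partitions) {
            delta.put(p.id(), p.rowsSinceStats());
        }
        return delta;
    }

    static long getPartitionRowsEager(List<Partition> partitions) {
        // Computed up front, whether or not any partition uses it.
        Map<Long, Long> delta = deltaRows(partitions);
        long total = 0;
        for (Partition p : partitions) {
            total += p.statsRows();
            if (p.needDelta()) {
                total += delta.getOrDefault(p.id(), 0L);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Stats are current for both partitions, so no delta is needed.
        List<Partition> fresh = List.of(
                new Partition(1, 100, false, 5),
                new Partition(2, 200, false, 7));
        System.out.println(getPartitionRowsEager(fresh)); // 300
        System.out.println(deltaRowsCalls);               // 1, even though nothing used it
    }
}
```

With 36,000 partitions, that unused scan is repeated for every DeriveStatsTask that reaches getPartitionRows.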

What I'm doing:

Implement lazy evaluation of deltaRows in getPartitionRows: compute it at most once, and only when at least one partition needs delta compensation.
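A minimal sketch of the lazy variant under the same hypothetical model (again, not the actual StarRocks code). The delta map starts as `null` and is filled by the first partition that needs compensation, so `deltaRows` runs zero times when all statistics are current and at most once otherwise.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of lazy evaluation; names are illustrative,
// not StarRocks' real code.
class LazyPartitionRows {
    record Partition(long id, long statsRows, boolean needDelta, long rowsSinceStats) {}

    // Counts how often the expensive step runs.
    static int deltaRowsCalls = 0;

    // Stand-in for the expensive step: walks every partition and fills a map.
    static Map<Long, Long> deltaRows(List<Partition> partitions) {
        deltaRowsCalls++;
        Map<Long, Long> delta = new HashMap<>();
        for (Partition p : partitions) {
            delta.put(p.id(), p.rowsSinceStats());
        }
        return delta;
    }

    static long getPartitionRowsLazy(List<Partition> partitions) {
        Map<Long, Long> delta = null; // not computed yet
        long total = 0;
        for (Partition p : partitions) {
            total += p.statsRows();
            if (p.needDelta()) {
                if (delta == null) {
                    delta = deltaRows(partitions); // first and only computation
                }
                total += delta.getOrDefault(p.id(), 0L);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Partition> fresh = List.of(
                new Partition(1, 100, false, 5),
                new Partition(2, 200, false, 7));
        System.out.println(getPartitionRowsLazy(fresh)); // 300
        System.out.println(deltaRowsCalls);              // 0: deltaRows never ran

        List<Partition> stale = List.of(
                new Partition(1, 100, true, 5),
                new Partition(2, 200, false, 7),
                new Partition(3, 50, true, 3));
        System.out.println(getPartitionRowsLazy(stale)); // 358
        System.out.println(deltaRowsCalls);              // 1: computed once for two stale partitions
    }
}
```

Because the map is built from the full partition list the first time it is needed, every later partition that needs compensation reads the same values the eager version would have used, so the row totals do not change.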

Fixes #issue

What type of PR is this:

  • [ ] BugFix
  • [ ] Feature
  • [x] Enhancement
  • [ ] Refactor
  • [ ] UT
  • [ ] Doc
  • [ ] Tool

Does this PR entail a change in behavior?

  • [ ] Yes, this PR will result in a change in behavior.
  • [x] No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • [ ] Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • [ ] Parameter changes: default values, similar parameters but with different default values
  • [ ] Policy changes: use new policy to replace old one, functionality automatically enabled
  • [ ] Feature removed
  • [ ] Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • [ ] I have added test cases for my bug fix or my new feature
  • [ ] This pr needs user documentation (for new or modified features or behaviors)
    • [ ] I have added documentation for my new feature or new function
  • [x] This is a backport pr

Bugfix cherry-pick branch check:

  • [x] I have checked the version labels which the pr will be auto-backported to the target branch
    • [x] 4.0
    • [x] 3.5
    • [x] 3.4
    • [x] 3.3

[!NOTE] Computes deltaRows in getPartitionRows lazily, only when needed, reducing overhead on large partition tables.

  • Optimizer/Statistics (StatisticsCalcUtils#getPartitionRows):
    • Implement lazy evaluation of deltaRows (compute once, only if any partition needs delta compensation).
    • Introduce statsUpdateTime and needDelta to streamline conditions for applying deltaRows.
    • Preserve existing behavior while avoiding unnecessary full-partition scans when stats are up-to-date.

Written by Cursor Bugbot for commit b05ff64772558f0f3be1d4ae0cfb0eeb16d26898.
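If the same deferral is needed in several places, the "compute once, only if needed" rule can be factored into a small wrapper. The `Memoized` class below is a hypothetical helper written for illustration, not something this PR introduces. It is not thread-safe, which is acceptable for a value that lives only within a single method call.

```java
import java.util.function.Supplier;

// Demo: the wrapped computation runs only on the first get().
class MemoizedDemo {
    static int calls = 0;

    public static void main(String[] args) {
        Memoized<Integer> lazy = new Memoized<>(() -> {
            calls++; // stands in for an expensive deltaRows-style scan
            return 42;
        });
        System.out.println(calls);      // 0: nothing computed yet
        System.out.println(lazy.get()); // 42
        System.out.println(lazy.get()); // 42, served from the cached value
        System.out.println(calls);      // 1
    }
}

// Hypothetical helper, not part of StarRocks: defers a computation until the
// first get() and caches the result. Not thread-safe; intended for use inside
// a single method call, like a local variable in getPartitionRows.
final class Memoized<T> implements Supplier<T> {
    private Supplier<T> delegate;
    private boolean computed = false;
    private T value;

    Memoized(Supplier<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public T get() {
        if (!computed) {
            value = delegate.get();
            computed = true;
            delegate = null; // drop captured state once the value is cached
        }
        return value;
    }
}
```

Guava's `Suppliers.memoize` offers a thread-safe version of the same idea.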


This is an automatic backport of pull request #66381 done by [Mergify](https://mergify.com).

[!NOTE] Computes deltaRows only when any partition needs delta compensation, reducing unnecessary full-partition scans during stats calculation.

  • Optimizer/Statistics (StatisticsCalcUtils#getPartitionRows):
    • Implement lazy evaluation for deltaRows (compute once, only if needed per-partition).
    • Introduce statsUpdateTime and needDelta to simplify delta application conditions.
    • Preserve row count logic while avoiding costly iteration over all partitions when stats are current.

Written by Cursor Bugbot for commit e9180752fa9e889982c32be041a41205f090820c.

mergify[bot] · Dec 11 '25 09:12

🧪 CI Insights

Here's what we observed from your CI run for e9180752.

✅ Passed Jobs With Interesting Signals

| Pipeline | Job | Signal | Health on branch-3.5 | Retries |
|---|---|---|---|---|
| CI PIPELINE - BRANCH | FE UT | Base branch is broken, but retries were needed. Could be early signs of flakiness 👀 | Broken | 1 |
| | RUN CHECKER | Base branch is healthy, but retries were needed. Could be early signs of flakiness 👀 | Healthy | 1 |
| PR CHECKER | automerge-check | Base branch is broken, but the job passed. Looks like this might be a real fix 💪 | Broken | 0 |

mergify[bot] · Dec 11 '25 09:12

@cursor review

alvin-celerdata · Dec 11 '25 15:12