starrocks icon indicating copy to clipboard operation
starrocks copied to clipboard

[Enhancement] Optimize deltaRows with lazy evaluation for large partition tables (backport #66381)

Open mergify[bot] opened this issue 2 weeks ago • 2 comments

Why I'm doing:

When a table has a large number of partitions (e.g., 36,000), the deltaRows method is called during statistics calculation for every DeriveStatsTask. Previously, deltaRows was invoked unconditionally in getPartitionRows, even when no partitions actually needed delta compensation. This caused severe performance issues:

  1. Unnecessary computation: deltaRows iterates through all partitions and queries statistics, even when deltaRows result is never used
  2. Expensive operations: Each deltaRows call involves:
    • Querying statistics for all 36,000 partitions
    • Iterating through all partitions
    • Multiple HashMap operations

The root cause is that deltaRows was computed eagerly at the beginning of getPartitionRows, but only used conditionally when needDelta is true. image

What I'm doing:

Implement lazy evaluation of deltaRows in getPartitionRows:

Fixes #issue

What type of PR is this:

  • [ ] BugFix
  • [ ] Feature
  • [x] Enhancement
  • [ ] Refactor
  • [ ] UT
  • [ ] Doc
  • [ ] Tool

Does this PR entail a change in behavior?

  • [ ] Yes, this PR will result in a change in behavior.
  • [x] No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • [ ] Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • [ ] Parameter changes: default values, similar parameters but with different default values
  • [ ] Policy changes: use new policy to replace old one, functionality automatically enabled
  • [ ] Feature removed
  • [ ] Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • [ ] I have added test cases for my bug fix or my new feature
  • [ ] This pr needs user documentation (for new or modified features or behaviors)
    • [ ] I have added documentation for my new feature or new function
  • [x] This is a backport pr

Bugfix cherry-pick branch check:

  • [x] I have checked the version labels which the pr will be auto-backported to the target branch
    • [x] 4.0
    • [x] 3.5
    • [x] 3.4
    • [x] 3.3

[!NOTE] Lazily computes deltaRows in getPartitionRows and only when needed, reducing overhead on large partition tables.

  • Optimizer/Statistics (StatisticsCalcUtils#getPartitionRows):
    • Implement lazy evaluation of deltaRows (compute once, only if any partition needs delta compensation).
    • Introduce statsUpdateTime and needDelta to streamline conditions for applying deltaRows.
    • Preserve existing behavior while avoiding unnecessary full-partition scans when stats are up-to-date.

Written by Cursor Bugbot for commit b05ff64772558f0f3be1d4ae0cfb0eeb16d26898. This will update automatically on new commits. Configure here.


This is an automatic backport of pull request #66381 done by [Mergify](https://mergify.com).

[!NOTE] Lazily computes deltaRows only when at least one partition needs it, reducing overhead for large-partition tables while preserving behavior.

  • Optimizer/Statistics:
    • StatisticsCalcUtils#getPartitionRows:
      • Implement lazy evaluation of deltaRows (compute once, only if any partition needs compensation).
      • Introduce statsUpdateTime and needDelta to streamline conditions using lastWorkTimestamp/updateTime.
      • Avoids unnecessary full-partition scans when stats are up-to-date.

Written by Cursor Bugbot for commit 70868b27f2a26cfda95863ba8c8e3176db11cd3c. This will update automatically on new commits. Configure here.

mergify[bot] avatar Dec 11 '25 09:12 mergify[bot]

🧪 CI Insights

Here's what we observed from your CI run for 70868b27.

🟢 All jobs passed!

But CI Insights is watching 👀

mergify[bot] avatar Dec 11 '25 10:12 mergify[bot]

@cursor review

alvin-celerdata avatar Dec 11 '25 15:12 alvin-celerdata