amoro [AMORO-3775] Add support for metric-based refresh event trigger in TableRuntimeRefreshExecutor

Why are the changes needed?

Close #3775.

Brief change log

Add support for MSE based refresh event:

Support for calculating partition filesize mean square error based on the loaded metadata.
Filter partitions need to be optimized based on threshold and trigger pendingInput evaluation if necessary.

How was this patch tested?

[x] Add some test cases that check the changes thoroughly including negative and positive cases if possible
[ ] Add screenshots for manual tests if appropriate
[x] Run test locally before making a pull request

Documentation

Does this pull request introduce a new feature? (yes / no)
If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

Sep 09 '25 17:09 Jzjsnow

Can we move forward with this feature now? @Jzjsnow @klion26

Oct 27 '25 07:10 xxubai

Can we move forward with this feature now? @Jzjsnow @klion26

Sure, I've updated the branch and added the new evaluation criteria discussed earlier (see Step 1 for details).

The current conditions for triggering pendingInput evaluation based on metrics are as follows: Step 1: If the condition delete file=0 && avg file size > target size * ratio is met, the evaluation is considered unnecessary and will be skipped. Step 2: Calculate detailed attributes for each partition in the table, including the sum of squared errors for file sizes. If this exceeds the file size tolerance threshold, the pendingInput requires evaluation.

Note that this update now supports MIX_ICEBERG tables, whereas previously only ICEBERG format was supported.

Please take a look when you are free. Looking forward to your feedback! @xxubai @zhoujinsong @klion26

Oct 31 '25 08:10 Jzjsnow

In the latest commit, we've revamped the logic of EventBasedTrigger with key adjustments and new configurations:

The EventBasedTrigger now includes two key parameters:

FallbackInterval: The minimum interval for executing the original tryEvaluatingPendingInput logic. It prevents false positives or missed triggers from metadata metric-driven evaluation. Defaults to -1 (disabled); enabled when set to >=0.
MseTolerance: The tolerance threshold for partition file size MSE (default: 0). Partitions with actual MSE below this threshold are considered unnecessary for optimization.

When enabled, the flow now:

Determine if tryEvaluatingPendingInput needs to run:
- Check if the FallbackInterval is met to trigger tryEvaluatingPendingInput directly.
- Skip evaluation for empty tables.
- Skip if the condition delete file count = 0 && avg file size > target size * ratio is satisfied (no need for pending input evaluation).
Execute tryEvaluatingPendingInput if necessary:
- Use the existing scan logic to retrieve partition file information.
- Judge if each partition requires pending status: if the MSE threshold is met, further determine the optimization type (minor/major/full).
- Update pendingInput related information.

Please take a look where you are free! @xxubai @klion26

Nov 20 '25 08:11 Jzjsnow

Codecov Report

:x: Patch coverage is 0% with 101 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 4.77%. Comparing base (cbdc517) to head (2fb2dc7). :warning: Report is 15 commits behind head on master.

Files with missing lines	Patch %	Lines
...izing/evaluation/MetadataBasedEvaluationEvent.java	0.00%	29 Missing :warning:
...moro/optimizing/plan/CommonPartitionEvaluator.java	0.00%	27 Missing :warning:
.../evaluation/MixedAndIcebergTableStatsProvider.java	0.00%	17 Missing :warning:
...java/org/apache/amoro/config/OptimizingConfig.java	0.00%	15 Missing :warning:
...moro/optimizing/evaluation/TableStatsProvider.java	0.00%	9 Missing :warning:
...o/optimizing/plan/AbstractOptimizingEvaluator.java	0.00%	2 Missing :warning:
...e/amoro/optimizing/plan/AbstractPartitionPlan.java	0.00%	2 Missing :warning:

Additional details and impacted files

@@             Coverage Diff              @@
##             master   #3776       +/-   ##
============================================
- Coverage     22.12%   4.77%   -17.36%     
+ Complexity     2461     471     -1990     
============================================
  Files           445     449        +4     
  Lines         40897   41048      +151     
  Branches       5767    5784       +17     
============================================
- Hits           9050    1958     -7092     
- Misses        31089   38896     +7807     
+ Partials        758     194      -564

Flag	Coverage Δ
trino	`4.77% <0.00%> (-17.36%)`	:arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:

:snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
:package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Nov 21 '25 08:11 codecov-commenter

LGTM. Also need to fix the unit tests

Dec 16 '25 11:12 xxubai