[Bug]: In full optimizing mode, partition with two large files is marked as mergeable in evaluate stage but skipped in plan stage
What happened?
When full optimizing mode is enabled, the following issue occurs:
- A partition contains two relatively large files (for example, two 80 MB files).
- During the evaluate stage, the optimizer determines that these files can be merged.
- However, during the plan stage, these files are skipped, and no optimization task is generated.
Affects Versions
master/0.8.0
What table formats are you seeing the problem on?
No response
What engines are you seeing the problem on?
No response
How to reproduce
No response
Relevant log output
Anything else
No response
Are you willing to submit a PR?
- [ ] Yes I am willing to submit a PR!
Code of Conduct
- [x] I agree to follow this project's Code of Conduct
Here is the logic that determines whether full optimizing is necessary:
public boolean isFullNecessary() {
  if (!reachFullInterval()) {
    return false;
  }
  return anyDeleteExist()
      || fragmentFileCount >= 2
      || undersizedSegmentFileCount >= 2
      || rewriteSegmentFileCount > 0
      || rewritePosSegmentFileCount > 0;
}
The undersizedSegmentFileCount >= 2 check does not seem to be sufficient. Should we consider changing it to enoughContent()?
cc @zhongqishang
If we change undersizedSegmentFileCount >= 2 to enoughContent(), the check still does not return true for this case.
Its code shows why:
CommonPartitionEvaluator.enoughContent() {
  return undersizedSegmentFileSize >= config.getTargetSize()
      && min1SegmentFileSize + min2SegmentFileSize <= config.getTargetSize();
}
This means that two 80 MB files are not considered "enoughContent" when the target size is 128 MB. The first condition is met (the total undersized size, 160 MB, is at least 128 MB), but the second is not, because the sum of the two smallest files (also 160 MB) is greater than the target size.
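The arithmetic above can be checked with a small standalone sketch. It is not the Amoro class itself: EnoughContentSketch and its method parameters are hypothetical stand-ins for the evaluator's fields and for config.getTargetSize(), and it assumes both 80 MB files are counted as undersized segment files.

```java
public class EnoughContentSketch {
    static final long MB = 1024L * 1024L;

    // Same boolean expression as the quoted enoughContent(); the parameters
    // are hypothetical stand-ins for the evaluator's fields and target size.
    static boolean enoughContent(long undersizedSegmentFileSize,
                                 long min1SegmentFileSize,
                                 long min2SegmentFileSize,
                                 long targetSize) {
        return undersizedSegmentFileSize >= targetSize
            && min1SegmentFileSize + min2SegmentFileSize <= targetSize;
    }

    public static void main(String[] args) {
        long target = 128 * MB;

        // Two 80 MB files: 160 MB >= 128 MB holds, but 80 MB + 80 MB > 128 MB.
        System.out.println(enoughContent(160 * MB, 80 * MB, 80 * MB, target)); // false

        // Three 50 MB files: 150 MB >= 128 MB and 50 MB + 50 MB <= 128 MB.
        System.out.println(enoughContent(150 * MB, 50 * MB, 50 * MB, target)); // true
    }
}
```

The second call is only there for contrast: once the two smallest files fit together under the target size, the same expression returns true.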
Can we change "undersizedSegmentFileCount >= 2" to "undersizedSegmentFileSize >= config.getTargetSize() && min1SegmentFileSize <= config.getTargetSize() && min2SegmentFileSize <= config.getTargetSize()"? @zhoujinsong @zhongqishang
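As a rough illustration of how the proposed expression would behave for the two 80 MB files, here is a minimal sketch. ProposedConditionSketch and its parameters are hypothetical; they stand in for the evaluator's fields and for config.getTargetSize(), with a 128 MB target size assumed.

```java
public class ProposedConditionSketch {
    static final long MB = 1024L * 1024L;

    // The proposed replacement condition, with the evaluator's fields passed
    // in as hypothetical parameters so the sketch runs on its own.
    static boolean proposed(long undersizedSegmentFileSize,
                            long min1SegmentFileSize,
                            long min2SegmentFileSize,
                            long targetSize) {
        return undersizedSegmentFileSize >= targetSize
            && min1SegmentFileSize <= targetSize
            && min2SegmentFileSize <= targetSize;
    }

    public static void main(String[] args) {
        long target = 128 * MB;

        // Two 80 MB files: 160 MB >= 128 MB, and each file is <= 128 MB.
        System.out.println(proposed(160 * MB, 80 * MB, 80 * MB, target)); // true
    }
}
```

Note that the expression returns true here even though the two files together (160 MB) cannot fit into a single 128 MB target file, so whether this is the desired result depends on what the plan stage does with them.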
> And the undersizedSegmentFileCount >= 2 seems not to be enough, should we consider changing it to enoughContent()?
+1
> Can we change "undersizedSegmentFileCount >= 2" to "undersizedSegmentFileSize >= config.getTargetSize() && min1SegmentFileSize <= config.getTargetSize() && min2SegmentFileSize <= config.getTargetSize()"
@zhangwl9 It is expected that no compaction occurs when two 80 MB files are bin-packed, so isFullNecessary() should return false in this case.
In full optimizing mode, an 80 MB file is counted in rewriteSegmentFileCount, so it is rewriteSegmentFileCount > 0 that triggers the full optimizing.
It is not triggered by undersizedSegmentFileCount >= 2. @zhoujinsong
No actual compaction will occur after binpack. This is the expected behavior now.
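To illustrate why bin-packing leaves these two files alone, here is a minimal first-fit sketch. It is not Amoro's planner: BinPackSketch, pack, and mergingBins are hypothetical names, and the greedy first-fit strategy and the 128 MB target size are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class BinPackSketch {
    static final long MB = 1024L * 1024L;

    // Greedy first-fit: place each file in the first bin that still has room,
    // otherwise open a new bin. Assumed strategy, for illustration only.
    static List<List<Long>> pack(List<Long> fileSizes, long targetSize) {
        List<List<Long>> bins = new ArrayList<>();
        List<Long> binTotals = new ArrayList<>();
        for (long size : fileSizes) {
            boolean placed = false;
            for (int i = 0; i < bins.size(); i++) {
                if (binTotals.get(i) + size <= targetSize) {
                    bins.get(i).add(size);
                    binTotals.set(i, binTotals.get(i) + size);
                    placed = true;
                    break;
                }
            }
            if (!placed) {
                List<Long> bin = new ArrayList<>();
                bin.add(size);
                bins.add(bin);
                binTotals.add(size);
            }
        }
        return bins;
    }

    // Only bins holding more than one file would merge anything.
    static long mergingBins(List<List<Long>> bins) {
        return bins.stream().filter(bin -> bin.size() > 1).count();
    }

    public static void main(String[] args) {
        List<List<Long>> bins = pack(List.of(80 * MB, 80 * MB), 128 * MB);
        // Each 80 MB file lands in its own bin, so nothing is merged.
        System.out.println(bins.size() + " bins, " + mergingBins(bins) + " merging");
    }
}
```

Under these assumptions the two 80 MB files end up in separate single-file bins, which matches the statement that no actual compaction happens for them.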