Referencing Spark/Iceberg's `SizeBasedFileRewritePlanner`, the number of output files is determined intelligently:

| Condition | Decision |
| ----------- | ----------- |
| Remainder > minFileSize (default 0.75 * targetSize) | ... |
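A minimal sketch of that size-based decision, under my reading of the planner (the method name, `minFileSizeRatio` parameter, and exact folding behavior are assumptions for illustration, not the actual Iceberg API):

```java
public class OutputFilePlanner {
    /**
     * Hypothetical sketch of a SizeBasedFileRewritePlanner-style decision:
     * split the total input size into targetSize-sized outputs, then decide
     * whether the remainder deserves its own file.
     */
    public static long numOutputFiles(long inputSize, long targetSize, double minFileSizeRatio) {
        long minFileSize = (long) (targetSize * minFileSizeRatio); // default ratio 0.75
        long fullFiles = inputSize / targetSize;
        long remainder = inputSize % targetSize;
        if (remainder == 0) {
            return Math.max(fullFiles, 1);
        }
        // If the remainder is large enough to be a reasonable file on its own,
        // write it as an extra output file; otherwise fold it into the others
        // rather than emitting a tiny trailing file.
        return remainder > minFileSize ? fullFiles + 1 : Math.max(fullFiles, 1);
    }
}
```

The point of the `minFileSize` threshold is exactly what this issue is about: a small remainder should never become its own undersized file, or compaction will keep finding it again.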
**Root Cause of the Problem**

In `IcebergRewriteExecutor.targetSize()`, when the total size of the input files is greater than or equal to `targetSize`, it returns `targetSize` (instead of `Long.MAX_VALUE`). This causes:...
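A minimal sketch of the corrected sizing decision as I read it (the method signature, the `mergeToSingleFile` flag, and the surrounding class are assumptions for illustration, not the actual executor code):

```java
public class RewriteTargetSize {
    /**
     * Hypothetical sketch of the fix: when the planner has already decided
     * that the selected input files should be merged into a single output
     * file, the writer's roll-over size must be unbounded (Long.MAX_VALUE).
     * Returning targetSize in that case makes the writer roll at the target
     * boundary and emit a tiny trailing file, which the next compaction run
     * picks up again, producing an endless merge loop.
     */
    public static long targetSize(long targetSize, boolean mergeToSingleFile) {
        if (mergeToSingleFile) {
            return Long.MAX_VALUE; // never roll; write exactly one output file
        }
        return targetSize;
    }
}
```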
> I have some questions: if the issue occurred with the segment files, why is the input file size less than 1 MB?
> Also, if the segment doesn't...
> [@wardlican](https://github.com/wardlican) I couldn't reproduce this scenario. I think the main issue is related to the data; the inaccurate calculation of the writer's rolling (roll-over) size is causing this phenomenon. >...
> > [@wardlican](https://github.com/wardlican) I couldn't reproduce this scenario. I think the main issue is related to the data; the inaccurate calculation of the writer's rolling (roll-over) size is causing this phenomenon....
This is a rather serious problem: it can trap the table in an endless compaction loop and cause continuous growth of its metadata.
Please check the changes here.