OPTIMIZE does not clean up equality delete files when singleton data files are skipped (regression after PR #23864)
Issue
After upgrading to a version of Trino that includes PR #23864, we observed that OPTIMIZE no longer cleans up equality delete files in certain cases, even after new data and new deletes are added to the affected partitions. As a result, equality delete files accumulate, which hurts query performance and increases storage use. Reverting the changes made in PR #23864 restores the expected behavior of the trino-iceberg plugin.
Symptoms
•	Equality delete files remain in the table after OPTIMIZE operations.
•	This persists even after new data is added and more deletes are performed in the affected partitions.
•	The Trino logs show that many files are being skipped during OPTIMIZE, for example:
INFO IcebergSplitSource Generated 2 splits, skipped 6 files for OPTIMIZE
•	Only some equality delete files are removed, leaving many behind:
INFO CommitReport ... removedEqualityDeletes= 93, totalEqualityDeletes=124
Steps to reproduce
Consistent manual reproduction has yet to be achieved. This behavior presents itself during day-to-day operations.
Expected behavior
• OPTIMIZE should rewrite all files that are referenced by equality delete files, removing reliance on equality deletes from prior snapshots.
• After all affected data files have been rewritten, equality delete files should be safely removed/cleaned up.
Actual behavior
• OPTIMIZE does not handle all equality deletes when executed, leaving behind delete files that are never cleaned up.
• Data files that are “clean” (the only file in their partition) are skipped by OPTIMIZE, even if they are still referenced by equality delete files.
• This results in equality delete files that are effectively “stuck” and never cleaned up unless a full-table rewrite is forced. Equality deletes have now reached over 2 billion.
Logs
INFO IcebergSplitSource Generated 2 splits, skipped 6 files for OPTIMIZE
INFO CommitReport ... removedEqualityDeleteFiles=93, totalEqualityDeletes=124
Environment
• Trino version: 464+
• Iceberg version: 1.6.1
• Catalog: REST
• Table format: Iceberg V2
Additional context
This behavior appears to be a regression or unintended consequence of the logic introduced in PR #23864, which skips rewriting singleton files in partitions unless they have direct deletes.
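The interaction described above can be illustrated with a small, self-contained model. This is a minimal sketch, not Trino's actual code: the `DataFile` and `EqualityDelete` classes and both function names are hypothetical, and the skip rule is paraphrased from this report's reading of PR #23864. The only rule taken from the Iceberg spec is that an equality delete applies to data files in the same partition with a strictly lower data sequence number (global deletes written with an unpartitioned spec are ignored here for simplicity).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    path: str
    partition: str
    sequence_number: int
    direct_deletes: int = 0  # delete files attached directly to this data file


@dataclass(frozen=True)
class EqualityDelete:
    path: str
    partition: str
    sequence_number: int


def files_skipped_as_singletons(data_files):
    """Hypothetical model of the skip rule: a data file is skipped when it is
    the only file in its partition and has no direct deletes."""
    by_partition = defaultdict(list)
    for f in data_files:
        by_partition[f.partition].append(f)
    return [
        files[0]
        for files in by_partition.values()
        if len(files) == 1 and files[0].direct_deletes == 0
    ]


def stuck_equality_deletes(skipped_files, equality_deletes):
    """Equality deletes that still apply to a skipped (not rewritten) file:
    same partition, and the data file has a strictly lower sequence number."""
    return [
        d
        for d in equality_deletes
        if any(
            f.partition == d.partition and f.sequence_number < d.sequence_number
            for f in skipped_files
        )
    ]


data_files = [
    DataFile("p1/a.parquet", "p1", sequence_number=1),  # singleton
    DataFile("p2/b.parquet", "p2", sequence_number=1),
    DataFile("p2/c.parquet", "p2", sequence_number=3),
]
equality_deletes = [
    EqualityDelete("p1/eq-1.parquet", "p1", sequence_number=2),
    EqualityDelete("p2/eq-2.parquet", "p2", sequence_number=2),
]

skipped = files_skipped_as_singletons(data_files)
stuck = stuck_equality_deletes(skipped, equality_deletes)
print([f.path for f in skipped])  # ['p1/a.parquet']
print([d.path for d in stuck])    # ['p1/eq-1.parquet']
```

In this model the singleton file in `p1` is never rewritten, so the equality delete that applies to it can never be dropped, which matches the "stuck" delete files described in this report.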
Proposed solution
• OPTIMIZE should ensure that all equality delete files are released by rewriting the data files they reference.
• The file selection logic should be updated to ensure that all files referenced by equality delete files are included in the rewrite.
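One possible shape for the updated selection rule is sketched below. It is an illustration under stated assumptions, not a patch against Trino: `DataFile`, `EqualityDelete`, `is_referenced`, and `select_files_to_rewrite` are hypothetical names, size thresholds are ignored, and "referenced" is approximated as "same partition and a strictly higher data sequence number", which is the Iceberg rule for partition-scoped equality deletes.

```python
from collections import Counter
from typing import NamedTuple


class DataFile(NamedTuple):
    path: str
    partition: str
    sequence_number: int


class EqualityDelete(NamedTuple):
    path: str
    partition: str
    sequence_number: int


def is_referenced(data_file, equality_deletes):
    """True if any equality delete applies to the data file: same partition
    and a strictly higher data sequence number."""
    return any(
        d.partition == data_file.partition
        and d.sequence_number > data_file.sequence_number
        for d in equality_deletes
    )


def select_files_to_rewrite(data_files, equality_deletes):
    """Hypothetical selection rule: keep the singleton-skip optimization,
    but never skip a file that an equality delete still references."""
    files_per_partition = Counter(f.partition for f in data_files)
    selected = []
    for f in data_files:
        is_singleton = files_per_partition[f.partition] == 1
        if not is_singleton or is_referenced(f, equality_deletes):
            selected.append(f)
    return selected


data_files = [
    DataFile("p1/a.parquet", "p1", 1),  # singleton, referenced by eq-1
    DataFile("p2/b.parquet", "p2", 1),  # singleton, no deletes: still skipped
    DataFile("p3/c.parquet", "p3", 1),
    DataFile("p3/d.parquet", "p3", 2),
]
equality_deletes = [EqualityDelete("p1/eq-1.parquet", "p1", 2)]

selected = select_files_to_rewrite(data_files, equality_deletes)
print([f.path for f in selected])
# ['p1/a.parquet', 'p3/c.parquet', 'p3/d.parquet']
```

The design intent is that the optimization from PR #23864 still applies to singleton files with no applicable deletes, while any file that keeps an equality delete alive is always included in the rewrite.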
Workarounds attempted
• Adding new data and deletes to affected partitions (did not resolve the issue).
• Running OPTIMIZE multiple times (did not resolve the issue).
• Increasing file_size_threshold so that the data in each partition is not split across multiple files because of its size (did not resolve the issue).
• Forcing a full rewrite by creating a new table and copying the data over (does resolve the issue). However, this is not practical as a regular workaround.