[Improvement]: Equality field ids differ among the delete files in a RewriteFilesInput
Search before asking
- [X] I have searched in the issues and found no similar issues.
What would you like to be improved?
tm_id=optimizer-kubed-bts-0-fo26ux-taskmanager-1-2
application_id=/default
java.lang.IllegalArgumentException: Equality delete files have different delete fields
at org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkArgument(Preconditions.java:143)
at com.netease.arctic.io.reader.CombinedDeleteFilter.<init>(CombinedDeleteFilter.java:130)
at com.netease.arctic.io.reader.GenericCombinedIcebergDataReader$GenericDeleteFilter.<init>(GenericCombinedIcebergDataReader.java:305)
at com.netease.arctic.io.reader.GenericCombinedIcebergDataReader.<init>(GenericCombinedIcebergDataReader.java:97)
at com.netease.arctic.optimizing.IcebergRewriteExecutor.dataReader(IcebergRewriteExecutor.java:68)
at com.netease.arctic.optimizing.AbstractRewriteFilesExecutor.<init>(AbstractRewriteFilesExecutor.java:84)
at com.netease.arctic.optimizing.IcebergRewriteExecutor.<init>(IcebergRewriteExecutor.java:46)
at com.netease.arctic.optimizing.IcebergRewriteExecutorFactory.createExecutor(IcebergRewriteExecutorFactory.java:38)
at com.netease.arctic.optimizing.IcebergRewriteExecutorFactory.createExecutor(IcebergRewriteExecutorFactory.java:25)
at com.netease.arctic.optimizer.common.OptimizerExecutor.executeTask(OptimizerExecutor.java:148)
at com.netease.arctic.optimizer.flink.FlinkOptimizerExecutor.executeTask(FlinkOptimizerExecutor.java:70)
at com.netease.arctic.optimizer.common.OptimizerExecutor.start(OptimizerExecutor.java:52)
at com.netease.arctic.optimizer.flink.FlinkExecutor.lambda$open$0(FlinkExecutor.java:59)
at java.lang.Thread.run(Thread.java:750)
An optimizing process whose equality delete files have different equality field ids cannot be executed.
How should we improve?
No response
Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
Subtasks
No response
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
#2912 has improved the exception message by including the paths of the delete files whose equality field ids differ.
Depending on the actual production scenario, identifier fields may change, resulting in repeated failure of the optimizing task. When filtering eq-delete records, can we split the work by eq-delete field ids in the optimizing task? In Iceberg, it is possible to add a separate eq-delete predicate for each distinct set of eq-delete field ids. cc @zhoujinsong @zhongqishang
> Depending on the actual production scenario, identifier fields may change, resulting in repeated failure of the optimizing task.
Yes, this is a common scenario. I think we need to support this feature.
> When filtering eq-delete records, can we split the work by eq-delete field ids in the optimizing task? In Iceberg, it is possible to add a separate eq-delete predicate for each distinct set of eq-delete field ids. cc @zhoujinsong @zhongqishang
As you said, I think we can group the eq-delete files by their equality field ids and generate one predicate per group to filter the data.
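
To make the grouping idea concrete, here is a rough sketch of what it could look like. It is not the actual `CombinedDeleteFilter` code: `EqDeleteGrouping`, `EqDeleteFile`, `groupByFieldIds`, `isDeleted` and `project` are hypothetical names, rows are modeled as plain `Map<Integer, Object>` (field id to value) instead of Iceberg records, and data sequence numbers are ignored for brevity.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;

final class EqDeleteGrouping {

  /** Hypothetical stand-in for an equality delete file (not an Iceberg type). */
  static final class EqDeleteFile {
    final String path;
    final Set<Integer> equalityFieldIds;
    final List<Map<Integer, Object>> rows; // each row maps field id -> value

    EqDeleteFile(String path, Set<Integer> equalityFieldIds, List<Map<Integer, Object>> rows) {
      this.path = path;
      this.equalityFieldIds = equalityFieldIds;
      this.rows = rows;
    }
  }

  /** Groups delete files by their set of equality field ids instead of requiring one shared set. */
  static Map<Set<Integer>, List<EqDeleteFile>> groupByFieldIds(List<EqDeleteFile> files) {
    Map<Set<Integer>, List<EqDeleteFile>> groups = new LinkedHashMap<>();
    for (EqDeleteFile file : files) {
      // A sorted copy of the ids is the group key, so the same ids always land in the same group.
      groups.computeIfAbsent(new TreeSet<>(file.equalityFieldIds), k -> new ArrayList<>()).add(file);
    }
    return groups;
  }

  /** Builds one predicate per group and ORs them: a row is deleted if any group matches it. */
  static Predicate<Map<Integer, Object>> isDeleted(List<EqDeleteFile> files) {
    Predicate<Map<Integer, Object>> combined = row -> false;
    for (Map.Entry<Set<Integer>, List<EqDeleteFile>> group : groupByFieldIds(files).entrySet()) {
      List<Integer> ids = new ArrayList<>(group.getKey());
      Set<List<Object>> deletedKeys = new HashSet<>();
      for (EqDeleteFile file : group.getValue()) {
        for (Map<Integer, Object> deleteRow : file.rows) {
          deletedKeys.add(project(deleteRow, ids));
        }
      }
      combined = combined.or(row -> deletedKeys.contains(project(row, ids)));
    }
    return combined;
  }

  /** Projects a row onto the given field ids, in order, to form a comparable key. */
  static List<Object> project(Map<Integer, Object> row, List<Integer> ids) {
    List<Object> key = new ArrayList<>(ids.size());
    for (Integer id : ids) {
      key.add(row.get(id));
    }
    return key;
  }
}
```

With one delete file on field id `1` and another on field ids `1, 2`, `groupByFieldIds` returns two groups, and `isDeleted` filters a data row if it matches either group's keys. A real implementation would also need to respect sequence numbers and keep memory bounded for large delete sets.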