iceberg-rust
Support RowDeltaAction
Is your feature request related to a problem or challenge?
As raised in #798 and #1081, there is a need to append delete files (position deletes, equality deletes). This action supports appending files of this kind.
Describe the solution you'd like
The path to support:
- [ ] add conflict detection
- [ ] add retry logic #964
- [ ] complete RowDeltaAction
Willingness to contribute
None
cc @Fokko @liurenjie1024
For metadata conflict detection, what is the exact design outline that you are looking to implement?
For row-level detection, I can start implementing the manifest filter manager and the manifest merge manager, building towards the merging snapshot producer used in RowDelta. This can probably be done after delete files are fully implemented.
> For metadata conflict detection, what is the exact design outline that you are looking to implement?
Conflict detection will be built on a validation phase. I would like to introduce this validation phase in SnapshotProduce apply(). Once it is in place, we can add a specific implementation for each kind of validation.
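To make the idea concrete, here is a minimal sketch of what such a validation phase could look like. It is not the iceberg-rust API: `ConcurrentSnapshot`, `SnapshotValidator`, `ReferencedFilesExist`, and `run_validation_phase` are hypothetical names invented for illustration, and snapshots are reduced to a list of removed file paths. The assumption is that the snapshot producer would run every registered validator against the snapshots committed since the operation started, and abort before committing if any of them fails.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified view of a snapshot committed concurrently
/// (after the base snapshot this operation started from).
#[derive(Debug, Clone)]
pub struct ConcurrentSnapshot {
    pub snapshot_id: i64,
    pub removed_data_files: Vec<String>,
}

/// Hypothetical hook: each kind of validation is one implementation.
pub trait SnapshotValidator {
    fn validate(&self, concurrent: &[ConcurrentSnapshot]) -> Result<(), String>;
}

/// Example validation: data files referenced by new delete files must not
/// have been removed by a concurrent commit.
pub struct ReferencedFilesExist {
    pub referenced: HashSet<String>,
}

impl SnapshotValidator for ReferencedFilesExist {
    fn validate(&self, concurrent: &[ConcurrentSnapshot]) -> Result<(), String> {
        for snapshot in concurrent {
            for path in &snapshot.removed_data_files {
                if self.referenced.contains(path) {
                    return Err(format!(
                        "snapshot {} removed referenced data file {}",
                        snapshot.snapshot_id, path
                    ));
                }
            }
        }
        Ok(())
    }
}

/// Runs every validator and stops at the first failure. In this sketch the
/// producer would call this before writing the new snapshot.
pub fn run_validation_phase(
    validators: &[Box<dyn SnapshotValidator>],
    concurrent: &[ConcurrentSnapshot],
) -> Result<(), String> {
    for validator in validators {
        validator.validate(concurrent)?;
    }
    Ok(())
}
```

With this shape, an action such as RowDelta would only need to register the validators it cares about; the producer itself stays unaware of the individual checks.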
> For row-level detection, I can start implementing the manifest filter manager and the manifest merge manager, building towards the merging snapshot producer used in RowDelta. This can probably be done after delete files are fully implemented.
Thanks @jonathanc-n!
- For the manifest merge manager, I think it has been covered in #902.
- For the manifest filter manager, maybe we need it. One interesting finding is that pyiceberg doesn't have a manifest filter manager. In its abstraction, it filters the delete entries directly. I haven't dived into the manifest filter manager yet; will it be needed in the future? cc @Fokko @kevinjqliu
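As a rough illustration of what "filtering the delete entries directly" could mean, the sketch below rewrites the entries of a single manifest against a set of file paths to delete. All names (`EntryStatus`, `Entry`, `filter_entries`) are hypothetical and do not come from iceberg-rust or pyiceberg. The sketch assumes that entries already marked as deleted are dropped on rewrite, that matching live entries are marked `Deleted`, and that the remaining live entries are carried over as `Existing`.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified manifest entry status.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EntryStatus {
    Existing,
    Added,
    Deleted,
}

/// Hypothetical, simplified manifest entry: a status plus a file path.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Entry {
    pub status: EntryStatus,
    pub file_path: String,
}

/// Filters one manifest's entries against the set of files to delete.
/// Returns `None` when no live entry matches, so the caller can reuse the
/// original manifest instead of rewriting it.
pub fn filter_entries(entries: &[Entry], to_delete: &HashSet<String>) -> Option<Vec<Entry>> {
    let mut rewritten = Vec::new();
    let mut any_deleted = false;
    for entry in entries {
        // Entries deleted by an earlier snapshot are not carried forward.
        if entry.status == EntryStatus::Deleted {
            continue;
        }
        let matches = to_delete.contains(&entry.file_path);
        any_deleted |= matches;
        rewritten.push(Entry {
            status: if matches {
                EntryStatus::Deleted
            } else {
                EntryStatus::Existing
            },
            file_path: entry.file_path.clone(),
        });
    }
    if any_deleted {
        Some(rewritten)
    } else {
        None
    }
}
```

A dedicated filter manager would mainly add bookkeeping around this core step, such as caching filtered manifests across retries and filtering by expression rather than by exact path.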
I'll take a deeper look into the implementation tomorrow
@ZENOTME Sorry for the delay, I was a little busy with family. In the example you showed from the Python implementation, the filtering for OverwriteFiles and DeleteFiles seems to be inherited from the snapshot producer. We could create a MergingSnapshotManager or something similar and have it called whenever an action produces a new snapshot. I think we still need a few things before that can happen.
One thing I would like to implement before the filter manager is residual predicates, mentioned by @Fokko here
Hi @ZENOTME @jonathanc-n, I was wondering if there is any active work on this issue? I'm planning to look into the conflict detection logic and would be happy to contribute/collaborate on this :)
I've created https://github.com/apache/iceberg-rust/issues/1344 to add validation logic, which should be a prerequisite for this issue.
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.