iceberg-rust

Support RowDeltaAction

Open ZENOTME opened this issue 8 months ago • 9 comments

Is your feature request related to a problem or challenge?

As discussed in #798 and #1081, there is a need to append delete files (position deletes and equality deletes). This action would support appending files of this kind.

Describe the solution you'd like

The path to support:

  • [ ] add conflict detection
  • [ ] add retry logic #964
  • [ ] complete RowDeltaAction
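To illustrate how the first two items could fit together, here is a minimal sketch of a commit retry loop that distinguishes a retryable failure from a real conflict. Everything in it (`CommitError`, `commit_with_retry`, the closure signature) is hypothetical and for illustration only; it is not the iceberg-rust API or the design proposed in #964.

```rust
/// Hypothetical outcome of a failed commit attempt (not an iceberg-rust type).
#[derive(Debug, PartialEq, Eq)]
pub enum CommitError {
    /// The table changed underneath us; refreshing and retrying may succeed.
    Retryable(String),
    /// Conflict detection found a real conflict; retrying cannot help.
    Conflict(String),
}

/// Runs `attempt` up to `max_attempts` times.
/// Returns the 1-based number of the attempt that succeeded.
pub fn commit_with_retry<F>(max_attempts: u32, mut attempt: F) -> Result<u32, CommitError>
where
    F: FnMut(u32) -> Result<(), CommitError>,
{
    let mut last = CommitError::Retryable("no attempts were made".to_string());
    for n in 1..=max_attempts {
        match attempt(n) {
            Ok(()) => return Ok(n),
            // Remember the failure and try again on the next iteration.
            Err(CommitError::Retryable(msg)) => last = CommitError::Retryable(msg),
            // A detected conflict is final: surface it immediately.
            Err(conflict) => return Err(conflict),
        }
    }
    Err(last)
}
```

In this sketch the closure would refresh table metadata, rerun conflict detection, and attempt the commit. A real implementation would likely also add backoff between attempts.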

Willingness to contribute

None

ZENOTME avatar Mar 18 '25 11:03 ZENOTME

cc @Fokko @liurenjie1024

ZENOTME avatar Mar 18 '25 11:03 ZENOTME

For metadata conflict detection, what is the exact design outline that you are looking to implement?

For row-level detection, I can start implementing the manifest filter manager and manifest merge manager to build towards the merging snapshot producer used in RowDelta. This can probably be done after delete files are fully implemented.

jonathanc-n avatar Mar 21 '25 03:03 jonathanc-n

> For metadata conflict detection, what is the exact design outline that you are looking to implement?

The conflict detection implementation would be based on a validation phase. I would like to introduce the validation phase in SnapshotProduce apply(). Once it is in place, we can add specific implementations for each kind of validation.
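A rough sketch of what such a validation phase might look like, assuming a trait-based hook. All names here (`Validation`, `ConcurrentSnapshot`, `ReferencedFilesExist`, `run_validation_phase`) are hypothetical stand-ins, not existing iceberg-rust types, and the single check shown is just one example of a specific validation.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified record of a snapshot committed by another writer.
#[derive(Debug, Clone)]
pub struct ConcurrentSnapshot {
    pub snapshot_id: i64,
    /// Paths of data files that this snapshot removed.
    pub removed_data_files: Vec<String>,
}

/// Hypothetical validation hook, run before the new snapshot is produced.
pub trait Validation {
    fn validate(&self, concurrent: &[ConcurrentSnapshot]) -> Result<(), String>;
}

/// One specific validation: every data file referenced by the new
/// position deletes must not have been removed concurrently.
pub struct ReferencedFilesExist {
    pub referenced: HashSet<String>,
}

impl Validation for ReferencedFilesExist {
    fn validate(&self, concurrent: &[ConcurrentSnapshot]) -> Result<(), String> {
        for snapshot in concurrent {
            for path in &snapshot.removed_data_files {
                if self.referenced.contains(path) {
                    return Err(format!(
                        "conflict: snapshot {} removed referenced data file {}",
                        snapshot.snapshot_id, path
                    ));
                }
            }
        }
        Ok(())
    }
}

/// Stand-in for the validation phase at the start of apply():
/// run every validation and stop at the first conflict.
pub fn run_validation_phase(
    validations: &[Box<dyn Validation>],
    concurrent: &[ConcurrentSnapshot],
) -> Result<(), String> {
    validations.iter().try_for_each(|v| v.validate(concurrent))
}
```

Each action could then register only the validations it needs, and apply() would run them against the snapshots committed since the action's starting snapshot.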

> For row-level detection, I can start implementing the manifest filter manager and manifest merge manager to build towards the merging snapshot producer used in RowDelta. This can probably be done after delete files are fully implemented.

Thanks @jonathanc-n!

  • For the manifest merge manager, I think it has been covered in #902.
  • For the manifest filter manager, we may need it. One interesting finding is that pyiceberg doesn't have a manifest filter manager. In this abstraction, it filters the delete entries directly. I haven't dived into the manifest filter manager yet; is it needed in the future? cc @Fokko @kevinjqliu
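For reference, "filtering the delete entries directly" could look roughly like the sketch below. This is a simplified illustration under my own assumptions, not pyiceberg's or iceberg-rust's actual code: `Entry`, `EntryStatus`, and `filter_entries` are hypothetical, and a real manifest entry carries much more (snapshot id, sequence numbers, partition data).

```rust
use std::collections::HashSet;

/// Simplified manifest entry status (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EntryStatus {
    Existing,
    Added,
    Deleted,
}

/// Simplified manifest entry: only the status and the file path.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Entry {
    pub status: EntryStatus,
    pub file_path: String,
}

/// Rewrites one manifest's entries for a new snapshot without a separate
/// filter manager: live entries whose path is in `to_delete` become
/// `Deleted`, and the remaining live entries are carried over as `Existing`.
pub fn filter_entries(entries: &[Entry], to_delete: &HashSet<String>) -> Vec<Entry> {
    entries
        .iter()
        // Entries already deleted by an earlier snapshot are not carried forward.
        .filter(|e| e.status != EntryStatus::Deleted)
        .map(|e| {
            let status = if to_delete.contains(&e.file_path) {
                EntryStatus::Deleted
            } else {
                EntryStatus::Existing
            };
            Entry {
                status,
                file_path: e.file_path.clone(),
            }
        })
        .collect()
}
```

A dedicated filter manager would presumably add things this sketch leaves out, such as skipping manifests that cannot contain matching files and deleting by expression rather than only by path.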

ZENOTME avatar Mar 21 '25 04:03 ZENOTME

I'll take a deeper look into the implementation tomorrow

jonathanc-n avatar Mar 21 '25 04:03 jonathanc-n

@ZENOTME Sorry for the delay, I was a little busy with family. In the example you showed from the Python implementation, the filtering for OverwriteFiles and DeleteFiles seems to be inherited from the snapshot producer. We could create a MergingSnapshotManager or something similar and have it called whenever an action produces a new snapshot. I think we still need a few things in place before this happens.

jonathanc-n avatar Mar 24 '25 03:03 jonathanc-n

One thing I would like to implement before the filter manager is residual predicates, mentioned by @Fokko here

jonathanc-n avatar Mar 24 '25 03:03 jonathanc-n

Hi @ZENOTME @jonathanc-n, I was wondering if there is any active work on this issue. I'm planning to look into the conflict detection logic and would be happy to contribute/collaborate on this :)

CTTY avatar May 15 '25 23:05 CTTY

I've created https://github.com/apache/iceberg-rust/issues/1344 to add validation logic, which should be a prerequisite for this issue

CTTY avatar May 17 '25 06:05 CTTY

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions[bot] avatar Nov 14 '25 00:11 github-actions[bot]