Support delete data files in fast append action

Open mnpw opened this issue 8 months ago • 7 comments

Is your feature request related to a problem or challenge?

The transaction API exposes FastAppendAction for committing changes to the catalog.

An equality delete writer was added in https://github.com/apache/iceberg-rust/pull/703 to support writing equality delete files. However, FastAppendAction does not support committing equality delete files.

See DataContentType check – https://github.com/apache/iceberg-rust/blob/main/crates/iceberg/src/transaction.rs#L443-L447
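To illustrate the kind of check being referred to, here is a minimal, self-contained sketch. The `DataFile` and `FastAppend` types below are simplified stand-ins written for this example, not the actual iceberg-rust types, and the error is a plain `String` rather than the crate's error type. Only the idea is taken from the issue: the action rejects any file whose content type is not `Data`.

```rust
// Simplified stand-ins for illustration only; not the real iceberg-rust API.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DataContentType {
    Data,
    PositionDeletes,
    EqualityDeletes,
}

#[derive(Debug, Clone)]
struct DataFile {
    path: String,
    content_type: DataContentType,
}

#[derive(Debug, Default)]
struct FastAppend {
    added: Vec<DataFile>,
}

impl FastAppend {
    /// Accepts the batch only if every file holds data rows.
    /// A single delete file causes the whole batch to be rejected.
    fn add_data_files(&mut self, files: Vec<DataFile>) -> Result<(), String> {
        for file in &files {
            if file.content_type != DataContentType::Data {
                return Err(format!(
                    "only data files are allowed in fast append, got {:?} for {}",
                    file.content_type, file.path
                ));
            }
        }
        self.added.extend(files);
        Ok(())
    }
}

fn main() {
    let mut action = FastAppend::default();

    let data = DataFile {
        path: "data/a.parquet".to_string(),
        content_type: DataContentType::Data,
    };
    let deletes = DataFile {
        path: "data/a-eq-deletes.parquet".to_string(),
        content_type: DataContentType::EqualityDeletes,
    };

    println!("data file:   {:?}", action.add_data_files(vec![data]));
    println!("delete file: {:?}", action.add_data_files(vec![deletes]));
}
```

In this sketch the data file is accepted and the equality delete file is rejected, which is the behavior the issue asks to change.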

Describe the solution you'd like

It would be great if we could enhance FastAppendAction to support committing equality delete files as well. I am willing to work on this.

Willingness to contribute

I would be willing to contribute to this feature with guidance from the Iceberg Rust community

mnpw avatar Mar 13 '25 10:03 mnpw

@mnpw take a look at #1017 before proceeding.

jonathanc-n avatar Mar 13 '25 23:03 jonathanc-n

Thanks @mnpw for raising this. But this should not be included in fast append action since deletion typically requires conflict detection.

liurenjie1024 avatar Mar 14 '25 01:03 liurenjie1024

We should close this issue as it does not follow Iceberg's design.

liurenjie1024 avatar Mar 14 '25 01:03 liurenjie1024

> But this should not be included in fast append action since deletion typically requires conflict detection.

@liurenjie1024 What do you think about a transaction action for only delete files, perhaps RowDeleteAction just like FastAppendAction? RowDeleteAction can take DataContentType::EqualityDeletes files and commit them into a new snapshot. On conflict during snapshot creation this action can behave similarly to FastAppendAction.

My use-case is being able to write delete files and commit them into a new snapshot. Please let me know if there is any other better way for the same.

mnpw avatar Mar 17 '25 11:03 mnpw

> On conflict during snapshot creation this action can behave similarly to FastAppendAction.

Sorry, I don't get this point. FastAppendAction will never conflict with other transactions. By conflict detection I mean ensuring snapshot isolation during concurrent writes: https://iceberg.apache.org/docs/nightly/reliability/#concurrent-write-operations

> My use-case is being able to write delete files and commit them into a new snapshot. Please let me know if there is any other better way for the same.

Do you mean writing delete files only? I can understand such a case, but we still need to do conflict detection for concurrent writes.

liurenjie1024 avatar Mar 18 '25 09:03 liurenjie1024
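As a rough sketch of the distinction drawn in this comment, the toy model below contrasts a commit that never validates against concurrent changes (fast append) with one that checks what was committed since the snapshot the writer started from. The `Table`, `Snapshot`, `commit_fast_append`, and `commit_row_delete` names are hypothetical and exist only in this example. The validation rule is also deliberately coarse: it rejects the delete commit if any append landed after the base snapshot, whereas a real implementation would likely only look for conflicting files.

```rust
// Toy model for illustration only; none of these types exist in iceberg-rust.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Operation {
    Append,
    RowDelete,
}

#[derive(Debug)]
struct Snapshot {
    id: u64,
    operation: Operation,
}

#[derive(Debug, Default)]
struct Table {
    /// Linear snapshot history, oldest first.
    snapshots: Vec<Snapshot>,
}

impl Table {
    fn current_id(&self) -> Option<u64> {
        self.snapshots.last().map(|s| s.id)
    }

    /// Snapshots committed after `base`. An unknown or missing base
    /// means every snapshot counts as new.
    fn snapshots_after(&self, base: Option<u64>) -> &[Snapshot] {
        let pos = base.and_then(|id| self.snapshots.iter().position(|s| s.id == id));
        match pos {
            Some(i) => &self.snapshots[i + 1..],
            None => &self.snapshots,
        }
    }

    fn push(&mut self, operation: Operation) -> u64 {
        let id = self.current_id().map_or(1, |id| id + 1);
        self.snapshots.push(Snapshot { id, operation });
        id
    }

    /// Appending new data cannot invalidate anything already committed,
    /// so a stale base is ignored and the commit always succeeds.
    fn commit_fast_append(&mut self, _base: Option<u64>) -> u64 {
        self.push(Operation::Append)
    }

    /// Deletes were computed against `base`; reject the commit if data
    /// was appended after that snapshot (coarse conflict detection).
    fn commit_row_delete(&mut self, base: Option<u64>) -> Result<u64, String> {
        if let Some(s) = self
            .snapshots_after(base)
            .iter()
            .find(|s| s.operation == Operation::Append)
        {
            return Err(format!(
                "conflict: snapshot {} appended data after base {:?}",
                s.id, base
            ));
        }
        Ok(self.push(Operation::RowDelete))
    }
}

fn main() {
    let mut table = Table::default();
    table.commit_fast_append(None);

    // A delete writer reads the table here, then another writer appends.
    let base = table.current_id();
    table.commit_fast_append(base);

    println!("stale delete commit: {:?}", table.commit_row_delete(base));

    // After refreshing to the latest snapshot the delete commit goes through.
    let fresh = table.current_id();
    println!("fresh delete commit: {:?}", table.commit_row_delete(fresh));
}
```

The point of the sketch is only that the delete commit needs a base snapshot to validate against, while the append commit does not.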

I think the intent of #798 is similar to this issue. We end up needing to implement RowDeltaAction for this intent.

> @liurenjie1024 What do you think about a transaction action for only delete files, perhaps RowDeleteAction just like FastAppendAction? RowDeleteAction can take DataContentType::EqualityDeletes files and commit them into a new snapshot. On conflict during snapshot creation this action can behave similarly to FastAppendAction.

For an action that commits only delete files, a concurrent data append (or overwrite) may land between the time the deletes are written and the time they are committed. The delete files would then apply to the newly appended data and cause undefined behavior. So it looks like we can't avoid conflict detection. We can open an issue to track this.

> My use-case is being able to write delete files and commit them into a new snapshot. Please let me know if there is any other better way for the same.

Before RowDeleteAction is complete, I personally think you could try hacking the fast append to append the delete files, as in #798 (if you just want to try a simple case). It only works in simple cases (e.g. no concurrent writes), but that doesn't mean it's right.

ZENOTME avatar Mar 18 '25 10:03 ZENOTME
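The concern about concurrent appends can be made concrete with the sequence-number rule from the Iceberg table spec: an equality delete applies to data files with a strictly smaller data sequence number, while a position delete applies to data files with a smaller or equal one. The sketch below models only that rule. The `DeleteKind` enum and `delete_applies` function are names invented for this example, and partition matching is left out.

```rust
// Illustration of the sequence-number rule only; partition matching is omitted
// and these names are made up for the example.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DeleteKind {
    Position,
    Equality,
}

/// Returns true if a delete file with sequence number `delete_seq`
/// applies to a data file with data sequence number `data_seq`.
fn delete_applies(kind: DeleteKind, data_seq: u64, delete_seq: u64) -> bool {
    match kind {
        // Equality deletes only affect strictly older data files.
        DeleteKind::Equality => data_seq < delete_seq,
        // Position deletes may also target files from the same commit.
        DeleteKind::Position => data_seq <= delete_seq,
    }
}

fn main() {
    // The delete writer read the table when the newest data had sequence 1.
    // A concurrent append then commits with sequence 2, and the equality
    // delete is committed afterwards with sequence 3.
    let seen_by_writer = delete_applies(DeleteKind::Equality, 1, 3);
    let concurrent_append = delete_applies(DeleteKind::Equality, 2, 3);

    println!("applies to data the writer saw:      {seen_by_writer}");
    println!("applies to concurrently added data:  {concurrent_append}");
}
```

Both checks come out true here: without validation at commit time, the equality delete also removes matching rows from the file that was appended concurrently, which the delete writer never saw.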

Create an issue to track RowDeleteAction: #1104

ZENOTME avatar Mar 18 '25 11:03 ZENOTME

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions[bot] avatar Sep 15 '25 00:09 github-actions[bot]

This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

github-actions[bot] avatar Oct 04 '25 00:10 github-actions[bot]