feat: support append data file and add e2e test

Open ZENOTME opened this issue 1 year ago • 3 comments

This PR completes https://github.com/apache/iceberg-rust/issues/345.

  1. It adds the FastAppendAction to commit data files.

The design of this is based on https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/SnapshotProducer.java.

I implemented a SnapshotProduceAction, which accepts a Vec<ManifestFile> and a Summary, generates a new snapshot from them, and applies that snapshot to the tx.

FastAppendAction will reuse SnapshotProduceAction and have its own interface to process the added data files.

In the future, we can reuse SnapshotProduceAction to implement more append actions with different commit semantics as described in https://github.com/apache/iceberg-rust/issues/348.
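The layering described above can be sketched roughly as follows. This is only an illustration under simplifying assumptions, not the code in this PR: every type here (DataFile, ManifestFile, Snapshot, Transaction) is a hypothetical in-memory stand-in rather than the real iceberg-rust type, no manifest is actually written to storage, and the snapshot ids and summary keys are chosen purely for demonstration.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-ins; NOT the real iceberg-rust types.
#[allow(dead_code)]
#[derive(Debug, Clone)]
struct DataFile {
    path: String,
    record_count: u64,
}

#[allow(dead_code)]
#[derive(Debug, Clone)]
struct ManifestFile {
    path: String,
    added_files: Vec<DataFile>,
}

type Summary = HashMap<String, String>;

#[allow(dead_code)]
#[derive(Debug, Clone)]
struct Snapshot {
    snapshot_id: i64,
    parent_id: Option<i64>,
    manifests: Vec<ManifestFile>,
    summary: Summary,
}

// Hypothetical transaction that only records the snapshots applied to it.
#[derive(Debug, Default)]
struct Transaction {
    snapshots: Vec<Snapshot>,
}

impl Transaction {
    fn current_snapshot(&self) -> Option<&Snapshot> {
        self.snapshots.last()
    }
}

// Shared building block: turns manifests plus a summary into a new snapshot
// and applies it to the transaction.
struct SnapshotProduceAction {
    manifests: Vec<ManifestFile>,
    summary: Summary,
}

impl SnapshotProduceAction {
    fn new(manifests: Vec<ManifestFile>, summary: Summary) -> Self {
        Self { manifests, summary }
    }

    fn apply(self, tx: &mut Transaction) -> i64 {
        let parent_id = tx.current_snapshot().map(|s| s.snapshot_id);
        // Assumption: sequential ids, just to keep the sketch deterministic.
        let snapshot_id = parent_id.map_or(1, |id| id + 1);
        tx.snapshots.push(Snapshot {
            snapshot_id,
            parent_id,
            manifests: self.manifests,
            summary: self.summary,
        });
        snapshot_id
    }
}

// Fast append: has its own interface for collecting data files, then
// delegates snapshot creation to SnapshotProduceAction.
#[derive(Default)]
struct FastAppendAction {
    added: Vec<DataFile>,
}

impl FastAppendAction {
    fn add_data_files(mut self, files: impl IntoIterator<Item = DataFile>) -> Self {
        self.added.extend(files);
        self
    }

    fn apply(self, tx: &mut Transaction) -> i64 {
        // Existing manifests are carried over unchanged (no rewrite or merge).
        let mut manifests = tx
            .current_snapshot()
            .map(|s| s.manifests.clone())
            .unwrap_or_default();

        // Illustrative summary entries describing what this commit added.
        let added_records: u64 = self.added.iter().map(|f| f.record_count).sum();
        let mut summary = Summary::new();
        summary.insert("added-data-files".to_string(), self.added.len().to_string());
        summary.insert("added-records".to_string(), added_records.to_string());

        // One new manifest holds all files added by this action.
        let path = format!("manifest-{}.avro", manifests.len());
        manifests.push(ManifestFile { path, added_files: self.added });

        SnapshotProduceAction::new(manifests, summary).apply(tx)
    }
}
```

In this sketch, calling `FastAppendAction::default().add_data_files(files).apply(&mut tx)` twice yields two snapshots, the second pointing at the first as its parent and listing both manifests. Another append action with different commit semantics would only need its own way of assembling the manifest list before handing it to SnapshotProduceAction.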

  2. It sets up the e2e test for writing data files.

Please let me know if anything in this design can be improved or if I have missed anything.

ZENOTME avatar Apr 24 '24 17:04 ZENOTME

cc @liurenjie1024 @Fokko @Xuanwo

ZENOTME avatar Apr 25 '24 03:04 ZENOTME

Hi, I have tried to fix this PR. Some things may not be fully fixed yet:

  1. https://github.com/apache/iceberg-rust/pull/349#discussion_r1580444775 I'm not sure whether my understanding is correct.
  2. TODO items that we can handle in a later PR:
  • https://github.com/apache/iceberg-rust/pull/349#discussion_r1580446821
  • https://github.com/apache/iceberg-rust/pull/349#discussion_r1579420662
  3. https://github.com/apache/iceberg-rust/pull/349#discussion_r1580571634

Please let me know if there is anything else I missed and need to fix. cc @Fokko

ZENOTME avatar May 13 '24 16:05 ZENOTME

We're introducing a lot of new concepts here and generating a lot of open ends (snapshot summary generation, metrics collection, schema compatibility checks, etc.). I think it would be best to break this PR into smaller pieces. For example, I'm not sure whether the way we create the fast-append is very extensible, and I think we can copy a lot from PyIceberg, where we track the changes to the metadata.

Sorry for the delay. I'm now active on this PR again. I have split some code out of this PR; for now, it mainly contains support for the append action of the transaction. I have refactored the fast append design to follow PyIceberg's design so that it is more extensible. Feel free to suggest anything that needs to change. cc @Fokko @liurenjie1024 @Xuanwo

ZENOTME avatar Sep 26 '24 11:09 ZENOTME