feat: support append data file and add e2e test
This PR completes https://github.com/apache/iceberg-rust/issues/345.
- It adds FastAppendAction to commit data files
The design of this is based on https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/SnapshotProducer.java.
I implemented a SnapshotProduceAction, which accepts a Vec<ManifestFile> and a Summary, generates a new snapshot from them, and applies that snapshot to the transaction (tx).
FastAppendAction will reuse SnapshotProduceAction and have its own interface to process the added data files.
In the future, we can reuse SnapshotProduceAction to implement more append actions with different commit semantics as described in https://github.com/apache/iceberg-rust/issues/348.
- It sets up the e2e test for writing data files
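To make the relationship between the two actions easier to discuss, here is a rough sketch of the idea. Every type in it (`DataFile`, `ManifestFile`, `Summary`, `Snapshot`, `Transaction`) is a simplified, hypothetical stand-in and not the real iceberg-rust definition. The method names, the snapshot id scheme, and the manifest path format are also assumptions made only for illustration.

```rust
use std::collections::HashMap;

// NOTE: all types below are simplified, hypothetical stand-ins,
// not the real iceberg-rust definitions.
#[derive(Debug, Clone)]
struct DataFile {
    path: String,
    record_count: u64,
}

#[derive(Debug, Clone)]
struct ManifestFile {
    path: String,
    added_files: Vec<DataFile>,
}

#[derive(Debug, Clone)]
struct Summary {
    operation: String,
    properties: HashMap<String, String>,
}

#[derive(Debug, Clone)]
struct Snapshot {
    snapshot_id: i64,
    parent_id: Option<i64>,
    manifests: Vec<ManifestFile>,
    summary: Summary,
}

#[derive(Debug, Default)]
struct Transaction {
    snapshots: Vec<Snapshot>,
}

/// Shared step: turn manifests plus a summary into a new snapshot on the tx.
struct SnapshotProduceAction {
    tx: Transaction,
}

impl SnapshotProduceAction {
    fn apply(mut self, manifests: Vec<ManifestFile>, summary: Summary) -> Transaction {
        let parent_id = self.tx.snapshots.last().map(|s| s.snapshot_id);
        // Assumption: sequential ids, only to keep the sketch deterministic.
        let snapshot_id = parent_id.map_or(1, |id| id + 1);
        self.tx.snapshots.push(Snapshot { snapshot_id, parent_id, manifests, summary });
        self.tx
    }
}

/// Append-specific interface: collects data files, then delegates to the producer.
struct FastAppendAction {
    producer: SnapshotProduceAction,
    added: Vec<DataFile>,
}

impl FastAppendAction {
    fn new(tx: Transaction) -> Self {
        Self { producer: SnapshotProduceAction { tx }, added: Vec::new() }
    }

    fn add_data_files(mut self, files: impl IntoIterator<Item = DataFile>) -> Self {
        self.added.extend(files);
        self
    }

    fn apply(self) -> Transaction {
        let records: u64 = self.added.iter().map(|f| f.record_count).sum();
        let mut properties = HashMap::new();
        properties.insert("added-data-files".to_string(), self.added.len().to_string());
        properties.insert("added-records".to_string(), records.to_string());
        let summary = Summary { operation: "append".to_string(), properties };

        // Existing manifests are carried over untouched; the new files go
        // into one extra manifest (hypothetical path naming).
        let mut manifests = self.producer.tx.snapshots.last()
            .map(|s| s.manifests.clone())
            .unwrap_or_default();
        let path = format!("manifest-{}.avro", manifests.len());
        manifests.push(ManifestFile { path, added_files: self.added });
        self.producer.apply(manifests, summary)
    }
}

/// Usage: two consecutive appends on an initially empty transaction.
fn example_append() -> Transaction {
    let file = |path: &str, record_count: u64| DataFile { path: path.to_string(), record_count };
    let tx = FastAppendAction::new(Transaction::default())
        .add_data_files(vec![file("a.parquet", 10)])
        .apply();
    FastAppendAction::new(tx)
        .add_data_files(vec![file("b.parquet", 5), file("c.parquet", 7)])
        .apply()
}
```

In this sketch, only the summary and the manifest construction live in FastAppendAction, so another append action with different commit semantics would presumably only need to replace that part and could still call the same SnapshotProduceAction.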
Please let me know if anything in this design can be improved or if I have missed anything.
cc @liurenjie1024 @Fokko @Xuanwo
Hi, I have tried to fix this PR. Some things may not be fully fixed yet:
- https://github.com/apache/iceberg-rust/pull/349#discussion_r1580444775 I'm not sure whether my understanding is correct
- TODO (we can do these in a later PR):
- https://github.com/apache/iceberg-rust/pull/349#discussion_r1580446821
- https://github.com/apache/iceberg-rust/pull/349#discussion_r1579420662
- https://github.com/apache/iceberg-rust/pull/349#discussion_r1580571634

Please let me know if there is anything else I missed and need to fix. cc @Fokko
We're introducing a lot of new concepts here and generating a lot of open ends (snapshot summary generation, metrics collection, schema compatibility checks, etc.). I think it would be best to break this PR into smaller pieces. For example, I'm not sure whether the way we create the fast-append is very extensible, and I think we can copy a lot from PyIceberg, where we track the changes to the metadata.
Sorry for being late. I'm now active on this PR again. I have separated some code out of this PR; for now, it mainly contains support for the append action of the transaction. I have refactored the fast append design to follow PyIceberg's design so that it is more extensible. Feel free to suggest anything that needs to change. cc @Fokko @liurenjie1024 @Xuanwo