
Support Adding Parquet Files to an Existing Table

Open jacksonrnewhouse opened this issue 1 year ago • 3 comments

Arroyo is a Rust-based stream processing engine that performs reliable computation on data across many supported sources and writes to a similar number of sinks. It has support for writing vanilla parquet to S3, as well as a Delta Lake integration. We'd like to also be able to write to Iceberg tables. Because of the consistency mechanisms of Arroyo, writes will be done separately from adding the files to the table, so we only need something like an "insert_table()" method on an existing table. It'd also be helpful to have some sort of "create table if not exist", but if that's more work we can tell users they have to make the table themselves.
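To make the request concrete, here is a rough sketch of the shape of API being asked for. It is an in-memory toy model using only the standard library, not iceberg-rust's actual API: `Catalog`, `Table`, `Snapshot`, `DataFile`, `create_table_if_not_exists` and `append_files` are all hypothetical names chosen for illustration. The point is only that the Parquet files are written elsewhere first, and the table operation just registers them in a new snapshot, all or nothing.

```rust
use std::collections::HashMap;

/// Hypothetical description of a Parquet file that has already been written.
#[derive(Debug, Clone, PartialEq)]
pub struct DataFile {
    pub path: String,
    pub record_count: u64,
}

/// Hypothetical snapshot: the full set of files visible after one commit.
#[derive(Debug, Clone, Default)]
pub struct Snapshot {
    pub id: u64,
    pub files: Vec<DataFile>,
}

/// Hypothetical table: an ordered history of snapshots.
#[derive(Debug, Default)]
pub struct Table {
    snapshots: Vec<Snapshot>,
}

impl Table {
    /// Files visible in the latest snapshot (empty for a new table).
    pub fn current_files(&self) -> &[DataFile] {
        match self.snapshots.last() {
            Some(snapshot) => snapshot.files.as_slice(),
            None => &[],
        }
    }

    /// Append-only commit. Either every file is added in one new snapshot,
    /// or the table is left unchanged and an error is returned.
    pub fn append_files(&mut self, new_files: Vec<DataFile>) -> Result<u64, String> {
        let mut files = self.current_files().to_vec();
        for file in new_files {
            if files.iter().any(|existing| existing.path == file.path) {
                return Err(format!("file already in table: {}", file.path));
            }
            files.push(file);
        }
        let id = self.snapshots.len() as u64 + 1;
        self.snapshots.push(Snapshot { id, files });
        Ok(id)
    }
}

/// Hypothetical catalog mapping table names to tables.
#[derive(Debug, Default)]
pub struct Catalog {
    tables: HashMap<String, Table>,
}

impl Catalog {
    /// Returns the existing table, or creates an empty one under that name.
    pub fn create_table_if_not_exists(&mut self, name: &str) -> &mut Table {
        self.tables.entry(name.to_string()).or_default()
    }
}
```

In this model a sink would finish its own checkpointed writes, then call `append_files` once with the resulting paths. A real implementation would also need schema, partition and manifest handling, which the sketch leaves out.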

jacksonrnewhouse, Mar 01 '24 02:03

Thank you for bringing this to our attention. This feature is indeed included in our process of writing data into Iceberg. We simply need to make the API accessible.

Xuanwo, Mar 01 '24 02:03

Hi @jacksonrnewhouse, what you've mentioned are two features:

  1. Create table.
  2. Append files.

Both features are transaction APIs. 1 is relatively easy to finish, while 2 is a little more complicated. For 2, do you also need to insert deletes, or just append data?

liurenjie1024, Mar 01 '24 02:03

Just appending data would be sufficient.

jacksonrnewhouse, Mar 01 '24 03:03