iceberg-rust
Add files to add existing Parquet files to a table
In #345, we added support for writing new data files and appending them to the table. But we don't yet support appending existing data files, which requires reading the existing data files and generating the corresponding DataFile metadata.
I would like to try working on this.
Thanks @jonathanc-n! Feel free to send the PR for this.
@ZENOTME When appending existing data files, should the system load file metadata by reading the current snapshot’s manifest lists from an existing Iceberg table, or would you prefer to specify a file path from which the system scans and infers metadata? Based on the answer, I'm planning to perform a TableScan and have it add the DataFiles with add_data_file.
Hi @jonathanc-n, I think we can refer to the implementation in pyiceberg: https://github.com/apache/iceberg-python/blob/main/pyiceberg/table/__init__.py#L669C9-L669C18.
should the system load file metadata by reading the current snapshot’s manifest lists from an existing Iceberg table, or would you prefer to specify a file path from which the system scans and infers metadata?
I think the user will add files through the transaction API, so we already know which table they will be appended to, along with its related metadata.
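To make the idea concrete, here is a minimal sketch of how file paths passed through a transaction could be turned into data file entries. It is not the iceberg-rust API: `ParquetFooter`, `DataFileEntry`, `AddFilesError`, and `plan_add_files` are hypothetical names invented for illustration, and the footer statistics are assumed to have already been read from each Parquet file. The duplicate checks are also an assumption about what a safe append would want.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the statistics read from a Parquet footer.
#[derive(Debug, Clone)]
pub struct ParquetFooter {
    pub path: String,
    pub file_size_in_bytes: u64,
    /// Row count of each row group in the file.
    pub row_group_row_counts: Vec<u64>,
}

/// Hypothetical, simplified stand-in for Iceberg `DataFile` metadata.
#[derive(Debug, Clone, PartialEq)]
pub struct DataFileEntry {
    pub file_path: String,
    pub file_size_in_bytes: u64,
    pub record_count: u64,
}

#[derive(Debug, PartialEq)]
pub enum AddFilesError {
    /// The same path was passed more than once in a single call.
    DuplicateInput(String),
    /// The path is already referenced by the target table.
    AlreadyInTable(String),
}

/// Builds one data file entry per footer, rejecting duplicate paths.
/// `existing_paths` is assumed to come from the table's current snapshot,
/// which the transaction already knows about.
pub fn plan_add_files(
    footers: &[ParquetFooter],
    existing_paths: &HashSet<String>,
) -> Result<Vec<DataFileEntry>, AddFilesError> {
    let mut seen: HashSet<&str> = HashSet::new();
    let mut entries = Vec::with_capacity(footers.len());

    for footer in footers {
        // `insert` returns false when the path was already seen in this call.
        if !seen.insert(footer.path.as_str()) {
            return Err(AddFilesError::DuplicateInput(footer.path.clone()));
        }
        if existing_paths.contains(&footer.path) {
            return Err(AddFilesError::AlreadyInTable(footer.path.clone()));
        }
        entries.push(DataFileEntry {
            file_path: footer.path.clone(),
            file_size_in_bytes: footer.file_size_in_bytes,
            // The file's record count is the sum over its row groups.
            record_count: footer.row_group_row_counts.iter().sum(),
        });
    }
    Ok(entries)
}
```

In a real implementation the resulting entries would then be handed to the transaction's append action; column-level statistics and partition values are left out here to keep the sketch short.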
@liurenjie1024 @jonathanc-n should this be closed now that https://github.com/apache/iceberg-rust/pull/960 is in?
I don't believe so. There are a bunch of follow-up PRs that should be done before this is closed.
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'