Matt Corley comments

Results: 38 comments of Matt Corley

PyIceberg Near-Term Roadmap

@kevinjqliu @Fokko Where would something like the Iceberg Spark `create_changelog_view` procedure fit in this roadmap? Is that something that might be tackled as part of the other procedures under table...
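Conceptually, a changelog view turns two table states into per-row change records. Below is a minimal pure-Python sketch, assuming rows are dicts and that a single identifier column is available. It borrows the `_change_type` values Spark's `create_changelog_view` emits when identifier columns are set (`INSERT`, `DELETE`, `UPDATE_BEFORE`, `UPDATE_AFTER`). The function `compute_changelog` is hypothetical and is not a PyIceberg API. Spark derives changes from the data files each snapshot added or removed, while this sketch simply diffs two full snapshots.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def compute_changelog(
    before: List[Row],
    after: List[Row],
    identifier_field: str,
) -> List[Tuple[str, Row]]:
    """Diff two table states keyed by an identifier column (hypothetical helper).

    Returns (change_type, row) pairs using the change-type names that Spark's
    create_changelog_view emits when identifier columns are configured.
    """
    old = {row[identifier_field]: row for row in before}
    new = {row[identifier_field]: row for row in after}
    changes: List[Tuple[str, Row]] = []
    for key, row in old.items():
        if key not in new:
            changes.append(("DELETE", row))
        elif new[key] != row:
            # A changed row is reported as a pre-image / post-image pair.
            changes.append(("UPDATE_BEFORE", row))
            changes.append(("UPDATE_AFTER", new[key]))
    for key, row in new.items():
        if key not in old:
            changes.append(("INSERT", row))
    return changes
```

Without identifier columns, an update can only be expressed as a `DELETE` plus an `INSERT`, which is why the identifier is a required input here.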

Expose PyIceberg table as PyArrow Dataset

@kevinjqliu Alas, it's not as simple for Iceberg because of the need to do field-id-based projection to handle schema evolution. Somewhat relatedly: from what I remember, and assuming nothing...
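Field-id-based projection is what keeps renames and added columns correct: Iceberg assigns every column a stable field id, so a reader matches columns in a data file by id rather than by name. The pure-Python sketch below illustrates the idea, assuming a schema is just a mapping from field id to column name. `project_by_field_id` is a hypothetical helper, not PyIceberg's internal projection code.

```python
from typing import Any, Dict, List, Optional

# Hypothetical minimal schema representation: field id -> column name.
Schema = Dict[int, str]


def project_by_field_id(
    file_rows: List[Dict[str, Any]],
    file_schema: Schema,
    read_schema: Schema,
) -> List[Dict[str, Optional[Any]]]:
    """Re-map rows written under file_schema onto read_schema by field id."""
    projected = []
    for row in file_rows:
        out: Dict[str, Optional[Any]] = {}
        for field_id, read_name in read_schema.items():
            # Find the column name this field id had when the file was written.
            file_name = file_schema.get(field_id)
            # Columns added after the file was written have no data: fill with None.
            out[read_name] = row.get(file_name) if file_name is not None else None
        projected.append(out)
    # Columns present in the file but dropped from read_schema are never read.
    return projected
```

For example, a file written when field 2 was called `name` still reads correctly after it is renamed to `full_name`; matching by name would instead return nulls for the renamed column.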

[feature request] Allow engines to time travel

Still, an API like `Table.as_of(snapshot_id/timestamp) -> Snapshot` would be useful, even if reading requires then passing the correct arguments to `Table.scan`. In general it should be easier for PyIceberg users...
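The timestamp half of such an `as_of` API reduces to a lookup in the table's snapshot log: find the last entry whose timestamp is at or before the requested time. The sketch below assumes the log is available as a list of (snapshot id, timestamp in ms) entries sorted by time. `SnapshotLogEntry` and `snapshot_id_as_of` are hypothetical names for illustration, not existing PyIceberg APIs.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SnapshotLogEntry:
    # Hypothetical mirror of one (snapshot-id, timestamp-ms) entry in the snapshot log.
    snapshot_id: int
    timestamp_ms: int


def snapshot_id_as_of(log: List[SnapshotLogEntry], timestamp_ms: int) -> Optional[int]:
    """Return the snapshot that was current at timestamp_ms.

    Returns None if the table had no snapshot yet. Assumes log is sorted by timestamp_ms.
    """
    timestamps = [entry.timestamp_ms for entry in log]
    # Index of the last entry with timestamp <= timestamp_ms (-1 if none).
    idx = bisect.bisect_right(timestamps, timestamp_ms) - 1
    return log[idx].snapshot_id if idx >= 0 else None
```

The resolved id could then be handed to PyIceberg's existing `Table.scan(snapshot_id=...)` parameter, which keeps the new API a thin resolution step on top of the current read path.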

[feature request] Allow engines to time travel

> More over, multiple different snapshots can also be committed between two consecutive metadata json files. In what situations would that occur? In my (possibly incorrect) mental model of how...

add support for DuckDB views as a valid data format

Sounds like the ask here is for functionality in DuckDB similar to what was implemented in Polars `scan_iceberg`. This also relates to the previously discussed PyArrow Dataset protocol -- not sure if...

Merge into / Upsert

To work well with some of the larger data use cases where folks are using PySpark today, I think this would need to play well with PyArrow streaming read/write functionality, so...
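One way a merge can fit a streaming model is to hold only the (smaller) source side in memory and stream the target through batch by batch. The sketch below assumes rows are dicts, keys are unique on both sides, and batches stand in for what would be PyArrow `RecordBatch`es in practice. `streaming_upsert` is a hypothetical helper showing a hash-based merge, not a PyIceberg API.

```python
from typing import Any, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


def streaming_upsert(
    target_batches: Iterable[List[Row]],
    source: List[Row],
    key: str,
) -> Iterator[List[Row]]:
    """Yield the merged table batch by batch (hypothetical helper).

    Only the source is materialized as an in-memory index; target batches are
    read, rewritten, and yielded one at a time. Assumes keys are unique.
    """
    pending = {row[key]: row for row in source}
    for batch in target_batches:
        # WHEN MATCHED THEN UPDATE: swap in the source row and mark it consumed.
        yield [pending.pop(row[key], row) for row in batch]
    # WHEN NOT MATCHED THEN INSERT: source rows never matched are new rows.
    if pending:
        yield list(pending.values())
```

Peak memory is bounded by the source index plus one target batch, which is the property that matters when the target is much larger than the incoming changes.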

Support iceberg hadoop catalog in python library

This would really help us out: we use the Hadoop catalog for unit testing PySpark code, and are increasingly encountering cases where we want to test code that uses both...

Support iceberg hadoop catalog in python library

@Fokko We do a setup similar to this for integration tests, but the ability to write faster unit tests that depend only on a temp directory fixture in pytest has...
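Part of what makes the Hadoop catalog convenient for tests is that its state is just files: each table directory has `metadata/vN.metadata.json` files plus a `metadata/version-hint.text` file holding the current version number. The sketch below mimics that layout in a temporary directory, assuming placeholder dicts instead of valid Iceberg table metadata. `commit_metadata` and `load_current_metadata` are hypothetical helpers, and unlike the Java implementation they make no attempt at an atomic rename-based commit. In a pytest suite the `tmp_path` fixture could supply the directory; stdlib `tempfile` keeps this self-contained.

```python
import json
import os
import tempfile
from typing import Any, Dict


def _hint_path(table_dir: str) -> str:
    return os.path.join(table_dir, "metadata", "version-hint.text")


def commit_metadata(table_dir: str, metadata: Dict[str, Any]) -> str:
    """Write the next vN.metadata.json and point version-hint.text at it.

    Not atomic: a real Hadoop-catalog commit relies on an atomic rename.
    """
    os.makedirs(os.path.join(table_dir, "metadata"), exist_ok=True)
    try:
        with open(_hint_path(table_dir)) as f:
            version = int(f.read().strip()) + 1
    except FileNotFoundError:
        version = 1  # First commit for this table.
    path = os.path.join(table_dir, "metadata", f"v{version}.metadata.json")
    with open(path, "w") as f:
        json.dump(metadata, f)
    with open(_hint_path(table_dir), "w") as f:
        f.write(str(version))
    return path


def load_current_metadata(table_dir: str) -> Dict[str, Any]:
    """Resolve the current metadata file through version-hint.text."""
    with open(_hint_path(table_dir)) as f:
        version = int(f.read().strip())
    path = os.path.join(table_dir, "metadata", f"v{version}.metadata.json")
    with open(path) as f:
        return json.load(f)


if __name__ == "__main__":
    # Placeholder metadata; a real table would need a full Iceberg metadata document.
    with tempfile.TemporaryDirectory() as warehouse:
        table_dir = os.path.join(warehouse, "db", "events")
        commit_metadata(table_dir, {"current-snapshot-id": -1})
        commit_metadata(table_dir, {"current-snapshot-id": 1})
        print(load_current_metadata(table_dir)["current-snapshot-id"])
```

Because nothing beyond the local filesystem is involved, each test gets a fresh, isolated warehouse with no catalog service to start or tear down.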

Support setting a snapshot property in same commit as spark.sql

I think there's still some confusion here, since there are two possible interpretations of "represent extending the API to allow same commit semantics like the java": - **Interpretation 1:** allow...

feat: support azure blob storage

I think which blob storage to use in Azure should be a choice for the folks deploying the warehouse and not something that needs to be decided by Iceberg SDKs...