
Apache PyIceberg

Results 402 iceberg-python issues

### Feature Request / Improvement Currently, these two functions are performing similar tasks; the only difference is the output format. Let's unify the implementations so they don't diverge. For example,...

### Apache Iceberg version main (development) ### Please describe the bug 🐞 ### `to_arrow_batch_reader` bug The bug is in [project_batches](https://github.com/apache/iceberg-python/blob/d8b5c17cadbc99e53d08ade6109283ee73f0d83e/pyiceberg/io/pyarrow.py#L1457-L1475), specifically with the way yield interacts with the two for-loops....

### Apache Iceberg version main (development) ### Please describe the bug 🐞 PyIceberg assumes the same FS implementation is used for reading both metadata and data. However, I want to...

This resolves #998, where duplicate files are added with the `add_files` method. It handles two cases: 1. The file list is not unique. 2. One of the files added is already referenced...
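The two cases above can be illustrated with a minimal sketch. The helper name `find_add_files_conflicts` and its `referenced_paths` argument are hypothetical and are not part of PyIceberg; the sketch assumes the caller has already collected the paths the table references.

```python
from collections import Counter


def find_add_files_conflicts(file_paths, referenced_paths):
    """Return (duplicates, already_referenced) for a proposed list of files.

    Hypothetical helper: `referenced_paths` stands in for whatever set of
    data file paths the table already tracks.
    """
    counts = Counter(file_paths)
    # Case 1: the same path appears more than once in the input list.
    duplicates = sorted(path for path, n in counts.items() if n > 1)
    # Case 2: a path in the input is already referenced by the table.
    referenced = set(referenced_paths)
    already_referenced = sorted(path for path in counts if path in referenced)
    return duplicates, already_referenced
```

A caller could raise an error when either returned list is non-empty, before any file is added.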

### Question How would I go about using a field with mixed datatypes? Is that recommended/possible? I am a fan of tall-tidy data and am wondering how to properly go...

### Changes Proposed in this PR An HA HMS URI such as `uri: thrift://hms-1:9083,thrift://hms-2:9083` is currently not supported. This change supports HA HMS, where each entry will be tried...
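As a rough illustration of the idea, the sketch below splits a comma-separated URI and tries each endpoint until one connects. It assumes the entries are tried in the order listed; the function `connect_first_available` and the `connect` callable are hypothetical stand-ins, not the PR's actual code.

```python
def connect_first_available(uri, connect):
    """Try each comma-separated endpoint in order; return the first success.

    `connect` is a hypothetical callable that opens a client for one
    endpoint and raises OSError (or a subclass) on failure.
    """
    endpoints = [part.strip() for part in uri.split(",") if part.strip()]
    errors = []
    for endpoint in endpoints:
        try:
            return endpoint, connect(endpoint)
        except OSError as exc:
            # Remember the failure and fall through to the next endpoint.
            errors.append(f"{endpoint}: {exc}")
    raise ConnectionError("no metastore endpoint reachable: " + "; ".join(errors))
```

Collecting the per-endpoint errors keeps the final exception informative when every metastore is down.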

When running PyIceberg on Windows, `pyiceberg.io.pyarrow.PyArrowFileIO.parse_location()` incorrectly resolves a Windows path such as `C:\foobar` as having the scheme `C`, which later causes an `Unrecognized filesystem type in URI` error. If using `file://` scheme,...
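The behavior can be reproduced with the standard library alone: `urllib.parse.urlparse` treats the drive letter as a URI scheme. The sketch below shows one possible guard; `split_scheme` is a hypothetical helper and not the fix adopted in PyIceberg.

```python
import re
from urllib.parse import urlparse

# A drive letter followed by a slash or backslash, e.g. C:\ or D:/
_DRIVE_PATH = re.compile(r"^[A-Za-z]:[\\/]")


def split_scheme(location):
    """Return (scheme, location), treating Windows drive paths as local files.

    Hypothetical helper for illustration only.
    """
    if _DRIVE_PATH.match(location):
        return "file", location
    parsed = urlparse(location)
    if not parsed.scheme:
        # No scheme at all: assume a local path.
        return "file", location
    return parsed.scheme, location
```

Without the drive-letter check, `urlparse(r"C:\foobar").scheme` is the lowercased drive letter, which no filesystem lookup will recognize.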

### Feature Request / Improvement As of now, `limit` is checked only after an entire Parquet file is read. https://github.com/apache/iceberg-python/blob/d8b5c17cadbc99e53d08ade6109283ee73f0d83e/pyiceberg/io/pyarrow.py#L1360-L1390 Optimization: push the limit down to the Parquet reading level. For...
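The general idea of applying a limit while reading, rather than after, can be sketched with plain Python lists standing in for record batches. `take_rows` is a hypothetical name; this is not how PyIceberg reads Parquet, only an illustration of stopping as soon as enough rows have been produced.

```python
def take_rows(batches, limit):
    """Yield batches lazily until `limit` rows have been produced.

    `batches` is any iterable of lists (stand-ins for record batches).
    The source iterator is not advanced once the limit is reached.
    """
    remaining = limit
    if remaining <= 0:
        return
    for batch in batches:
        chunk = batch[:remaining]  # trim the final batch to fit the limit
        remaining -= len(chunk)
        yield chunk
        if remaining <= 0:
            return  # stop before requesting another batch
```

Because the check runs right after each yield, a lazy source is never asked for a batch that would be discarded.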

### Feature Request / Improvement Currently `add_files` doesn't have a check to prevent adding an object that's already referenced by the Iceberg Table. We should include these two checks to...

good first issue