iceberg-python

[feat] optimize read, pushdown `limit` to file level for `to_arrow`

Open · kevinjqliu opened this issue 1 year ago · 0 comments

Feature Request / Improvement

Currently, `limit` is checked only after an entire Parquet file has been read: https://github.com/apache/iceberg-python/blob/d8b5c17cadbc99e53d08ade6109283ee73f0d83e/pyiceberg/io/pyarrow.py#L1360-L1390

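Proposed optimization: push `limit` down to the Parquet reading level, so that reading a file can stop once enough rows have been produced instead of materializing the whole file first.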

For more details, see this comment

kevinjqliu, Aug 11 '24 16:08