parquet-format
Apache Parquet Format
**Reporter**: [Fokko Driesprong](https://issues.apache.org/jira/secure/ViewProfile.jspa?name=fokko) / @Fokko **Assignee**: [Fokko Driesprong](https://issues.apache.org/jira/secure/ViewProfile.jspa?name=fokko) / @Fokko **Note**: *This issue was originally created as [PARQUET-2481](https://issues.apache.org/jira/browse/PARQUET-2481). Please see the [migration documentation](https://issues.apache.org/jira/browse/PARQUET-2502) for further details.*
In current Parquet implementations, if the BloomFilter's ndv is not set, most implementations guess 1M as the ndv and use that value with the fpp. So, if fpp is 0.01, the...
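To make the effect of a guessed ndv concrete, here is a minimal sketch of how filter size scales with ndv at a fixed fpp. It assumes the textbook sizing formula for a classic Bloom filter, m = -n ln(p) / (ln 2)^2, which is not necessarily what any Parquet implementation uses for its split-block Bloom filter; the function name `classic_bloom_bits` is hypothetical.

```python
import math


def classic_bloom_bits(ndv: int, fpp: float) -> int:
    """Bits needed by a classic Bloom filter (assumption: textbook formula,
    not the exact split-block sizing used by Parquet implementations)."""
    if ndv <= 0 or not 0.0 < fpp < 1.0:
        raise ValueError("ndv must be positive and fpp must be in (0, 1)")
    return math.ceil(-ndv * math.log(fpp) / (math.log(2) ** 2))


if __name__ == "__main__":
    # Compare a column with few distinct values against the 1M default guess.
    for ndv in (10_000, 1_000_000):
        bits = classic_bloom_bits(ndv, 0.01)
        print(f"ndv={ndv:>9,} fpp=0.01 -> about {bits / 8 / 1024:.0f} KiB")
```

Under this formula the size grows linearly with ndv, so a filter sized for a guessed 1M distinct values is roughly 100 times larger than one sized for 10,000.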
Due to PARQUET-2078 RowGroup.file_offset is not reliable. This field is also wrongly calculated in the C++ oss parquet implementation PARQUET-2089 **Reporter**: [Gabor Szadovszky](https://issues.apache.org/jira/secure/ViewProfile.jspa?name=gszadovszky) / @gszadovszky **Assignee**: [Gidon Gershinsky](https://issues.apache.org/jira/secure/ViewProfile.jspa?name=gershinsky) / @ggershinsky...
The Parquet format keeps gaining features, while the different implementations cannot keep pace and are left behind, with some features implemented and others not. In many cases...
**Reporter**: Micah Kornfield / @emkornfield **Assignee**: Micah Kornfield / @emkornfield **Note**: *This issue was originally created as [PARQUET-1933](https://issues.apache.org/jira/browse/PARQUET-1933). Please see the [migration documentation](https://issues.apache.org/jira/browse/PARQUET-2502) for further details.*
The current merge script is incompatible with Python 3. Copy over the merge script from the Arrow project, which itself initially started from merge_parquet.py. **Reporter**: [Uwe Korn](https://issues.apache.org/jira/secure/ViewProfile.jspa?name=uwe) / @xhochy...
In the Parquet format specification, under [the section for Plain encoding](https://github.com/apache/parquet-format/blob/master/Encodings.md#plain-plain--0), boolean is encoded using the deprecated bit-packed encoding. However, [the section for bit-packed encoding](https://github.com/apache/parquet-format/blob/master/Encodings.md#bit-packed-deprecated-bit_packed--4) specifies that it is only...
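For reference, a minimal sketch of the boolean layout at issue, assuming booleans are packed one bit per value with the least significant bit first and the final byte zero-padded. The helper names are hypothetical and this is an illustration, not code from any Parquet implementation.

```python
def pack_booleans_lsb_first(values):
    """Pack booleans one bit each, least significant bit first."""
    out = bytearray((len(values) + 7) // 8)  # final byte is zero-padded
    for i, value in enumerate(values):
        if value:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def unpack_booleans_lsb_first(data, count):
    """Read back `count` booleans from LSB-first packed bytes."""
    return [bool((data[i // 8] >> (i % 8)) & 1) for i in range(count)]


if __name__ == "__main__":
    packed = pack_booleans_lsb_first([True, False, True, True])
    print(packed.hex())
    print(unpack_booleans_lsb_first(packed, 4))
```

The value count has to come from outside the packed bytes (for example, from page metadata), since the padding bits are indistinguishable from false values.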
We recently figured out that the Makefile was broken, and it would be best to check it during the Travis tests. I have a fix locally that I'll rebase and...
Although considered deprecated, they should be documented, as the format is quite special. **Reporter**: [Uwe Korn](https://issues.apache.org/jira/secure/ViewProfile.jspa?name=uwe) / @xhochy **Assignee**: [Uwe Korn](https://issues.apache.org/jira/secure/ViewProfile.jspa?name=uwe) / @xhochy #### PRs and other links: -...
Apache Iceberg is adding geospatial support: https://docs.google.com/document/d/1iVFbrRNEzZl8tDcZC81GFt01QJkLJsI9E2NBOt21IRI. It would be good if Apache Parquet could support a geometry type natively.