Apache Parquet Format

Results 93 parquet-format issues

Apart from the original authors' great work, Apache projects should not mention any contributors in particular in the POM. ### Jira - [ ] My PR addresses the following [Parquet...

There is currently no MIME type registered for Parquet. Perhaps this is intentional. If it is not intentional, I suggest steps be taken to register a MIME type with IANA. ...

Priority: Major
Type: enhancement

Currently, the specification of `ColumnIndex` in `parquet.thrift` is inconsistent, leading to cases where it is impossible to create a Parquet file that conforms to the spec. The problem is...

Priority: Major
Type: bug

The spec for DICTIONARY_ENCODING states that: > If the dictionary grows too big, whether in size or number of distinct values, the encoding will fall back to the plain encoding....

Priority: Critical
Type: bug

The parquet format specification doesn't say whether a Parquet file having columns with the same name (in the same group node, so really exactly the same name) is valid. I.e.,...

Priority: Minor
Type: enhancement

I have been running into a bug due to `parquet-format` and `parquet-format-structures` both defining the `org.apache.parquet.format.Util` class but doing so inconsistently. Examples of this are several methods which include a...

Priority: Major
Type: bug

Currently, Parquet can use a BloomFilter for any physical type. However, when a BloomFilter is applied to floats: 1. What do +0 and -0 mean? Are they equal? 2. Should qNaN and sNaN be written...

Priority: Major
Type: enhancement
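
The float questions in this issue can be made concrete with a small Python sketch. It assumes the filter hashes the raw IEEE 754 bytes of each value (an assumption for illustration; the snippet above does not say how values are hashed). The helper `float_bits` is hypothetical and not part of any Parquet API.

```python
import math
import struct

def float_bits(x: float) -> bytes:
    """Return the IEEE 754 single-precision, little-endian bytes of x."""
    return struct.pack("<f", x)

pos_zero, neg_zero = 0.0, -0.0

# The two zeros compare equal as numbers...
zeros_equal = pos_zero == neg_zero
# ...but their byte patterns differ, so a byte-level hash would
# treat them as two distinct values.
zero_bytes_differ = float_bits(pos_zero) != float_bits(neg_zero)

# Two different NaN bit patterns (0x7fc00000 and 0x7fc00001).
# Both decode to NaN, and NaN never compares equal to anything.
nan_a = struct.unpack("<f", bytes.fromhex("0000c07f"))[0]
nan_b = struct.unpack("<f", bytes.fromhex("0100c07f"))[0]
both_nan = math.isnan(nan_a) and math.isnan(nan_b)
nans_equal = nan_a == nan_b
```

Under that assumption, a reader probing the filter with `-0.0` could miss rows written with `+0.0`, which is why the issue asks for explicit rules.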

Each instance of ColumnFilterPredicate eagerly stores the filter values in a toString variable, which is not useful. ```java static abstract class ColumnFilterPredicate implements FilterPredicate, Serializable { private final Column column; private...

Priority: Critical
Type: bug

[Gorilla](https://www.vldb.org/pvldb/vol8/p1816-teller.pdf) is a de facto standard encoding algorithm for floating-point numbers; it has been used by many time-series databases, such as InfluxDB and TimescaleDB, for a while. For now Parquet only...

Priority: Major
Type: enhancement
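
As a rough illustration of why this encoding suits slowly changing series, the sketch below shows only the XOR step that Gorilla-style value compression starts from. It is not the full algorithm (the bit-packing of leading and trailing zeros is omitted), and the function names are hypothetical.

```python
import struct

def double_to_u64(x: float) -> int:
    """Reinterpret a 64-bit float as an unsigned 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def xor_deltas(values):
    """XOR each value's bit pattern with that of the previous value.

    Identical neighbours give 0, and similar neighbours give a word
    with many zero bits, which a later stage can store compactly.
    """
    prev = double_to_u64(values[0])
    deltas = []
    for v in values[1:]:
        cur = double_to_u64(v)
        deltas.append(prev ^ cur)
        prev = cur
    return deltas

# A constant series XORs to all zeros.
constant = xor_deltas([1.5, 1.5, 1.5])
# 1.0 and 2.0 differ only in their exponent bits.
step = xor_deltas([1.0, 2.0])
```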

I often need to create tens of millions of small dataframes and save them into Parquet files. All these dataframes have the same column and index information, and normally they...

Priority: Major
Type: enhancement