parquet-format
Apache Parquet Format
### Describe the enhancement requested Encrypted files use three types of ordinals: row group, column, page. All three are simple local counters in both writers and readers. In addition, the...
### Rationale for this change Make docstring style consistent. ### What changes are included in this PR? 1. all comments are 80 cols wide 2. inline comments: `// inline-comment` 3....
### Rationale for this change - As can be seen on https://github.com/apache/parquet-format/issues/502 and the linked issues, the current state of Int96 is confusing. - Given there are proposed changes to...
### Rationale for this change ### What changes are included in this PR? ### Do these changes have PoC implementations? arrow-rs: https://github.com/apache/arrow-rs/issues/7686 parquet-java: https://github.com/apache/parquet-java/pull/3243 Closes #GH-502:
### Describe the enhancement requested INT96 is used as the default timestamp type in Spark. Timestamp filters are extremely common, and the lack of a defined ordering and of statistics limits optimization opportunities. See also:...
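To illustrate why ordering matters for INT96 timestamps, here is a minimal sketch. It assumes the commonly used legacy layout (8 bytes of nanoseconds within the day followed by 4 bytes of Julian day number, both little-endian); the helper names `encode_int96` and `decode_int96` are hypothetical and not part of any Parquet library. Under that assumption, comparing the raw 12 bytes does not give chronological order, so min/max statistics need an agreed comparison rule before they can be used for filtering.

```python
import struct

# Julian day number of 1970-01-01 (the Unix epoch).
JULIAN_DAY_OF_UNIX_EPOCH = 2440588
NANOS_PER_DAY = 86_400 * 1_000_000_000


def encode_int96(unix_nanos: int) -> bytes:
    """Encode nanoseconds since the Unix epoch into the assumed 12-byte layout."""
    days, nanos_of_day = divmod(unix_nanos, NANOS_PER_DAY)
    # "<qi": little-endian int64 (nanos of day) followed by int32 (Julian day).
    return struct.pack("<qi", nanos_of_day, days + JULIAN_DAY_OF_UNIX_EPOCH)


def decode_int96(raw: bytes) -> int:
    """Decode the assumed 12-byte layout back to nanoseconds since the epoch."""
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    return (julian_day - JULIAN_DAY_OF_UNIX_EPOCH) * NANOS_PER_DAY + nanos_of_day


earlier = encode_int96(255)            # 1970-01-01, 255 ns after midnight
later = encode_int96(NANOS_PER_DAY)    # 1970-01-02, exactly midnight

# Raw byte comparison puts the earlier timestamp last, because the
# nanos-of-day field comes first and is stored least-significant byte first.
print(earlier > later)                              # byte-wise comparison
print(decode_int96(earlier) < decode_int96(later))  # chronological comparison
```

A reader that wants usable statistics would compare the Julian day first and the nanoseconds second, which is what decoding to a single integer does in this sketch.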
Bumping thrift: https://github.com/apache/thrift/blob/master/CHANGES.md#0220
Add spec and type description for INTERVAL_YEAR_MONTH and INTERVAL_DAY_TIME types. This carries over the work in https://github.com/apache/parquet-format/pull/43
### Rationale for this change "to find a field with a given field" did not make sense in the context. From what I understand, since the fields are listed in...
### Describe the enhancement requested Currently, `sorting_columns` is defined only on the `RowGroupMetadata`. This makes it hard to get the sort status of the entire file when it has...
### Describe the enhancement requested Databases like ClickHouse support Bloom filters on the tokens present in a string rather than on the string itself. https://clickhouse.com/docs/optimize/skipping-indexes#bloom-filter-types I suggest that Apache Parquet support...
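The idea of a token-level Bloom filter can be sketched as follows. This is an illustrative classic Bloom filter, not Parquet's split-block Bloom filter and not ClickHouse's implementation; the `tokenize` function, the `TokenBloomFilter` class, and the sizing parameters are all hypothetical choices made for the example.

```python
import hashlib
import re


def tokenize(text: str) -> list[str]:
    """Split text into lowercase alphanumeric tokens (an assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TokenBloomFilter:
    """A simple Bloom filter over the tokens of strings, not the whole strings."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, token: str):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{token}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add_text(self, text: str) -> None:
        """Insert every token of the given string value."""
        for token in tokenize(text):
            for pos in self._positions(token):
                self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, token: str) -> bool:
        """False means the token is definitely absent; True means maybe present."""
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(token.lower())
        )


bloom = TokenBloomFilter()
bloom.add_text("Disk full on node-7")
print(bloom.might_contain("disk"))     # inserted tokens are always reported
print(bloom.might_contain("timeout"))  # usually False; false positives are possible
```

With a filter like this per row group or page, a query such as "value contains the token `disk`" could skip data whose filter reports the token as absent, which a filter built over whole strings cannot do.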