parquet-format
[T2] Wide column metadata improvements
- Make `ColumnMetaData.type` optional
- Make `ColumnMetaData.path_in_schema` optional
- Add `ColumnMetaData.schema_index`. This is the ordinal in `FileMetaData.schema` that this column corresponds to. It allows a sparse representation of columns in a row group.
- Deprecate `ColumnMetaData.encoding_stats` and replace it with `ColumnMetaData.is_fully_dict_encoded`.
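Once `type` and `path_in_schema` are optional, a reader needs to recover them from `FileMetaData.schema` through `schema_index`. The following is a minimal Python sketch of that lookup. It assumes the flattened, depth-first schema layout Parquet uses (element 0 is the root, and each group is followed by its `num_children` subtrees). It also assumes "sparse" means a row group's column list may name only some leaves. `SchemaElement`, `ColumnMetaData`, `leaf_paths`, and `resolve` are simplified, hypothetical stand-ins, not the Thrift-generated classes or any library API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SchemaElement:
    # Simplified stand-in: groups set num_children, leaves set type.
    name: str
    num_children: int = 0
    type: Optional[str] = None


@dataclass
class ColumnMetaData:
    # Hypothetical reduced form carrying only the fields discussed in this PR.
    schema_index: int
    type: Optional[str] = None
    path_in_schema: Optional[List[str]] = None


def leaf_paths(schema: List[SchemaElement]) -> Dict[int, List[str]]:
    """Map each leaf's ordinal in the flattened schema to its path."""
    paths: Dict[int, List[str]] = {}
    # Open groups as (path, children not yet visited); the root adds no name.
    stack = [([], schema[0].num_children)]
    for i in range(1, len(schema)):
        # Close groups whose children have all been visited.
        while stack and stack[-1][1] == 0:
            stack.pop()
        if not stack:
            raise ValueError("schema has more elements than its groups declare")
        parent_path, remaining = stack[-1]
        stack[-1] = (parent_path, remaining - 1)
        elem = schema[i]
        path = parent_path + [elem.name]
        if elem.num_children > 0:
            stack.append((path, elem.num_children))
        else:
            paths[i] = path
    return paths


def resolve(col, schema, paths):
    """Use the writer's values if present, else fall back to the schema."""
    col_type = col.type if col.type is not None else schema[col.schema_index].type
    path = col.path_in_schema if col.path_in_schema is not None else paths[col.schema_index]
    return col_type, path


schema = [
    SchemaElement("schema", num_children=2),  # root
    SchemaElement("a", type="INT64"),
    SchemaElement("b", num_children=2),
    SchemaElement("c", type="BYTE_ARRAY"),
    SchemaElement("d", type="DOUBLE"),
]
paths = leaf_paths(schema)
# A sparse row group holding only leaf b.d, with type and path omitted.
for col in [ColumnMetaData(schema_index=4)]:
    print(resolve(col, schema, paths))  # ('DOUBLE', ['b', 'd'])
```

In this sketch each column chunk is located through `schema_index` rather than by its position in the list, so the row group does not need an entry for every leaf.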
Ref: Parquet Metadata evolution
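Replacing `ColumnMetaData.encoding_stats` with `ColumnMetaData.is_fully_dict_encoded` collapses the per-page encoding statistics into a single flag. The Python sketch below shows one way a writer might derive that flag from the legacy statistics. It assumes the flag means "every data page in the chunk is dictionary-encoded", with dictionary pages ignored. That reading and the helper name `is_fully_dict_encoded` are assumptions, not the PR's definition. The enum values shown follow `PageType` and `Encoding` in `parquet.thrift`.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Iterable


class PageType(IntEnum):
    # Subset of PageType from parquet.thrift.
    DATA_PAGE = 0
    INDEX_PAGE = 1
    DICTIONARY_PAGE = 2
    DATA_PAGE_V2 = 3


class Encoding(IntEnum):
    # Subset of Encoding from parquet.thrift.
    PLAIN = 0
    PLAIN_DICTIONARY = 2
    RLE = 3
    RLE_DICTIONARY = 8


@dataclass
class PageEncodingStats:
    page_type: PageType
    encoding: Encoding
    count: int


DATA_PAGE_TYPES = {PageType.DATA_PAGE, PageType.DATA_PAGE_V2}
DICT_ENCODINGS = {Encoding.PLAIN_DICTIONARY, Encoding.RLE_DICTIONARY}


def is_fully_dict_encoded(stats: Iterable[PageEncodingStats]) -> bool:
    """Hypothetical derivation of the flag from legacy encoding_stats.

    True only if there is at least one data page and every data page
    (v1 or v2) used a dictionary encoding. Dictionary pages are skipped.
    """
    data_stats = [s for s in stats if s.page_type in DATA_PAGE_TYPES and s.count > 0]
    return bool(data_stats) and all(s.encoding in DICT_ENCODINGS for s in data_stats)


# One dictionary page plus dictionary-encoded data pages.
chunk = [
    PageEncodingStats(PageType.DICTIONARY_PAGE, Encoding.PLAIN, 1),
    PageEncodingStats(PageType.DATA_PAGE, Encoding.RLE_DICTIONARY, 4),
]
print(is_fully_dict_encoded(chunk))  # True
# For example, a writer that fell back to PLAIN for later pages.
chunk.append(PageEncodingStats(PageType.DATA_PAGE, Encoding.PLAIN, 2))
print(is_fully_dict_encoded(chunk))  # False
```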
Jira
- [ ] My PR addresses the following Parquet Jira issues and references them in the PR title. For example, "PARQUET-1234: My Parquet PR"
  - https://issues.apache.org/jira/browse/PARQUET-XXX
  - In case you are adding a dependency, check if the license complies with the ASF 3rd Party License Policy.
Commits
- [ ] My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
  - Subject is separated from body by a blank line
  - Subject is limited to 50 characters (not including Jira issue reference)
  - Subject does not end with a period
  - Subject uses the imperative mood ("add", not "adding")
  - Body wraps at 72 characters
  - Body explains "what" and "why", not "how"
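Several of these commit-message rules are mechanical and can be checked automatically. The Python sketch below is a hypothetical helper, not part of any Parquet tooling. It assumes the Jira reference is a leading `PARQUET-NNNN: ` prefix, as in the PR-title example above. Imperative mood and the "what and why" rule still need human review.

```python
import re
from typing import List

# Assumed Jira reference format, following the example "PARQUET-1234: ...".
JIRA_PREFIX = re.compile(r"^PARQUET-\d+:\s*")


def check_commit_message(message: str) -> List[str]:
    """Return detected guideline violations (an empty list if none)."""
    lines = message.splitlines()
    if not lines:
        return ["empty message"]
    problems = []
    subject = lines[0]
    if not JIRA_PREFIX.match(subject):
        problems.append("subject does not reference a Jira issue")
    # The 50-character limit excludes the Jira reference.
    summary = JIRA_PREFIX.sub("", subject, count=1)
    if len(summary) > 50:
        problems.append(f"subject is {len(summary)} characters (limit 50)")
    if subject.endswith("."):
        problems.append("subject ends with a period")
    if len(lines) > 1 and lines[1].strip():
        problems.append("subject and body are not separated by a blank line")
    for n, line in enumerate(lines[2:], start=3):
        if len(line) > 72:
            problems.append(f"body line {n} exceeds 72 characters")
    return problems


msg = "PARQUET-1234: Add schema_index to ColumnMetaData\n\nAllow sparse column chunks in a row group."
print(check_commit_message(msg))  # []
```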
Documentation
- [ ] In case of new functionality, my PR adds documentation that describes how to use it.
  - All the public functions and classes in the PR contain Javadoc that explains what they do