
[T2] Wide column metadata improvements

Open alkis opened this issue 1 year ago • 0 comments

  1. Make ColumnMetaData.type optional
  2. Make ColumnMetaData.path_in_schema optional
  3. Add ColumnMetaData.schema_index: the ordinal of the element in FileMetaData.schema that this column corresponds to. This allows a sparse representation of columns in a row group.
  4. Deprecate ColumnMetaData.encoding_stats and replace with ColumnMetaData.is_fully_dict_encoded.
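
A minimal sketch of how a reader might use these fields, written in Python rather than Thrift for illustration only. It assumes that FileMetaData.schema is the usual flattened depth-first list of schema elements, that a column's type and path can be recovered from that list through schema_index, and that is_fully_dict_encoded means every data page of the chunk is dictionary encoded. The class and function names below (`SchemaElement`, `ColumnMetaData`, `leaf_paths`, `resolve`, `is_fully_dict_encoded`) are hypothetical stand-ins, not the generated Thrift types or any Parquet library API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class SchemaElement:
    """Hypothetical stand-in for one entry of FileMetaData.schema."""
    name: str
    type: Optional[str] = None  # set only on leaf (primitive) elements
    num_children: int = 0


@dataclass
class ColumnMetaData:
    """Hypothetical slimmed-down column metadata: no type, no path_in_schema."""
    schema_index: int
    num_values: int


def leaf_paths(schema: List[SchemaElement]) -> Dict[int, Tuple[str, ...]]:
    """Map each leaf's ordinal in the flattened schema to its path (root excluded)."""
    paths: Dict[int, Tuple[str, ...]] = {}

    def walk(idx: int, prefix: Tuple[str, ...]) -> int:
        elem = schema[idx]
        # The root element (index 0) does not contribute to the path.
        path = prefix + (elem.name,) if idx != 0 else ()
        next_idx = idx + 1
        if elem.num_children == 0:
            paths[idx] = path
        for _ in range(elem.num_children):
            next_idx = walk(next_idx, path)
        return next_idx

    walk(0, ())
    return paths


def resolve(schema: List[SchemaElement], column: ColumnMetaData):
    """Recover (path_in_schema, type) for a column from schema_index alone."""
    path = leaf_paths(schema)[column.schema_index]
    return path, schema[column.schema_index].type


DICT_ENCODINGS = {"PLAIN_DICTIONARY", "RLE_DICTIONARY"}
DATA_PAGE_TYPES = {"DATA_PAGE", "DATA_PAGE_V2"}


def is_fully_dict_encoded(encoding_stats: List[Tuple[str, str, int]]) -> bool:
    """Derive the proposed flag from (page_type, encoding, count) triples.

    Assumption: the flag is true when every data page is dictionary encoded.
    """
    data_pages = [s for s in encoding_stats if s[0] in DATA_PAGE_TYPES and s[2] > 0]
    return bool(data_pages) and all(enc in DICT_ENCODINGS for _, enc, _ in data_pages)


schema = [
    SchemaElement("schema", num_children=3),  # index 0: root
    SchemaElement("id", type="INT64"),        # index 1
    SchemaElement("user", num_children=2),    # index 2: group
    SchemaElement("name", type="BYTE_ARRAY"), # index 3
    SchemaElement("age", type="INT32"),       # index 4
    SchemaElement("score", type="DOUBLE"),    # index 5
]

# A sparse row group: only two of the four leaf columns are present.
row_group_columns = [
    ColumnMetaData(schema_index=1, num_values=100),
    ColumnMetaData(schema_index=3, num_values=100),
]
```

With this layout, `resolve(schema, row_group_columns[1])` yields the path and type that path_in_schema and type would otherwise have repeated in every row group, which is why both fields can become optional once schema_index is present.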

Ref: Parquet Metadata evolution

Jira

  • [ ] My PR addresses the following Parquet Jira issues and references them in the PR title. For example, "PARQUET-1234: My Parquet PR"
    • https://issues.apache.org/jira/browse/PARQUET-XXX
    • In case you are adding a dependency, check if the license complies with the ASF 3rd Party License Policy.

Commits

  • [ ] My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • [ ] In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and classes in the PR contain Javadoc that explains what they do

alkis, May 29 '24 19:05