parquet-java
GH-3282: Add encryption info CLI support for Parquet file encryption metadata
- Since Parquet 1.12, encryption has been a first-class citizen, with support for footer-level and column-level encryption.
- However, users have no clear way to inspect a file's encryption metadata, its encryption mode, or whether the file or its footer is encrypted.
- This PR adds a simple, dedicated CLI command:
parquet-cli encryption-info <file>
- The command reports the following:
- File-level encryption type: PLAINTEXT_FOOTER or ENCRYPTED_FOOTER.
- A summary of column encryption, plus per-column details including each column's encryption status.
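For illustration, the report described above could be assembled roughly as follows. This is a minimal sketch and not the code in this PR: `EncryptionReport`, its `render` method, and the output labels are hypothetical, and the inputs stand in for whatever the command actually reads from the file footer.

```java
import java.util.Map;

// Hypothetical helper, not part of parquet-java: formats a small
// encryption report from values already extracted from the footer.
final class EncryptionReport {

    // footerEncrypted: true if the footer itself is encrypted.
    // columnEncrypted: column path -> whether that column is encrypted
    //                  (iteration order of the map is the print order).
    static String render(boolean footerEncrypted, Map<String, Boolean> columnEncrypted) {
        StringBuilder out = new StringBuilder();

        // File-level encryption type.
        out.append("Footer: ")
           .append(footerEncrypted ? "ENCRYPTED_FOOTER" : "PLAINTEXT_FOOTER")
           .append('\n');

        // Summary line: how many columns are encrypted.
        long encrypted = columnEncrypted.values().stream()
                .filter(Boolean::booleanValue)
                .count();
        out.append("Encrypted columns: ")
           .append(encrypted).append(" of ").append(columnEncrypted.size())
           .append('\n');

        // Per-column status.
        for (Map.Entry<String, Boolean> e : columnEncrypted.entrySet()) {
            out.append("  ").append(e.getKey()).append(": ")
               .append(e.getValue() ? "ENCRYPTED" : "PLAINTEXT")
               .append('\n');
        }
        return out.toString();
    }
}
```

Passing a `LinkedHashMap` keeps the columns in schema order in the output.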
cc @shangxinli @gszadovszky, could you please take a look? Thanks!
cc @ggershinsky @shangxinli as the experts on encryption
Some other details worth printing:
- is a column encrypted with the footer key or with a column-specific key?
- if all columns are encrypted with the footer key, then the file is in "uniform encryption" mode; can print this (so the user knows one key only is used in a file and can open every column)
- explicit info on the footer encryption mode - encrypted or plaintext
- optional (via a flag) printing of the key metadata of the footer key and (if available) of the column keys - can be useful for debugging key retrieval. This is binary, but maybe something similar to "hexdump -C" can be performed where some effort is made to find/print ASCII text chunks (often, key metadata has text/json parts)
- advanced debugging: print the AAD-related fields
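The "hexdump -C"-like output suggested for key metadata could look roughly like this. It is a minimal sketch only: `KeyMetadataDump` is a hypothetical helper, not an existing parquet-java class, and it assumes the key metadata has already been read into a plain `byte[]`. Printable ASCII is shown in a side column so that text/JSON fragments stand out; every other byte is shown as `.`.

```java
// Hypothetical helper, not part of parquet-java: renders bytes in a
// layout loosely modeled on "hexdump -C" (offset, hex bytes, ASCII column).
final class KeyMetadataDump {
    private static final int BYTES_PER_ROW = 16;

    static String dump(byte[] data) {
        StringBuilder out = new StringBuilder();
        for (int offset = 0; offset < data.length; offset += BYTES_PER_ROW) {
            int end = Math.min(offset + BYTES_PER_ROW, data.length);

            // Row offset as 8 hex digits.
            out.append(String.format("%08x  ", offset));

            StringBuilder ascii = new StringBuilder();
            for (int i = offset; i < offset + BYTES_PER_ROW; i++) {
                if (i < end) {
                    int b = data[i] & 0xff; // treat the byte as unsigned
                    out.append(String.format("%02x ", b));
                    // Show printable ASCII as-is, anything else as '.'.
                    ascii.append(b >= 0x20 && b < 0x7f ? (char) b : '.');
                } else {
                    out.append("   "); // pad a short final row
                }
                if (i - offset == 7) {
                    out.append(' '); // extra gap between the two 8-byte groups
                }
            }
            out.append(" |").append(ascii).append("|\n");
        }
        return out.toString();
    }
}
```

An empty array produces an empty string, so a caller can print a placeholder such as "(no key metadata)" in that case.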
Thanks, this is great feedback. I'll iterate and update this shortly.