parquet-java

GH-3282: Add encryption info CLI support for Parquet file encryption metadata

Open • ArnavBalyan opened this issue 4 months ago • 4 comments

  • Since Parquet 1.12, encryption has been a first-class citizen, with support for both footer and column-level encryption.
  • However, users have no clear way to inspect a file's encryption metadata or encryption mode, or to check whether its footer or the file itself is encrypted.
  • This PR adds a simple, dedicated CLI command: parquet-cli encryption-info <file>
  • The command reports the following:
    • File-level encryption type: PLAINTEXT_FOOTER or ENCRYPTED_FOOTER.
    • A summary of column encryption, plus per-column details including each column's encryption status.
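Some background on the PLAINTEXT_FOOTER / ENCRYPTED_FOOTER distinction: per the Parquet modular encryption specification, a file written in encrypted-footer mode starts and ends with the magic string PARE, while plaintext-footer mode keeps the regular PAR1 magic so that legacy readers can still open the file. The sketch below is a hypothetical stdlib-only helper (the `FooterMagic` class and its enum are illustrative names, not the PR's implementation or a parquet-java API) that inspects only the trailing 4 bytes. Telling a plaintext-footer encrypted file apart from an unencrypted one still requires parsing the footer's FileMetaData, which is not shown here.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: classifies a Parquet file by its trailing magic bytes only.
public class FooterMagic {

  public enum FooterMagicKind {
    ENCRYPTED_FOOTER,         // trailing magic is "PARE"
    PLAINTEXT_OR_UNENCRYPTED, // trailing magic is "PAR1"; the footer must be parsed to tell which
    NOT_PARQUET               // anything else, or a file too short to be Parquet
  }

  private static final byte[] PAR1 = "PAR1".getBytes(StandardCharsets.US_ASCII);
  private static final byte[] PARE = "PARE".getBytes(StandardCharsets.US_ASCII);

  // Leading magic (4) + footer length (4) + trailing magic (4).
  private static final int MIN_LENGTH = 12;

  public static FooterMagicKind classify(String path) throws IOException {
    try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
      long length = file.length();
      if (length < MIN_LENGTH) {
        return FooterMagicKind.NOT_PARQUET;
      }
      byte[] tail = new byte[4];
      file.seek(length - 4);
      file.readFully(tail);
      if (Arrays.equals(tail, PARE)) {
        return FooterMagicKind.ENCRYPTED_FOOTER;
      }
      if (Arrays.equals(tail, PAR1)) {
        return FooterMagicKind.PLAINTEXT_OR_UNENCRYPTED;
      }
      return FooterMagicKind.NOT_PARQUET;
    }
  }

  public static void main(String[] args) throws IOException {
    for (String path : args) {
      System.out.println(path + ": " + classify(path));
    }
  }
}
```

Reading only the tail keeps the check cheap and avoids any need for decryption keys, which is why it can only narrow the answer down rather than fully report the footer mode.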

ArnavBalyan • Aug 25 '25 17:08

cc @shangxinli @gszadovszky, could you please take a look? Thanks!

ArnavBalyan • Aug 27 '25 06:08

cc @ggershinsky @shangxinli, who are experts on encryption

wgtmac • Aug 28 '25 01:08

Some other details worth printing:

  • is a column encrypted with the footer key or with a column-specific key?
  • if all columns are encrypted with the footer key, then the file is in "uniform encryption" mode; this can be printed (so the user knows that only one key is used in the file and that it can open every column)
  • explicit info on the footer encryption mode: encrypted or plaintext
  • optional (via a flag) printing of the key metadata of the footer key and (if available) of the column keys; this can be useful for debugging key retrieval. The key metadata is binary, but something similar to "hexdump -C" could be used, with some effort made to find and print ASCII text chunks (key metadata often has text/JSON parts)
  • advanced debugging: print the AAD-related fields
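The uniform-encryption check suggested above reduces to a simple rule over per-column key usage. Below is a minimal sketch, assuming a hypothetical `ColumnKeyMode` enum that mirrors the Parquet format's `ColumnCryptoMetaData` union (no crypto metadata for a plaintext column, otherwise encryption with the footer key or with a column-specific key). The `EncryptionSummary` class, its methods, and the sample column names are made up for illustration; extracting these modes from a real file's metadata is not shown.

```java
import java.util.EnumMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model, not a parquet-java API: how each column chunk is protected.
public class EncryptionSummary {

  public enum ColumnKeyMode { PLAINTEXT, FOOTER_KEY, COLUMN_KEY }

  /** Counts how many columns fall into each protection mode (zero counts included). */
  public static Map<ColumnKeyMode, Integer> countModes(Map<String, ColumnKeyMode> columns) {
    Map<ColumnKeyMode, Integer> counts = new EnumMap<>(ColumnKeyMode.class);
    for (ColumnKeyMode mode : ColumnKeyMode.values()) {
      counts.put(mode, 0);
    }
    for (ColumnKeyMode mode : columns.values()) {
      counts.merge(mode, 1, Integer::sum);
    }
    return counts;
  }

  /** Uniform encryption: every column is encrypted with the footer key. */
  public static boolean isUniformEncryption(Map<String, ColumnKeyMode> columns) {
    return !columns.isEmpty()
        && columns.values().stream().allMatch(m -> m == ColumnKeyMode.FOOTER_KEY);
  }

  public static void main(String[] args) {
    // Made-up sample: column path -> how that column is protected.
    Map<String, ColumnKeyMode> columns = new LinkedHashMap<>();
    columns.put("id", ColumnKeyMode.FOOTER_KEY);
    columns.put("ssn", ColumnKeyMode.COLUMN_KEY);
    columns.put("name", ColumnKeyMode.PLAINTEXT);

    columns.forEach((path, mode) -> System.out.println(path + ": " + mode));
    System.out.println("Summary: " + countModes(columns));
    System.out.println("Uniform encryption: " + isUniformEncryption(columns));
  }
}
```

Requiring at least one column keeps an empty column map from being reported as uniformly encrypted.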

ggershinsky • Aug 28 '25 07:08
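The "hexdump -C"-style idea for binary key metadata could look roughly like the following. This is a minimal stdlib-only sketch: the `KeyMetadataDump` class and its methods are hypothetical names, not part of parquet-cli, and the sample bytes are made up. `hexdump` mimics the `hexdump -C` layout (8-digit hex offset, 16 bytes per row split into two halves of 8, then an ASCII column with non-printable bytes shown as dots, and a final line with the total length), while `asciiRuns` pulls out runs of printable ASCII, similar in spirit to the Unix `strings` tool, so that text or JSON fragments inside the key metadata stand out.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helpers for printing opaque key metadata; not part of parquet-cli.
public class KeyMetadataDump {

  private static boolean isPrintable(byte b) {
    return b >= 0x20 && b <= 0x7e; // printable ASCII range
  }

  /** Formats bytes in the style of `hexdump -C`. */
  public static String hexdump(byte[] data) {
    StringBuilder out = new StringBuilder();
    for (int offset = 0; offset < data.length; offset += 16) {
      out.append(String.format("%08x  ", offset));
      StringBuilder ascii = new StringBuilder();
      for (int i = 0; i < 16; i++) {
        if (offset + i < data.length) {
          byte b = data[offset + i];
          out.append(String.format("%02x ", b & 0xff));
          ascii.append(isPrintable(b) ? (char) b : '.');
        } else {
          out.append("   "); // pad a short final row so the ASCII column lines up
        }
        if (i == 7) {
          out.append(' '); // extra gap between the two 8-byte halves
        }
      }
      out.append(" |").append(ascii).append("|\n");
    }
    out.append(String.format("%08x\n", data.length)); // trailing line: total byte count
    return out.toString();
  }

  /** Returns runs of at least minLength printable ASCII bytes, e.g. embedded JSON. */
  public static List<String> asciiRuns(byte[] data, int minLength) {
    List<String> runs = new ArrayList<>();
    int start = -1;
    // Iterate one past the end so a run that reaches the last byte is still closed.
    for (int i = 0; i <= data.length; i++) {
      boolean printable = i < data.length && isPrintable(data[i]);
      if (printable && start < 0) {
        start = i;
      } else if (!printable && start >= 0) {
        if (i - start >= minLength) {
          runs.add(new String(data, start, i - start, StandardCharsets.US_ASCII));
        }
        start = -1;
      }
    }
    return runs;
  }

  public static void main(String[] args) {
    // Made-up sample bytes: a binary prefix followed by a JSON fragment.
    byte[] sample = "\u0001\u0002{\"keyId\":\"k1\"}\u0000".getBytes(StandardCharsets.US_ASCII);
    System.out.print(hexdump(sample));
    System.out.println(asciiRuns(sample, 4));
  }
}
```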

Thanks, this is great feedback! I'll iterate and update this shortly.

ArnavBalyan • Aug 28 '25 10:08