
Parquet encryption master

Open krishnaprasadas opened this issue 8 years ago • 2 comments

Parquet encryption supports column-level access control.

Data can be encrypted at the column level, so only a selected subset of columns needs to be encrypted. To have a column encrypted, give it a name that starts with 'encrypted_' (the match is not case sensitive).
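To illustrate the naming rule, here is a minimal sketch of a case-insensitive prefix check. The class and method names (`EncryptedColumnSelector`, `shouldEncrypt`, `columnsToEncrypt`) are hypothetical and are not taken from the patch; only the 'encrypted_' prefix and the case-insensitive match come from the description above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical helper, not part of parquet-java: it only illustrates the
// "name starts with 'encrypted_' (not case sensitive)" rule described above.
class EncryptedColumnSelector {
    static final String PREFIX = "encrypted_";

    // True when the column name carries the prefix, ignoring case.
    static boolean shouldEncrypt(String columnName) {
        return columnName.toLowerCase(Locale.ROOT).startsWith(PREFIX);
    }

    // Returns the subset of column names that would be encrypted.
    static List<String> columnsToEncrypt(List<String> columnNames) {
        return columnNames.stream()
                .filter(EncryptedColumnSelector::shouldEncrypt)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> schema = Arrays.asList("id", "encrypted_ssn", "Encrypted_Salary", "city");
        System.out.println(columnsToEncrypt(schema));
    }
}
```

With this rule, a column such as `Encrypted_Salary` is selected while `city` is left as plain data.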

krishnaprasadas avatar Sep 15 '17 14:09 krishnaprasadas

Thanks @krishnaprasadas, I see this is still a work in progress. A heads-up: to contribute this, we'd need to clarify in parquet-format how it works. What encryption algorithm is used? How is the key retrieved? This should be documented so that it can be implemented in Java and C++ from the spec.

julienledem avatar Oct 10 '17 18:10 julienledem

Thank you @julienledem for your interest. Here is a brief summary of what I have done. After checking the ColumnDescriptor, ColumnChunkPageWriter creates a BytesEncryptor if one is needed (that is, if the field name carries the 'encrypted_' prefix). For a column that is to be encrypted, the page writer encrypts the compressed bytes and stores the encrypted bytes instead of the compressed bytes. The encryptedSize is also stored in the header. On read, ParquetFileReader checks whether the encrypted page size is greater than zero; if it is, it decrypts the data and returns the compressed result for further processing. Currently I'm using AES encryption, and the key is stored in a JKS. The key password also needs to be available on the classpath as a property file in the file system. To support this, I have added "encrypted_size" to the parquet-format project. Please correct me if something is wrong.
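The write and read paths described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the patch: the class `PageEncryptionSketch` and its methods are hypothetical, the AES mode (GCM with a random 12-byte IV prepended to the ciphertext) is my assumption because the comment only says "AES", and the key is generated in memory rather than loaded from a keystore.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical sketch of the described flow; not the actual parquet-java classes.
class PageEncryptionSketch {
    static final int IV_LEN = 12;    // assumed IV size for AES-GCM
    static final int TAG_BITS = 128; // assumed authentication tag size

    // Write path: encrypt the already-compressed page bytes.
    // The stored layout (IV followed by ciphertext) is an assumption.
    static byte[] encryptPage(byte[] compressed, SecretKey key) throws GeneralSecurityException {
        byte[] iv = new byte[IV_LEN];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] cipherText = cipher.doFinal(compressed);
        return ByteBuffer.allocate(IV_LEN + cipherText.length).put(iv).put(cipherText).array();
    }

    // Read path: an encrypted size greater than zero means the page is encrypted,
    // so decrypt it and hand back the compressed bytes; otherwise pass them through.
    static byte[] readPage(byte[] stored, int encryptedSize, SecretKey key) throws GeneralSecurityException {
        if (encryptedSize <= 0) {
            return stored;
        }
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, stored, 0, IV_LEN));
        return cipher.doFinal(stored, IV_LEN, encryptedSize - IV_LEN);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        SecretKey key = generator.generateKey(); // stand-in for a key loaded from a keystore

        byte[] compressed = "compressed page bytes".getBytes(StandardCharsets.UTF_8);
        byte[] stored = encryptPage(compressed, key);
        byte[] restored = readPage(stored, stored.length, key);
        System.out.println(Arrays.equals(compressed, restored));
    }
}
```

Encrypting after compression, as described, keeps the compression ratio intact, since encrypted bytes do not compress well.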

krishnaprasadas avatar Oct 13 '17 07:10 krishnaprasadas