parquet-java
Parquet encryption master
Parquet Encryption supports columnar level access control.
Data can be encrypted at the column level, so only a chosen subset of columns needs to be encrypted. To have a column encrypted, give it a name that starts with 'encrypted_' (case-insensitive).
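For illustration, the naming rule above could be checked roughly as follows. This is a minimal sketch, not the code from the patch: the class name `EncryptedColumnNames` and the method `shouldEncrypt` are hypothetical, and it assumes the check is applied to a plain column name.

```java
// Hypothetical helper, not part of parquet-java: decides from the column
// name alone whether a column should be encrypted.
final class EncryptedColumnNames {
    // Prefix described in the PR text.
    static final String PREFIX = "encrypted_";

    // True if the name starts with the prefix, ignoring case.
    public static boolean shouldEncrypt(String columnName) {
        return columnName != null
                && columnName.regionMatches(true, 0, PREFIX, 0, PREFIX.length());
    }

    public static void main(String[] args) {
        for (String name : new String[] {"encrypted_ssn", "ENCRYPTED_Salary", "user_id"}) {
            System.out.println(name + " -> " + shouldEncrypt(name));
        }
    }
}
```

`regionMatches` with `ignoreCase` set avoids locale-dependent lowercasing of the column name.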
Thanks @krishnaprasadas, I see this is still a work in progress. A heads-up: to contribute this, we'd need to clarify in parquet-format how this works. What encryption algorithm is used? How is the key retrieved? This should be documented so that it can be implemented in Java and C++ from the spec.
Thank you @julienledem for your interest. Here is a brief summary of what I have done. On the write path, ColumnChunkPageWriter checks the ColumnDescriptor and creates a BytesEncryptor if needed (that is, if the field name carries the 'encrypted_' tag). For a column that is to be encrypted, the page writer encrypts the compressed bytes and stores the encrypted bytes in place of the compressed bytes. The encryptedSize is also stored in the header. On the read path, ParquetFileReader checks whether the encrypted page size is greater than zero; if it is, it decrypts the data and returns the compressed result for further processing. Currently I'm using AES encryption, and the key is stored in a JKS keystore. The key password needs to be available on the classpath as a properties file in the file system. For the encrypted size, I have added "encrypted_size" to the parquet-format project. Please correct me if something is wrong.
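The write and read steps described above might look roughly like this with the standard `javax.crypto` API. It is only a sketch under stated assumptions, not the code in this PR: the class `PageEncryptionSketch` and its methods are hypothetical, the comment above says only "AES" so the GCM mode and the IV-prefix layout are my assumptions, and the key is generated in memory rather than loaded from a keystore.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical sketch of "encrypt the compressed page bytes, decrypt on read".
final class PageEncryptionSketch {
    static final int IV_LENGTH = 12;   // bytes; assumed layout: IV, then ciphertext
    static final int TAG_BITS = 128;   // GCM authentication tag length

    // Write path: takes already-compressed page bytes, returns the bytes to store.
    // The length of the result plays the role of the encrypted size in the header.
    public static byte[] encryptPage(byte[] compressed, SecretKey key)
            throws GeneralSecurityException {
        byte[] iv = new byte[IV_LENGTH];
        new SecureRandom().nextBytes(iv);  // fresh IV for every page
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ciphertext = cipher.doFinal(compressed);
        byte[] stored = new byte[IV_LENGTH + ciphertext.length];
        System.arraycopy(iv, 0, stored, 0, IV_LENGTH);
        System.arraycopy(ciphertext, 0, stored, IV_LENGTH, ciphertext.length);
        return stored;
    }

    // Read path: an encrypted size of zero means the page was stored unencrypted.
    // Otherwise decrypt and hand back the compressed bytes for decompression.
    public static byte[] readPage(byte[] stored, int encryptedSize, SecretKey key)
            throws GeneralSecurityException {
        if (encryptedSize <= 0) {
            return stored;
        }
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key,
                new GCMParameterSpec(TAG_BITS, stored, 0, IV_LENGTH));
        return cipher.doFinal(stored, IV_LENGTH, encryptedSize - IV_LENGTH);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        SecretKey key = generator.generateKey();  // stand-in for the keystore lookup

        byte[] compressed = "compressed page bytes".getBytes(StandardCharsets.UTF_8);
        byte[] stored = encryptPage(compressed, key);
        byte[] restored = readPage(stored, stored.length, key);
        System.out.println("round trip ok: " + Arrays.equals(compressed, restored));
    }
}
```

Writing the IV in front of the ciphertext keeps each page self-contained, at the cost of 28 extra bytes per page (12 for the IV, 16 for the tag). Whatever layout is chosen would need to be written down in parquet-format so that other implementations can read the same files.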