Is it possible to apply specific encodings on specific columns with ParquetWriter?

Open • Selfeer opened this issue 1 year ago • 1 comment

I’m working on a tool that generates Parquet files based on a file definition provided in JSON. I use the parquet-java library for this, and I’m curious if it’s possible to specify a particular type of encoding for specific columns when generating the file.

Selfeer, Nov 07 '24 21:11

It seems that the only encodings we can control directly are dictionary encoding and byte stream split encoding, both of which are toggled (globally or per column path) via ParquetProperties: https://github.com/apache/parquet-java/blob/master/parquet-column/src/main/java/org/apache/parquet/column/ParquetProperties.java

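The other encodings cannot be picked per column; they are selected automatically from the column type and the WriterVersion. See DefaultValuesWriterFactory: https://github.com/apache/parquet-java/blob/master/parquet-column/src/main/java/org/apache/parquet/column/values/factory/DefaultValuesWriterFactory.java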
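
To make the WriterVersion point concrete, here is a rough, self-contained sketch of how the non-dictionary encoding appears to be chosen. It does not call parquet-java: the enums and the `fallbackEncoding` method are hypothetical stand-ins, and the mapping is my reading of the V1/V2 factories behind DefaultValuesWriterFactory, so please check it against the version you actually use. It also ignores the extended byte stream split mode and treats dictionary encoding as a separate layer that falls back to the encoding returned here.

```java
public class EncodingDefaults {
    // Hypothetical stand-ins for the library's types; names mirror parquet-java for readability only.
    enum WriterVersion { PARQUET_1_0, PARQUET_2_0 }
    enum PrimitiveType { BOOLEAN, INT32, INT64, INT96, FLOAT, DOUBLE, BINARY, FIXED_LEN_BYTE_ARRAY }
    enum Encoding { PLAIN, RLE, DELTA_BINARY_PACKED, DELTA_BYTE_ARRAY, BYTE_STREAM_SPLIT }

    /**
     * Returns the encoding used when dictionary encoding is disabled or abandoned.
     * Assumption: simplified model of the default factories, not the library's code.
     */
    static Encoding fallbackEncoding(WriterVersion version, PrimitiveType type, boolean byteStreamSplit) {
        // Byte stream split is a property, not a version feature, so it applies to both versions.
        if (byteStreamSplit && (type == PrimitiveType.FLOAT || type == PrimitiveType.DOUBLE)) {
            return Encoding.BYTE_STREAM_SPLIT;
        }
        // Version 1 writers use PLAIN for every type.
        if (version == WriterVersion.PARQUET_1_0) {
            return Encoding.PLAIN;
        }
        // Version 2 writers switch to RLE and the delta encodings where they apply.
        switch (type) {
            case BOOLEAN:
                return Encoding.RLE;
            case INT32:
            case INT64:
                return Encoding.DELTA_BINARY_PACKED;
            case BINARY:
            case FIXED_LEN_BYTE_ARRAY:
                return Encoding.DELTA_BYTE_ARRAY;
            default:
                // INT96, FLOAT and DOUBLE stay PLAIN in this model.
                return Encoding.PLAIN;
        }
    }

    public static void main(String[] args) {
        // Print the modelled encoding for each type under both writer versions.
        for (PrimitiveType type : PrimitiveType.values()) {
            System.out.printf("%-22s v1=%-8s v2=%s%n",
                    type,
                    fallbackEncoding(WriterVersion.PARQUET_1_0, type, false),
                    fallbackEncoding(WriterVersion.PARQUET_2_0, type, false));
        }
    }
}
```

In practice this means the only per-column levers are the dictionary and byte stream split switches; getting a delta encoding on one column means writing the whole file with the version 2 writer.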

wgtmac, Nov 15 '24 14:11