[WIP] Add support for `parquet_writer_version` session property

Open svm1 opened this issue 9 months ago • 4 comments

Resolves https://github.com/prestodb/presto/issues/22595.

Implemented setting the writer version via a session property: `set session hive.parquet_writer_version='PARQUET_1_0';`

However, setting it in the Hive config does not seem to have any effect: `hive.parquet.writer.version=PARQUET_1_0`

svm1 avatar May 03 '24 05:05 svm1

Deploy Preview for meta-velox canceled.

| Name | Link |
|---|---|
| Latest commit | a9b4895ba9805ef4df1913a9f34479c922617c37 |
| Latest deploy log | https://app.netlify.com/sites/meta-velox/deploys/663adb69ba69f200080ed98b |

netlify[bot] avatar May 03 '24 05:05 netlify[bot]

Java seems to support only writer versions 1.0 and 2.0, while C++ supports more granular versions: https://www.javadoc.io/doc/org.apache.parquet/parquet-column/1.10.1/org/apache/parquet/column/ParquetProperties.WriterVersion.html

svm1 avatar May 03 '24 05:05 svm1

> Java seems to only support 1.0 and 2.0, while C++ supports more granular versions.

There are two distinct Parquet versions, and they are easy to confuse.

  1. The Parquet format version: https://github.com/apache/parquet-format/tags
  2. The Parquet data page version (V1, V2): https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L562

The Java Parquet `parquet_writer_version` maps to the C++ Parquet data page version, not to the format version.
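
A minimal sketch of that mapping, assuming the property values are the Java enum names `PARQUET_1_0` and `PARQUET_2_0`. The `DataPageVersion` enum and the `toDataPageVersion` helper are hypothetical stand-ins for illustration, not the Velox or Arrow API:

```cpp
#include <optional>
#include <string_view>

// Hypothetical stand-in for the C++ writer's data page version (V1 or V2).
// This is separate from the Parquet format version.
enum class DataPageVersion { V1, V2 };

// Maps a Java-style parquet_writer_version value to a data page version.
// Returns std::nullopt for any value the Java WriterVersion enum does not
// define, so the caller can reject it instead of guessing.
std::optional<DataPageVersion> toDataPageVersion(std::string_view writerVersion) {
  if (writerVersion == "PARQUET_1_0") {
    return DataPageVersion::V1;
  }
  if (writerVersion == "PARQUET_2_0") {
    return DataPageVersion::V2;
  }
  return std::nullopt;
}
```

Returning an empty optional for unknown values keeps the more granular C++ format versions out of this property, since those belong to the other version axis.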

majetideepak avatar May 13 '24 14:05 majetideepak

This confusion was also discussed recently in the Arrow community: https://lists.apache.org/thread/72qwr66wf3xyrl5cozgojz88ct23qzxx

majetideepak avatar May 13 '24 15:05 majetideepak