velox
[WIP] Add support for `parquet_writer_version` session property
Resolves https://github.com/prestodb/presto/issues/22595.
Implemented setting the writer version via a session property:
set session hive.parquet_writer_version='PARQUET_1_0';
However, setting it in the Hive config does not seem to have any effect:
hive.parquet.writer.version=PARQUET_1_0
Deploy Preview for meta-velox canceled.
| Name | Link |
|---|---|
| Latest commit | a9b4895ba9805ef4df1913a9f34479c922617c37 |
| Latest deploy log | https://app.netlify.com/sites/meta-velox/deploys/663adb69ba69f200080ed98b |
Java seems to support only `1.0` and `2.0`, while C++ supports more granular versions.
https://www.javadoc.io/doc/org.apache.parquet/parquet-column/1.10.1/org/apache/parquet/column/ParquetProperties.WriterVersion.html
> Java seems to support only `1.0` and `2.0`, while C++ supports more granular versions.
There are two different Parquet versions, and they are easy to confuse:
- The Parquet format version: https://github.com/apache/parquet-format/tags
- The Parquet data page version (V1, V2): https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L562

The Java Parquet `parquet_writer_version` maps to the C++ Parquet data page version, not to the format version.
This confusion was also discussed recently in the Arrow community: https://lists.apache.org/thread/72qwr66wf3xyrl5cozgojz88ct23qzxx
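
To make the mapping concrete, here is a minimal sketch of how the Java-style writer version string could be translated into a data page version. The `DataPageVersion` enum and `toDataPageVersion` helper are hypothetical names used only for illustration; they are not the actual Velox or Arrow types, and the real change may be structured differently.

```cpp
#include <optional>
#include <string_view>

// Hypothetical stand-in for the C++ writer's data page version (V1, V2).
enum class DataPageVersion { V1, V2 };

// Maps a Java-style parquet_writer_version value to a data page version.
// Returns std::nullopt for values other than PARQUET_1_0 and PARQUET_2_0,
// so the caller decides whether to reject them or fall back to a default.
inline std::optional<DataPageVersion> toDataPageVersion(
    std::string_view writerVersion) {
  if (writerVersion == "PARQUET_1_0") {
    return DataPageVersion::V1;
  }
  if (writerVersion == "PARQUET_2_0") {
    return DataPageVersion::V2;
  }
  return std::nullopt;
}
```

Returning an empty optional for unknown values keeps the decision about error handling with the caller, which seems safer than silently picking a default while the config path is still not taking effect.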