Gang Wu
I think lz4 and zlib already export their packages for use in CMake config mode:
- https://github.com/lz4/lz4/blob/dev/build/cmake/lz4Config.cmake.in
- https://github.com/madler/zlib/blob/develop/CMakeLists.txt#L240
How does this differ from https://github.com/apache/parquet-java/pull/1017? cc @dongjoon-hyun
I saw a related issue: https://issues.apache.org/jira/browse/PARQUET-1648. It seems that parquet-mr does not use it yet.
I don't think so. https://github.com/apache/parquet-mr/blob/c241170d9bc2cd8415b04e06ecea40ed3d80f64d/parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java#L1596-L1611
I agree. It should not block us from implementing this.
If using C++ is an option, I think parquet-cpp in Apache Arrow is the best solution for your case: https://arrow.apache.org/docs/cpp/parquet.html
I think conversion between Parquet and Arrow is a valid use case. parquet-java provides built-in row-level interfaces such as avro/thrift/protobuf. Other Parquet (Java) implementations (Presto/Trino/Spark) simply leverage the page &...
> It seems that iceberg has an arrow implementation.

Yes, but it does not support reading repetition levels and v2 encodings.
You may want to set these environment variables or CMake variables to explicitly use provided dependencies: https://github.com/apache/orc/blob/main/cmake_modules/ThirdpartyToolchain.cmake#L52-L102
Sorry, I'm a little bit overwhelmed these days. I will take a look when I get the chance. BTW, @luffy-zh is working on exposing RowIndex positions: https://github.com/apache/orc/pull/2005. Perhaps there is an...