Gang Wu

Results: 304 comments by Gang Wu

I think lz4 and zlib already export their packages for use in config mode:
- https://github.com/lz4/lz4/blob/dev/build/cmake/lz4Config.cmake.in
- https://github.com/madler/zlib/blob/develop/CMakeLists.txt#L240

What is the difference between this and https://github.com/apache/parquet-java/pull/1017? cc @dongjoon-hyun

I saw a related issue: https://issues.apache.org/jira/browse/PARQUET-1648. It seems that parquet-mr does not use it yet.

I don't think so. https://github.com/apache/parquet-mr/blob/c241170d9bc2cd8415b04e06ecea40ed3d80f64d/parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java#L1596-L1611

If using C++ is an option, I think parquet-cpp in Apache Arrow is the best solution for your case: https://arrow.apache.org/docs/cpp/parquet.html

I think conversion between parquet and arrow is a valid use case. parquet-java provides built-in row-level interfaces like avro/thrift/protobuf. Other parquet (Java) implementations (Presto/Trino/Spark) simply leverage the page &...

> It seems that iceberg has an arrow implementation.

Yes, but it does not support reading repetition levels and v2 encodings.

You may want to set these environment variables or CMake variables to explicitly use the provided dependencies: https://github.com/apache/orc/blob/main/cmake_modules/ThirdpartyToolchain.cmake#L52-L102

Sorry, I'm a little bit overwhelmed these days. I will take a look when I get the chance. BTW, @luffy-zh is working on exposing RowIndex positions: https://github.com/apache/orc/pull/2005. Perhaps there is an...