parquet-java

Is there any actual conversion implementation for arrow and parquet?

Open chenyuanxing opened this issue 1 year ago • 8 comments

We found that parquet-arrow contains only schema conversions, so we wanted to ask whether there is any code that performs the actual data conversion between Parquet and Arrow.

chenyuanxing avatar Jun 24 '24 09:06 chenyuanxing

If using C++ is an option, I think parquet-cpp in Apache Arrow is the best solution for your case: https://arrow.apache.org/docs/cpp/parquet.html

wgtmac avatar Jun 24 '24 14:06 wgtmac

Yes, we know there is a C++ implementation, but I was wondering whether there is a corresponding implementation for Java, since all our code is Java.

chenyuanxing avatar Jun 25 '24 05:06 chenyuanxing

The parquet-arrow library looks like it is meant to do this, but I don't know why it only ever covers the schema part.

chenyuanxing avatar Jun 25 '24 05:06 chenyuanxing

I think conversion between Parquet and Arrow is a valid use case. parquet-java provides built-in row-level interfaces for Avro, Thrift, and Protobuf. Other Java-based Parquet implementations (Presto/Trino/Spark) simply use the page and metadata readers/writers from this library to build their own extensions. Native Arrow support would be a welcome addition to this library, IMO.
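To make concrete what a data-level (rather than schema-level) conversion involves, here is a minimal sketch for the simplest case: a flat optional int32 column. It assumes the column chunk has already been decoded into Parquet-style definition levels (1 = value present, 0 = null) plus the non-null values, and it produces an Arrow-style layout with one value slot per row and a validity bitmap using least-significant-bit numbering. `FlatColumnSketch`, `Int32Column`, and `fromLevels` are hypothetical names for illustration; none of this is the parquet-java or Arrow Java API, and plain arrays stand in for real page readers and Arrow buffers.

```java
// Hypothetical illustration only: not part of parquet-java or Arrow Java.
final class FlatColumnSketch {

    /** One value slot per row plus a validity bitmap (bit i set = row i is non-null). */
    static final class Int32Column {
        final int[] values;
        final byte[] validity;

        Int32Column(int[] values, byte[] validity) {
            this.values = values;
            this.validity = validity;
        }

        boolean isNull(int row) {
            // Bit (row % 8) of byte (row / 8), least-significant bit first.
            return (validity[row >> 3] & (1 << (row & 7))) == 0;
        }
    }

    /**
     * Expands a flat optional column into a dense, row-aligned layout.
     * definitionLevels has one entry per row; nonNullValues holds only the
     * values of rows whose definition level is 1.
     */
    static Int32Column fromLevels(int[] definitionLevels, int[] nonNullValues) {
        int rows = definitionLevels.length;
        int[] values = new int[rows];
        byte[] validity = new byte[(rows + 7) / 8];
        int next = 0; // index of the next unread non-null value
        for (int row = 0; row < rows; row++) {
            if (definitionLevels[row] == 1) {
                values[row] = nonNullValues[next++];
                validity[row >> 3] |= (byte) (1 << (row & 7));
            }
            // Null rows keep a zero slot and a cleared validity bit.
        }
        return new Int32Column(values, validity);
    }

    public static void main(String[] args) {
        Int32Column column = fromLevels(new int[] {1, 0, 1, 1}, new int[] {10, 20, 30});
        for (int row = 0; row < column.values.length; row++) {
            System.out.println(column.isNull(row) ? "null" : String.valueOf(column.values[row]));
        }
    }
}
```

A real implementation would also have to handle every physical type, dictionary and other encodings, and nested columns, which is where most of the work lies.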

wgtmac avatar Jun 25 '24 15:06 wgtmac

So the parquet-arrow library hasn't actually been used yet, because it only has schema mappings?

We've also looked at the conversions in Spark, but they are missing some types, such as uint, because of limitations in Spark. So it isn't really a universal conversion.
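For context on the uint point: Parquet stores unsigned integers in its signed INT32/INT64 physical types and marks them with an unsigned logical type annotation, while Java has no unsigned primitive types. A Java reader therefore has to reinterpret the stored bits itself. The sketch below shows one way to do that with standard JDK methods; `UnsignedSketch` and its method names are hypothetical and not part of parquet-java, Arrow, or Spark.

```java
// Hypothetical illustration only: reinterpreting signed storage as unsigned values.
final class UnsignedSketch {

    /** Widens a 32-bit value stored as a signed int to its unsigned meaning. */
    static long uint32ToLong(int stored) {
        return Integer.toUnsignedLong(stored);
    }

    /** Renders a 64-bit value stored as a signed long as an unsigned decimal string. */
    static String uint64ToString(long stored) {
        return Long.toUnsignedString(stored);
    }

    public static void main(String[] args) {
        // The bit pattern 0xFFFFFFFF reads as -1 when signed, 4294967295 when unsigned.
        System.out.println(uint32ToLong(-1));   // prints 4294967295
        System.out.println(uint64ToString(-1L)); // prints 18446744073709551615
    }
}
```

A converter that ignores the annotation would silently turn large unsigned values into negative numbers, which is why a type system without unsigned integers cannot round-trip these columns faithfully.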

chenyuanxing avatar Jul 01 '24 04:07 chenyuanxing

It seems that Iceberg has an Arrow implementation.

doki23 avatar Jul 29 '24 02:07 doki23

> It seems that Iceberg has an Arrow implementation.

Yes, but it does not support reading repetition levels and v2 encodings.
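To illustrate why repetition levels matter: Parquet uses them to mark where each repeated (list) value belongs, so a reader that cannot decode them is limited to non-repeated columns. The following is a minimal sketch, assuming a single optional list of required int32 elements in the standard three-level list layout, where repetition level 0 starts a new row and 1 continues the current list, and definition level 0 means a null list, 1 an empty list, and 2 a present element. `ListAssemblySketch` and `assemble` are hypothetical names, and plain arrays stand in for decoded page data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not the parquet-java or Iceberg reader.
final class ListAssemblySketch {

    /**
     * Rebuilds one list per row from level arrays.
     * rep[i] == 0 starts a new row; rep[i] == 1 appends to the current list.
     * def[i] == 0 is a null list, 1 an empty list, 2 a present element.
     * values holds only the entries whose definition level is 2.
     */
    static List<List<Integer>> assemble(int[] rep, int[] def, int[] values) {
        List<List<Integer>> rows = new ArrayList<>();
        List<Integer> current = null;
        int next = 0; // index of the next unread element value
        for (int i = 0; i < rep.length; i++) {
            if (rep[i] == 0) {
                current = (def[i] == 0) ? null : new ArrayList<>();
                rows.add(current);
            }
            if (def[i] == 2) {
                current.add(values[next++]);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Encodes the rows [1, 2], [], null, [3].
        int[] rep = {0, 1, 0, 0, 0};
        int[] def = {2, 2, 1, 0, 2};
        int[] values = {1, 2, 3};
        System.out.println(assemble(rep, def, values)); // prints [[1, 2], [], null, [3]]
    }
}
```

Deeper nesting raises the maximum repetition and definition levels, and the assembly logic has to track one open list per nesting depth.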

wgtmac avatar Jul 29 '24 04:07 wgtmac

I've ported Iceberg's implementation to parquet-arrow and removed the concepts specific to Iceberg. If someone is interested in this, they are welcome to make contributions.

doki23 avatar Aug 29 '24 06:08 doki23