Support DELTA_BINARY_PACKED and DELTA_BYTE_ARRAY

Open kazuyukitanimura opened this issue 1 year ago • 1 comment

What is the problem the feature request solves?

Some tests in Spark 4.0 use parquet.writer.version=v2 (ParquetTypeWideningSuite).

The V2 writer uses delta encodings (DELTA_BINARY_PACKED and DELTA_BYTE_ARRAY). Comet currently cannot read such files.
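
For readers unfamiliar with the two encodings, here is a rough conceptual sketch in Python. It is not Comet's or parquet's implementation and does not produce the on-disk byte layout: the real DELTA_BINARY_PACKED format also has a varint header, miniblocks, and bit-packing, and DELTA_BYTE_ARRAY stores its prefix lengths and suffixes with further delta encodings. The function names and the tiny `BLOCK_SIZE` are made up for illustration.

```python
from typing import List, Tuple

# Illustrative only; real Parquet writers use much larger blocks.
BLOCK_SIZE = 4

Block = Tuple[int, int, List[int]]  # (min_delta, bit_width, adjusted_deltas)


def delta_encode(values: List[int]) -> Tuple[int, List[Block]]:
    """Idea behind DELTA_BINARY_PACKED: store the first value, then per block
    the minimum delta and the (non-negative) deltas relative to that minimum."""
    if not values:
        raise ValueError("need at least one value")
    first = values[0]
    deltas = [b - a for a, b in zip(values, values[1:])]
    blocks: List[Block] = []
    for i in range(0, len(deltas), BLOCK_SIZE):
        chunk = deltas[i:i + BLOCK_SIZE]
        min_delta = min(chunk)
        adjusted = [d - min_delta for d in chunk]
        # Bits needed per adjusted delta if the block were bit-packed.
        bit_width = max(adjusted).bit_length()
        blocks.append((min_delta, bit_width, adjusted))
    return first, blocks


def delta_decode(first: int, blocks: List[Block]) -> List[int]:
    """Rebuild the values by re-adding min_delta and accumulating."""
    out = [first]
    for min_delta, _bit_width, adjusted in blocks:
        for a in adjusted:
            out.append(out[-1] + a + min_delta)
    return out


def prefix_encode(values: List[bytes]) -> List[Tuple[int, bytes]]:
    """Idea behind DELTA_BYTE_ARRAY: for each value keep the length of the
    prefix shared with the previous value, plus the remaining suffix."""
    out: List[Tuple[int, bytes]] = []
    prev = b""
    for v in values:
        n = 0
        while n < min(len(prev), len(v)) and prev[n] == v[n]:
            n += 1
        out.append((n, v[n:]))
        prev = v
    return out


def prefix_decode(pairs: List[Tuple[int, bytes]]) -> List[bytes]:
    """Rebuild each value from the previous value's prefix and the suffix."""
    out: List[bytes] = []
    prev = b""
    for n, suffix in pairs:
        prev = prev[:n] + suffix
        out.append(prev)
    return out
```

A reader for these encodings has to carry state from one value to the next (the running sum, or the previous byte array), which is the part a vectorized decoder needs to handle.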

Describe the potential solution

No response

Additional context

No response

kazuyukitanimura avatar Jun 14 '24 18:06 kazuyukitanimura

IIRC, the vectorized versions of these encodings in Spark did not improve performance much over the row-based implementation in the Parquet library.

parthchandra avatar Jun 15 '24 00:06 parthchandra