
[C++][Parquet] Expand ParquetVersion enum values

Open pitrou opened this issue 1 year ago • 2 comments

Describe the enhancement requested

The latest released Parquet format version is 2.10.0, but our ParquetVersion enum only goes up to 2.6.0. We should fill in the missing values. For example, 2.8.0 adds the BYTE_STREAM_SPLIT encoding for floats.

Component(s)

C++, Parquet

pitrou avatar Feb 15 '24 17:02 pitrou

cc @jorisvandenbossche @mapleFU @wgtmac

pitrou avatar Feb 15 '24 17:02 pitrou

@pitrou I've considered this problem before; we talked about it here: https://github.com/apache/arrow/issues/35776. I had forgotten about it.

mapleFU avatar Feb 17 '24 10:02 mapleFU

Hi, is it still necessary to continue with this issue? If so, I can help.

diego-ciciani01 avatar Sep 25 '25 12:09 diego-ciciani01

@diego-ciciani01 Feel free to create a PR :)

wgtmac avatar Sep 26 '25 05:09 wgtmac

Personally, I think this would be a bit tricky. What would you plan to include in 2.10?

And some files that use 2.10 features might already be written with 2.6.0 now? 🤔

mapleFU avatar Sep 26 '25 05:09 mapleFU

Thanks for the feedback above. I’ve been digging a bit deeper into the issue, and I now understand why simply adding new values to the ParquetVersion enum might not be straightforward.

I think we should start by researching the exact features introduced in each version (2.7-2.10) from the Parquet spec changelog. For the version mismatch issue, we could consider adding a validation step that ensures written files declare the minimum version required by the features they actually use, or something like that.
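
To make the validation idea concrete, here is a rough sketch of what such a check might look like. Everything in it is an assumption for illustration: `FormatVersion`, `Feature`, `MinVersionFor`, `MinRequiredVersion`, and `DeclaredVersionIsSufficient` are hypothetical names, not Arrow's actual `ParquetVersion` API. The only mapping taken from this issue is BYTE_STREAM_SPLIT for floats requiring 2.8.0; the 2.10 entry is a placeholder until the changelog research is done.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for an expanded version enum (not Arrow's ParquetVersion).
// Enumerators are declared in ascending order so they can be compared with <.
enum class FormatVersion { V2_6, V2_7, V2_8, V2_9, V2_10 };

// Hypothetical feature tags. Only the BYTE_STREAM_SPLIT -> 2.8 mapping comes from
// this issue; kPlaceholder210Feature is a made-up entry standing in for whatever
// the spec changelog turns out to list for 2.10.
enum class Feature { kByteStreamSplit, kPlaceholder210Feature };

// Minimum format version assumed to be needed for a single feature.
inline FormatVersion MinVersionFor(Feature feature) {
  switch (feature) {
    case Feature::kByteStreamSplit:
      return FormatVersion::V2_8;
    case Feature::kPlaceholder210Feature:
      return FormatVersion::V2_10;
  }
  return FormatVersion::V2_6;  // Fallback for values outside the enum.
}

// Lowest version that covers every feature a file uses (2.6 if none are used).
inline FormatVersion MinRequiredVersion(const std::vector<Feature>& used) {
  FormatVersion required = FormatVersion::V2_6;
  for (Feature feature : used) {
    required = std::max(required, MinVersionFor(feature));
  }
  return required;
}

// True if the version declared by the writer is high enough for the features used.
inline bool DeclaredVersionIsSufficient(FormatVersion declared,
                                        const std::vector<Feature>& used) {
  return declared >= MinRequiredVersion(used);
}
```

With this sketch, a file declared as 2.6 that uses BYTE_STREAM_SPLIT would fail the check, which is the mismatch case raised above. Whether a real implementation should reject such a file, warn, or raise the declared version automatically is an open design question.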

Let me know if I understood what you meant.

diego-ciciani01 avatar Sep 26 '25 17:09 diego-ciciani01

I have an in-progress draft PR up for this already, as I had assumed from the good-first-issue label that we just needed to add the version numbers in. It sounds like it's more complex than that, so if that's the case, feel free to take it over in a new PR, as I don't have the capacity to complete those extra bits.

thisisnic avatar Sep 30 '25 15:09 thisisnic