iceberg icon indicating copy to clipboard operation
iceberg copied to clipboard

Further refactor Parquet readers for v2 support

Open eric-maynard opened this issue 7 months ago • 3 comments

In issues like #7162 and #11371, it's reported that newer Parquet encodings like DELTA_BINARY_PACKED don't work with the current Parquet readers. #11661 recently refactored the Parquet readers to improve code re-use, but there a few more changes needed to prepare us for Parquet v2 support.

This refactor introduces a new interface VectorizedValuesReader and changes readers like TimestampMillisReader to work with this new type. After this change, new implementations of VectorizedValuesReader can be added to support encodings like DELTA_BINARY_PACKED.


This PR is a revival of @wgtmac's #9772, which based on our conversion he will not be able to continue work on. Thanks for the great work, @wgtmac.

eric-maynard avatar Jun 10 '25 17:06 eric-maynard

I have some small questions about the Roadmap for where we go from here but this makes sense to me as a first step. As long as we are more or less copying the Spark approach I think we are probably safe here. @huaxingao Could you do a quick check as well?

RussellSpitzer avatar Jun 17 '25 22:06 RussellSpitzer

Some weird rebase happened here, git history looks scary now :)

RussellSpitzer avatar Jun 18 '25 16:06 RussellSpitzer

@eric-maynard Thanks for the PR! The approach looks good to me and seems like a reasonable first step.

huaxingao avatar Jun 19 '25 00:06 huaxingao

Thanks @huaxingao! I've added Javadocs

About the scary diff @RussellSpitzer, it should be fixed but unfortunately I can't remove the tags which got auto-added when the diff was artificially massive

eric-maynard avatar Jun 23 '25 17:06 eric-maynard

@huaxingao and @wypoon do y'all have any other comment on this pr?

RussellSpitzer avatar Jun 24 '25 17:06 RussellSpitzer

Merged, Thanks @eric-maynard for the pr, and @huaxingao and @wypoon for reviewing.

RussellSpitzer avatar Jun 25 '25 19:06 RussellSpitzer

Also thanks @wgtmac for starting this!

RussellSpitzer avatar Jun 25 '25 19:06 RussellSpitzer