Further refactor Parquet readers for v2 support
In issues like #7162 and #11371, it's reported that newer Parquet encodings like DELTA_BINARY_PACKED don't work with the current Parquet readers. #11661 recently refactored the Parquet readers to improve code re-use, but there are a few more changes needed to prepare us for Parquet v2 support.
This refactor introduces a new interface VectorizedValuesReader and changes readers like TimestampMillisReader to work with this new type. After this change, new implementations of VectorizedValuesReader can be added to support encodings like DELTA_BINARY_PACKED.
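To illustrate why the column readers should depend on an interface rather than on one encoding, here is a minimal sketch. It is not the code in this PR: `IntValuesReader`, `PlainIntReader`, `SimpleDeltaIntReader`, `readBatch`, and `encode` are hypothetical names, and the delta layout is a deliberately simplified stand-in, not the real DELTA_BINARY_PACKED format.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class ValuesReaderSketch {
  /** Hypothetical stand-in for an encoding-specific values reader. */
  interface IntValuesReader {
    int readInteger();
  }

  /** PLAIN-style reader: every value is a 4-byte little-endian int. */
  static final class PlainIntReader implements IntValuesReader {
    private final ByteBuffer buf;

    PlainIntReader(byte[] page) {
      this.buf = ByteBuffer.wrap(page).order(ByteOrder.LITTLE_ENDIAN);
    }

    @Override
    public int readInteger() {
      return buf.getInt();
    }
  }

  /** Simplified delta reader: first value, then deltas. NOT the real DELTA_BINARY_PACKED layout. */
  static final class SimpleDeltaIntReader implements IntValuesReader {
    private final ByteBuffer buf;
    private int previous;
    private boolean first = true;

    SimpleDeltaIntReader(byte[] page) {
      this.buf = ByteBuffer.wrap(page).order(ByteOrder.LITTLE_ENDIAN);
    }

    @Override
    public int readInteger() {
      int raw = buf.getInt();
      // The first stored int is the value itself; later ints are offsets from the previous value.
      previous = first ? raw : previous + raw;
      first = false;
      return previous;
    }
  }

  /** A column reader written against the interface works with any encoding. */
  static int[] readBatch(IntValuesReader reader, int count) {
    int[] out = new int[count];
    for (int i = 0; i < count; i++) {
      out[i] = reader.readInteger();
    }
    return out;
  }

  /** Test helper: packs ints as little-endian bytes. */
  static byte[] encode(int... values) {
    ByteBuffer buf = ByteBuffer.allocate(values.length * 4).order(ByteOrder.LITTLE_ENDIAN);
    for (int v : values) {
      buf.putInt(v);
    }
    return buf.array();
  }

  public static void main(String[] args) {
    int[] plain = readBatch(new PlainIntReader(encode(10, 12, 15)), 3);
    int[] delta = readBatch(new SimpleDeltaIntReader(encode(10, 2, 3)), 3);
    System.out.println(Arrays.toString(plain)); // [10, 12, 15]
    System.out.println(Arrays.toString(delta)); // [10, 12, 15]
  }
}
```

In this sketch `readBatch` never changes when a new encoding is added; only a new implementation of the interface is needed, which is the property the refactor is aiming for.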
This PR is a revival of @wgtmac's #9772, which, based on our conversation, he will not be able to continue working on. Thanks for the great work, @wgtmac.
I have some small questions about the roadmap for where we go from here, but this makes sense to me as a first step. As long as we are more or less copying the Spark approach, I think we are probably safe here. @huaxingao Could you do a quick check as well?
Some weird rebase happened here, git history looks scary now :)
@eric-maynard Thanks for the PR! The approach looks good to me and seems like a reasonable first step.
Thanks @huaxingao! I've added Javadocs
@RussellSpitzer, the scary diff should be fixed now, but unfortunately I can't remove the tags that got auto-added while the diff was artificially massive.
@huaxingao and @wypoon do y'all have any other comments on this PR?
Merged. Thanks @eric-maynard for the PR, and @huaxingao and @wypoon for reviewing.
Also thanks @wgtmac for starting this!