feat: Allow parquet column access by field_id
This allows a Parquet column to be resolved by field_id instead of by its "path". This is a lower-level option that end-users will not typically use; as such, it has not been plumbed through to Python. Follow-up PRs will use this feature in combination with Iceberg's field IDs to improve column mappings.
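To illustrate the difference between the two lookup modes, here is a minimal pure-Python sketch. It is not the code in this PR: `ColumnMeta`, `resolve_by_path`, and `resolve_by_field_id` are hypothetical names invented for the example, and returning the first match is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ColumnMeta:
    """Hypothetical stand-in for one column entry in a parquet schema."""
    path: str                 # column path as written in the file
    field_id: Optional[int]   # parquet field_id; may be absent


def resolve_by_path(columns, path):
    """Return the index of the first column whose path matches, else None."""
    for i, col in enumerate(columns):
        if col.path == path:
            return i
    return None


def resolve_by_field_id(columns, field_id):
    """Return the index of the first column whose field_id matches, else None."""
    for i, col in enumerate(columns):
        # Columns without a field_id never match.
        if col.field_id is not None and col.field_id == field_id:
            return i
    return None


# Assumed scenario: the file was written with the column named "full_name",
# and the table schema later renamed it to "name" while keeping field_id 2.
columns = [ColumnMeta("id", 1), ColumnMeta("full_name", 2)]

by_path = resolve_by_path(columns, "name")    # lookup by the new name fails
by_id = resolve_by_field_id(columns, 2)       # lookup by field_id still works
```

In this sketch the path lookup misses the renamed column while the field_id lookup still finds it, which is the kind of mapping the Iceberg follow-ups are expected to rely on.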
Writing support has also been added.
Fixes #6128
Do verify the nightlies pass before merging.
Verified.
I couldn't find any resources to confirm this, but having two columns with the same field ID does feel incorrect to me. For example, if we get a field ID from Iceberg, it would expect a single column, right?
Iceberg probably mandates the uniqueness of field-ids.
Parquet doesn't mandate anything in that regard. Even the column names aren't guaranteed to be unique. I need to track down the reference I found earlier saying that the Parquet format "strongly recommends" unique column names, but even that is not a guarantee.
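To make the ambiguity concrete, here is a rough sketch of how a reader could detect field IDs shared by more than one column. The helper names and the `(path, field_id)` tuple layout are assumptions made for illustration, not anything from this PR.

```python
from collections import defaultdict


def columns_by_field_id(columns):
    """Group column paths by field_id, skipping columns that have none.

    `columns` is an iterable of (path, field_id) pairs (hypothetical layout).
    """
    groups = defaultdict(list)
    for path, field_id in columns:
        if field_id is not None:
            groups[field_id].append(path)
    return dict(groups)


def ambiguous_field_ids(columns):
    """Return only the field_ids that map to more than one column."""
    return {
        field_id: paths
        for field_id, paths in columns_by_field_id(columns).items()
        if len(paths) > 1
    }


# Illustrative schema: "a" and "b" share field_id 1, "d" has no field_id.
example = [("a", 1), ("b", 1), ("c", 2), ("d", None)]
duplicates = ambiguous_field_ids(example)
```

What to do with such duplicates (error out, take the first match, or something else) is a policy choice that this sketch deliberately leaves open.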
There is going to be a more general follow-up to this that allows for custom logic.