Support for `_pos` metadata column
Is your feature request related to a problem or challenge?
The spec defines metadata columns, one of which is the _pos column. This column seems instrumental for future write operations (positional deletes), as well as for giving users a way to identify rows (the combination of file path + position). There's also a related DataFusion issue: https://github.com/apache/datafusion/issues/13261.
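To illustrate the row-identity idea, here is a minimal, std-only sketch. It is not iceberg-rust API: `RowId` and `apply_position_deletes` are hypothetical names. It shows how a (file path, `_pos`) pair uniquely identifies a row, and how position deletes keyed on that pair remove rows from one data file.

```rust
use std::collections::HashSet;

/// Hypothetical row identifier: a data file path plus the row's
/// zero-based position within that file (the `_pos` metadata column).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct RowId {
    file_path: String,
    pos: i64,
}

/// Keeps only the positions in `file_path` that are not listed in `deletes`,
/// i.e. drops rows whose (file_path, pos) pair appears in the delete set.
fn apply_position_deletes(file_path: &str, positions: &[i64], deletes: &HashSet<RowId>) -> Vec<i64> {
    positions
        .iter()
        .copied()
        .filter(|&pos| {
            !deletes.contains(&RowId { file_path: file_path.to_string(), pos })
        })
        .collect()
}

fn main() {
    let mut deletes = HashSet::new();
    deletes.insert(RowId { file_path: "s3://bucket/data/a.parquet".to_string(), pos: 1 });
    // A delete for a different file must not affect a.parquet.
    deletes.insert(RowId { file_path: "s3://bucket/data/b.parquet".to_string(), pos: 0 });

    let remaining = apply_position_deletes("s3://bucket/data/a.parquet", &[0, 1, 2, 3], &deletes);
    println!("{:?}", remaining); // [0, 2, 3]
}
```

The file path is part of the key because `_pos` alone is only unique within a single data file.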
Describe the solution you'd like
The solution should allow building a TableScan with an option to return metadata columns (in this case _pos). When that option is set, the library should return batches that include this extra column. The column should preserve gaps in the case of filter pushdown (and merge-on-read): each value must be the row's original position in the data file, not a renumbering of the rows that survive filtering.
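A minimal sketch of the gap-preserving behaviour, assuming the reader knows the file-level position of the first row in each chunk it reads (for example, the sum of row counts of all preceding row groups) and has a per-row selection mask from filter pushdown. `ReadChunk` and `positions_for_chunk` are hypothetical names for illustration, not the reader.rs implementation.

```rust
/// Hypothetical chunk of rows read from one data file.
struct ReadChunk {
    /// File-level position of the chunk's first row.
    first_row_pos: i64,
    /// `selection[i]` is true if row i survived filter pushdown.
    selection: Vec<bool>,
}

/// Computes `_pos` values for the rows that are actually emitted.
/// Positions are taken from the original file, so filtered-out rows
/// leave gaps instead of survivors being renumbered 0, 1, 2, ...
fn positions_for_chunk(chunk: &ReadChunk) -> Vec<i64> {
    chunk
        .selection
        .iter()
        .enumerate()
        .filter_map(|(i, &keep)| keep.then(|| chunk.first_row_pos + i as i64))
        .collect()
}

fn main() {
    // Assume row groups of 4 rows; the first row group was pruned
    // entirely, so this chunk starts at file position 4.
    let chunk = ReadChunk {
        first_row_pos: 4,
        selection: vec![true, false, false, true],
    };
    println!("{:?}", positions_for_chunk(&chunk)); // [4, 7]
}
```

Rows removed by position deletes (merge-on-read) would simply be additional `false` entries in the mask, so the same rule keeps their gaps as well.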
Willingness to contribute
I can contribute to this feature independently
Corresponding arrow-rs issue, likely a dependency: https://github.com/apache/arrow-rs/issues/7299
Drafted a PR prototyping a solution: https://github.com/apache/iceberg-rust/pull/1791
Because the solution depends on a (not yet merged) change in arrow-rs (see discussion here and this PR), we also have to upgrade arrow-rs to at least 57.x (the exact version depends on when the change is merged).
As a result, many of the changes in the PR come from the upgrade to the new arrow-rs version. Note that I only upgraded the iceberg crate, not the datafusion crate, which also needs an upgrade.
If you're interested in the changes relevant to this feature, see mainly the diffs in reader.rs and scan/mod.rs.