Add files support for parquet field_ids
Feature Request / Improvement
Would it be possible to support Parquet field_ids in the add_files method? Parquet field IDs are a requirement for backwards compatibility with Spark-based tools, and files created by those systems already have field_ids present in the Parquet files. Supporting them would let pyiceberg handle mass migrations of Spark-generated Iceberg objects without SerDe (reprocessing the Parquet files).
allow for parquet field_id support in the add_files method?
This is already done automatically based on the table metadata: https://github.com/apache/iceberg-python/blob/89e71c36f26d1f3da48090ddfa137a698e2a06fc/pyiceberg/table/__init__.py#L855-L858
You can also specify your own name mapping by updating the table properties (the schema.name-mapping.default property).
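For anyone who wants to try the name-mapping route, here is a minimal sketch of building the mapping as JSON. The column names and IDs are made up for illustration, and the commented-out pyiceberg call assumes an already-loaded table object named tbl; check it against the pyiceberg version you are running.

```python
import json


def build_name_mapping(columns):
    """Build an Iceberg name-mapping JSON string from (field_id, name) pairs.

    Only flat (non-nested) columns are handled in this sketch.
    """
    mapping = [{"field-id": field_id, "names": [name]} for field_id, name in columns]
    return json.dumps(mapping)


# Hypothetical columns, for illustration only.
name_mapping_json = build_name_mapping([(1, "id"), (2, "name")])
print(name_mapping_json)

# Assumed usage with an already-loaded pyiceberg table `tbl` (not run here):
# with tbl.transaction() as tx:
#     tx.set_properties(**{"schema.name-mapping.default": name_mapping_json})
```

Nested structs would need a nested "fields" list per entry, which this sketch leaves out.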
If you try to add Parquet files that already have field IDs, you get this error.
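To find out ahead of time whether a file would hit this, you can look for the field ID metadata that pyarrow attaches to each field. The sketch below is an assumption about how one might check, not what add_files does internally: the helper name is hypothetical, it only inspects top-level fields, and the demo uses stand-in objects instead of a real file.

```python
from types import SimpleNamespace

# Metadata key under which pyarrow exposes a Parquet field ID.
FIELD_ID_KEY = b"PARQUET:field_id"


def columns_with_field_ids(fields):
    """Return the names of top-level fields whose metadata carries a field ID."""
    found = []
    for field in fields:
        metadata = field.metadata or {}  # metadata is None when nothing is set
        if FIELD_ID_KEY in metadata:
            found.append(field.name)
    return found


# Stand-ins that mimic the .name / .metadata attributes of pyarrow fields.
fake_schema = [
    SimpleNamespace(name="id", metadata={FIELD_ID_KEY: b"1"}),
    SimpleNamespace(name="name", metadata=None),
]
print(columns_with_field_ids(fake_schema))
```

With a real file you would pass the schema returned by pyarrow.parquet.read_schema(path) instead of the stand-in list.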
I think @MrDerecho wants a feature added so that add_files() ignores field IDs that are present in existing Parquet files.
I was also affected by this constraint, so I created a PR to relax it and only fail if the field IDs from the file are not compatible with the table's field IDs.
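Roughly, the relaxed check amounts to comparing the IDs found in the file with the IDs the table expects. The sketch below illustrates that idea only; it is not the code from the PR, and the function name and the name-to-ID dictionaries are hypothetical.

```python
def incompatible_field_ids(file_ids, table_ids):
    """Return {column: (file_id, table_id)} for shared columns whose IDs differ."""
    return {
        name: (file_id, table_ids[name])
        for name, file_id in file_ids.items()
        if name in table_ids and table_ids[name] != file_id
    }


# Hypothetical table schema and two candidate files.
table_ids = {"id": 1, "name": 2}
matching_file = {"id": 1, "name": 2}
conflicting_file = {"id": 1, "name": 5}

print(incompatible_field_ids(matching_file, table_ids))     # no conflicts
print(incompatible_field_ids(conflicting_file, table_ids))  # "name" conflicts
```

Under this approach, a file would be rejected only when the returned mapping is non-empty, rather than whenever any field ID is present.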