[suggestion] Write path optimization
Feature Request / Improvement
Let's investigate the level of abstraction on the write path.
Currently, schema compatibility checks, schema coercion, bin-packing, transformation, etc. happen at different levels of the stack. It would be good to review these and see which of them can be pushed up the stack.
For example, here's what the overwrite path looks like, from the top of the stack down:
overwrite
_dataframe_to_data_files
write_file
write_parquet
(copied over from https://github.com/apache/iceberg-python/pull/910#pullrequestreview-2175574772)
Another example: https://github.com/apache/iceberg-python/pull/786#discussion_r1646417180
More info
overwrite checks schema compatibility: https://github.com/apache/iceberg-python/blob/3f44dfe711e96beda6aa8622cf5b0baffa6eb0f2/pyiceberg/table/__init__.py#L541-L550
_dataframe_to_data_files bin-packs the pyarrow Table: https://github.com/apache/iceberg-python/blob/3f44dfe711e96beda6aa8622cf5b0baffa6eb0f2/pyiceberg/io/pyarrow.py#L2222-L2225
write_parquet transforms the table schema: https://github.com/apache/iceberg-python/blob/3f44dfe711e96beda6aa8622cf5b0baffa6eb0f2/pyiceberg/io/pyarrow.py#L2001-L2008 and https://github.com/apache/iceberg-python/blob/3f44dfe711e96beda6aa8622cf5b0baffa6eb0f2/pyiceberg/io/pyarrow.py#L2011-L2021
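To make the idea a bit more concrete, here is a rough, self-contained sketch of what "pushing things up the stack" could look like. It is not PyIceberg code: schemas are plain dicts, tables are lists of row dicts, and every function name (`check_schema_compatible`, `coerce_rows`, `bin_pack`, `write_chunk`) is a hypothetical stand-in. The only point is the layering: the entry point validates and coerces once, and the lower layers just split and write.

```python
from typing import Iterator

# Toy stand-ins, NOT PyIceberg's API: a "schema" maps column name -> type name,
# and a "table" is a list of row dicts.
Schema = dict[str, str]
Row = dict[str, object]


def check_schema_compatible(table_schema: Schema, data_schema: Schema) -> None:
    """Raise if the incoming data has columns the table does not know about."""
    unknown = set(data_schema) - set(table_schema)
    if unknown:
        raise ValueError(f"columns not in table schema: {sorted(unknown)}")


def coerce_rows(table_schema: Schema, rows: list[Row]) -> list[Row]:
    """Reorder columns to table order and fill missing columns with None."""
    return [{name: row.get(name) for name in table_schema} for row in rows]


def bin_pack(rows: list[Row], target_rows: int) -> Iterator[list[Row]]:
    """Split rows into chunks of at most target_rows.

    This is a simplification: it packs by row count rather than by byte size.
    """
    for start in range(0, len(rows), target_rows):
        yield rows[start:start + target_rows]


def write_chunk(chunk: list[Row]) -> int:
    """Lowest layer: only 'serializes'. Here it just reports the row count."""
    return len(chunk)


def overwrite(
    table_schema: Schema,
    data_schema: Schema,
    rows: list[Row],
    target_rows: int = 2,
) -> list[int]:
    """Entry point: validate and coerce once, then hand clean data down."""
    check_schema_compatible(table_schema, data_schema)
    coerced = coerce_rows(table_schema, rows)
    return [write_chunk(chunk) for chunk in bin_pack(coerced, target_rows)]


if __name__ == "__main__":
    schema = {"id": "long", "name": "string"}
    data = [{"name": "a", "id": 1}, {"id": 2}, {"id": 3, "name": "c"}]
    print(overwrite(schema, schema, data))
```

With this shape, the writer at the bottom never has to look at the schema again, which is roughly the property this issue is asking about. Whether the real bin-packing and schema transformation can be reordered like this is exactly what needs investigating.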