Possible read-after-write consistency issue with multiple schema migration steps in Iceberg tables on AWS Glue + S3
Apache Iceberg version
PyIceberg 0.10.0, pyiceberg-core 0.6.0
Please describe the bug 🐞
This may be a hard one to pin down, but I noticed that multiple schema migration steps executed sequentially in the same `update_schema` context sometimes raise exceptions such as "column name not found" when using Iceberg tables on AWS Glue. An example:
with table.update_schema() as update:
    update.rename_column("some_column", "renamed_column")
    # This sometimes fails with an error saying the renamed column doesn't exist.
    update.move_first("renamed_column")
I have not noticed it with other back-ends such as SQLite, which leads me to believe it is specific to Glue: a write may not yet be visible by the time the next operation runs.
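As a possible workaround, and a way to narrow down where the failure comes from, the two steps could be committed in separate `update_schema` contexts. This is a minimal sketch, not a confirmed fix: `rename_then_move_first` is a hypothetical helper name, `table` is assumed to be an already loaded PyIceberg table, and I have not verified that splitting the commits avoids the error on Glue.

```python
from typing import Any


def rename_then_move_first(table: Any, old_name: str, new_name: str) -> None:
    """Apply the rename and the move as two separate schema commits.

    `table` is assumed to expose `update_schema()` as a context manager,
    as a PyIceberg table does. The helper itself is hypothetical.
    """
    # First commit: only the rename.
    with table.update_schema() as update:
        update.rename_column(old_name, new_name)

    # Second commit: the move, issued after the rename has been committed,
    # so it no longer depends on changes staged within the same context.
    with table.update_schema() as update:
        update.move_first(new_name)
```

If the two-commit variant never fails while the single-context variant does, that would point at how changes staged in one context are resolved rather than at visibility of a committed write.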
Willingness to contribute
- [ ] I can contribute a fix for this bug independently
- [x] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time
Hmm, we should definitely see if we can reproduce this in CI by adding a case to https://github.com/apache/iceberg-python/pull/2371 (which was just merged).
Ideally we would also set up integration tests for Glue there.