iceberg-python
Hive metastore 4.0.1 removes deprecated Thrift APIs
Apache Iceberg version
0.7.1
Please describe the bug 🐞
Starting with version 4.0.1, the Hive metastore removed deprecated Thrift APIs that PyIceberg currently uses. When creating a table with catalog.create_table_transaction against Hive metastore 4.0.1, PyIceberg raises an unexpected thrift.Thrift.TApplicationException: Invalid method name: 'get_table' error:
File "iceberg.py", line 102, in create_table
with self.catalog.create_table_transaction(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/pyiceberg/table/__init__.py", line 289, in __exit__
self.commit_transaction()
File "/usr/local/lib/python3.12/site-packages/pyiceberg/table/__init__.py", line 766, in commit_transaction
self._table._do_commit( # pylint: disable=W0212
File "/usr/local/lib/python3.12/site-packages/pyiceberg/table/__init__.py", line 1638, in _do_commit
response = self.catalog._commit_table( # pylint: disable=W0212
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/pyiceberg/catalog/hive.py", line 457, in _commit_table
hive_table = self._get_hive_table(open_client, database_name, table_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/pyiceberg/catalog/hive.py", line 331, in _get_hive_table
return open_client.get_table(dbname=database_name, tbl_name=table_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/hive_metastore/ThriftHiveMetastore.py", line 4242, in get_table
return self.recv_get_table()
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/hive_metastore/ThriftHiveMetastore.py", line 4260, in recv_get_table
raise x
thrift.Thrift.TApplicationException: Invalid method name: 'get_table'
This error seems to be related to this Hive PR (https://github.com/apache/hive/pull/3599), which removed the get_table method.
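For anyone following along, here is a rough sketch of what a client-side fallback could look like for the `get_table` call in the traceback above. This is not the PyIceberg fix. It assumes the metastore still serves a request-based `get_table_req` method that takes a `GetTableRequest` with `dbName` and `tblName` fields and returns a result with a `table` attribute, which is how I recall the Hive Thrift IDL. The client, request, result, and exception classes below are hypothetical stand-ins so the snippet runs without a metastore.

```python
from dataclasses import dataclass


class InvalidMethodError(Exception):
    """Hypothetical stand-in for thrift.Thrift.TApplicationException."""


@dataclass
class GetTableRequest:
    # Stand-in for the Thrift struct; field names are an assumption
    # based on the Hive metastore IDL.
    dbName: str
    tblName: str


@dataclass
class GetTableResult:
    # Stand-in for the Thrift result struct wrapping the table.
    table: dict


class FakeHive401Client:
    """Mimics a 4.0.1 metastore: get_table is gone, get_table_req remains."""

    def get_table(self, dbname, tbl_name):
        raise InvalidMethodError("Invalid method name: 'get_table'")

    def get_table_req(self, req):
        return GetTableResult(table={"dbName": req.dbName, "tableName": req.tblName})


def get_hive_table(client, database_name, table_name):
    """Try the legacy call first, then fall back to the request-based one."""
    try:
        return client.get_table(dbname=database_name, tbl_name=table_name)
    except InvalidMethodError:
        request = GetTableRequest(dbName=database_name, tblName=table_name)
        return client.get_table_req(request).table


table = get_hive_table(FakeHive401Client(), "default", "events")
print(table["tableName"])
```

Trying the legacy call first keeps older metastores on their existing path. A real change would also need to check which other removed methods the catalog calls.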
Thanks for reporting this issue!
I know there's an option to set hive.hive2-compatible to be compatible with Hive 2.x
https://py.iceberg.apache.org/configuration/#hive-catalog
Perhaps we need to do something similar for Hive 4.x
Do you know if there's a migration guide on interacting with Hive 4.x?
Ah, we use Hive 4.0.0 in the integration tests:
https://github.com/apache/iceberg-python/blob/24a0175d453fa50b2c40f9f2b53e53dbed3ab085/dev/docker-compose-integration.yml#L90-L92
https://github.com/apache/iceberg-python/blob/24a0175d453fa50b2c40f9f2b53e53dbed3ab085/dev/hive/Dockerfile#L26
Thanks for reporting this @mattheusv
To get to the bottom of this:
- I would first suggest bumping the Hive container to 4.0.1
- Maybe regenerate the Hive Thrift files against the latest version, see https://github.com/apache/iceberg-python/tree/main/vendor
Let me add first-good-issue to see if anyone is interested in working on this 👍
@Fokko I can look into this
@akshayah3 That would be great! Let me know if you run into anything!
@akshayah3 Have you had a chance to look at this? I may be able to work on it if not
@rcsmith27 That would be great! As part of it, I think we also want to regenerate the Hive Thrift definitions: https://github.com/apache/iceberg-python/tree/main/vendor#hive-metastore-thrift-definition
Quick question: is there a way to work around this issue? We are using a Hive 4 metastore and would like to use PyIceberg to manage data, but we are blocked by this issue.
@rkarthik29 it works if you use Hive 4.0.0. I can share how we build our Docker image if you want. We had to also change some of the AWS package versions so that IRSA on EKS would work.
@rcsmith27 Are you interested in pushing a fix to PyIceberg? It would be great to get this fixed 🚀
@Fokko I'd be happy to submit a fix for this. I should have time to finish it within the next two weeks.
@rcsmith27 Yes, please. It would be useful if you could share. Can it connect to the same backend database? Can I create my own metastore? I will use it mostly for reading and writing to existing tables; I will not create new tables.
@rcsmith27 It would be great if you can share your workaround on this thread. Maybe more of us are having this issue and a tmp solution would be great. 🙏🏼
In my case I got this error by using iceberg-kafka-connector. I've compiled it with the latest version.
Any update on this issue?