Hive metastore 4.0.1 remove deprecated thrift APIs

Open mattheusv opened this issue 1 year ago • 7 comments

Apache Iceberg version

0.7.1

Please describe the bug 🐞

Starting with version 4.0.1, Hive metastore removed deprecated Thrift APIs that py-iceberg is currently using. When trying to create a table with catalog.create_table_transaction against Hive metastore 4.0.1, py-iceberg raises an unexpected thrift.Thrift.TApplicationException: Invalid method name: 'get_table' error:

  File "iceberg.py", line 102, in create_table
    with self.catalog.create_table_transaction(
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pyiceberg/table/__init__.py", line 289, in __exit__
    self.commit_transaction()
  File "/usr/local/lib/python3.12/site-packages/pyiceberg/table/__init__.py", line 766, in commit_transaction
    self._table._do_commit(  # pylint: disable=W0212
  File "/usr/local/lib/python3.12/site-packages/pyiceberg/table/__init__.py", line 1638, in _do_commit
    response = self.catalog._commit_table(  # pylint: disable=W0212
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pyiceberg/catalog/hive.py", line 457, in _commit_table
    hive_table = self._get_hive_table(open_client, database_name, table_name)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pyiceberg/catalog/hive.py", line 331, in _get_hive_table
    return open_client.get_table(dbname=database_name, tbl_name=table_name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/hive_metastore/ThriftHiveMetastore.py", line 4242, in get_table
    return self.recv_get_table()
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/hive_metastore/ThriftHiveMetastore.py", line 4260, in recv_get_table
    raise x
thrift.Thrift.TApplicationException: Invalid method name: 'get_table'

This error seems to be related to this Hive PR (https://github.com/apache/hive/pull/3599), which removed the get_table method.
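One possible shape for a fix, as a minimal sketch rather than the actual PyIceberg change: call the request-based get_table_req first and fall back to the legacy get_table only when the server reports an unknown method. The assumptions here are that the client exposes get_table_req taking a request struct with dbName and tblName fields and returning a result with a .table attribute, as I understand the Hive Thrift definition. TableRequest, Hive401Stub and LegacyStub are hypothetical stand-ins so the snippet runs without a metastore; with the real bindings you would pass the generated request class and catch thrift.Thrift.TApplicationException instead of a generic Exception.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any


@dataclass
class TableRequest:
    """Hypothetical stand-in for the Thrift-generated request struct."""

    dbName: str
    tblName: str


def is_unknown_method(exc: Exception) -> bool:
    # The metastore reports a removed RPC as "Invalid method name: '<name>'".
    return "Invalid method name" in str(exc)


def get_hive_table(client: Any, database_name: str, table_name: str,
                   request_cls: Any = TableRequest) -> Any:
    """Fetch a table via get_table_req, falling back to the legacy get_table."""
    try:
        request = request_cls(dbName=database_name, tblName=table_name)
        return client.get_table_req(request).table
    except Exception as exc:  # assumption: TApplicationException with real bindings
        if not is_unknown_method(exc):
            raise  # unrelated failure, do not hide it
    # Only reached when the server does not know get_table_req.
    return client.get_table(dbname=database_name, tbl_name=table_name)


class Hive401Stub:
    """Hypothetical fake: behaves like a metastore without the legacy RPC."""

    def get_table_req(self, req: TableRequest) -> SimpleNamespace:
        return SimpleNamespace(table=f"{req.dbName}.{req.tblName}")

    def get_table(self, dbname: str, tbl_name: str) -> str:
        raise RuntimeError("Invalid method name: 'get_table'")


class LegacyStub:
    """Hypothetical fake: behaves like a metastore with only the legacy RPC."""

    def get_table_req(self, req: TableRequest) -> SimpleNamespace:
        raise RuntimeError("Invalid method name: 'get_table_req'")

    def get_table(self, dbname: str, tbl_name: str) -> str:
        return f"{dbname}.{tbl_name}"


if __name__ == "__main__":
    print(get_hive_table(Hive401Stub(), "db", "events"))
    print(get_hive_table(LegacyStub(), "db", "events"))
```

Trying the newer call first means a 4.0.1 metastore never sees the removed method, while the fallback keeps older servers working if they lack get_table_req.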

mattheusv avatar Oct 07 '24 19:10 mattheusv

Thanks for reporting this issue!

There is already a hive.hive2-compatible option for compatibility with Hive 2.x: https://py.iceberg.apache.org/configuration/#hive-catalog

Perhaps we need to do something similar for Hive 4.x
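For reference, here is a rough sketch of how the existing Hive 2.x flag is passed when loading a catalog from Python. The catalog name and the thrift://localhost:9083 URI are placeholders, and hive_catalog_properties and connect are hypothetical helper names used only for illustration. No equivalent Hive 4.x option exists as far as this thread shows, so none is included.

```python
def hive_catalog_properties(uri: str, hive2_compatible: bool = False) -> dict[str, str]:
    """Build the property map for a PyIceberg Hive catalog (hypothetical helper)."""
    props = {"type": "hive", "uri": uri}
    if hive2_compatible:
        # Documented PyIceberg option for talking to a Hive 2.x metastore.
        props["hive.hive2-compatible"] = "true"
    return props


def connect(name: str, uri: str, hive2_compatible: bool = False):
    """Load the catalog; needs pyiceberg installed and a reachable metastore."""
    # Imported lazily so the helper above works without pyiceberg installed.
    from pyiceberg.catalog import load_catalog

    return load_catalog(name, **hive_catalog_properties(uri, hive2_compatible))


if __name__ == "__main__":
    print(hive_catalog_properties("thrift://localhost:9083", hive2_compatible=True))
```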

Do you know if there's a migration guide on interacting with Hive 4.x?

kevinjqliu avatar Oct 07 '24 21:10 kevinjqliu

Ah, we use Hive 4.0.0 in the integration tests:

https://github.com/apache/iceberg-python/blob/24a0175d453fa50b2c40f9f2b53e53dbed3ab085/dev/docker-compose-integration.yml#L90-L92

https://github.com/apache/iceberg-python/blob/24a0175d453fa50b2c40f9f2b53e53dbed3ab085/dev/hive/Dockerfile#L26

kevinjqliu avatar Oct 07 '24 21:10 kevinjqliu

Thanks for reporting this @mattheusv

To get to the bottom of this:

  • I would first suggest bumping the Hive container to 4.0.1
  • Maybe regenerate the Hive Thrift files against the latest version, see https://github.com/apache/iceberg-python/tree/main/vendor

Let me add first-good-issue to see if anyone is interested in working on this 👍

Fokko avatar Oct 30 '24 06:10 Fokko

@Fokko I can look into this

akshayah3 avatar Nov 05 '24 13:11 akshayah3

@akshayah3 That would be great! Let me know if you run into anything!

Fokko avatar Nov 05 '24 14:11 Fokko

@akshayah3 Have you had a chance to look at this? I may be able to work on it if not

rcsmith27 avatar Mar 06 '25 14:03 rcsmith27

@rcsmith27 That would be great! As part of it, I think we also want to regenerate the Hive Thrift definitions: https://github.com/apache/iceberg-python/tree/main/vendor#hive-metastore-thrift-definition

Fokko avatar Apr 19 '25 05:04 Fokko

Quick question: is there a way to work around this issue? We are using a Hive 4 metastore and would like to use PyIceberg to manage data, but we are blocked by this issue.

rkarthik29 avatar May 07 '25 00:05 rkarthik29

@rkarthik29 it works if you use Hive 4.0.0. I can share how we build our Docker image if you want. We had to also change some of the AWS package versions so that IRSA on EKS would work.

rcsmith27 avatar May 08 '25 10:05 rcsmith27

@rcsmith27 Are you interested in pushing a fix to PyIceberg? It would be great to get this fixed 🚀

Fokko avatar May 08 '25 12:05 Fokko

@Fokko I'd be happy to submit a fix for this. I should have time to finish it within the next two weeks.

rcsmith27 avatar May 08 '25 12:05 rcsmith27

@rcsmith27 Yes, please. It would be useful if you could share. Can it connect to the same backend database? Can I create my own metastore? I will use it mostly for reading and writing to existing tables; I will not create new tables.

rkarthik29 avatar May 09 '25 12:05 rkarthik29

> @rkarthik29 it works if you use Hive 4.0.0. I can share how we build our Docker image if you want. We had to also change some of the AWS package versions so that IRSA on EKS would work.

@rcsmith27 It would be great if you could share your workaround on this thread. Others may be hitting the same issue, and a temporary solution would help. 🙏🏼

In my case I got this error by using iceberg-kafka-connector. I've compiled it with the latest version.

yohanvalencia avatar Jul 28 '25 16:07 yohanvalencia

Any update on this issue?

heman026 avatar Aug 12 '25 08:08 heman026