Inconsistency in catalog.list_tables Behavior Across Python and Java: Returns Non-Iceberg Tables in Python Only

Open HonahX opened this issue 1 year ago • 4 comments

Feature Request / Improvement

I noticed that in Python, the Hive, Glue, and DynamoDB catalogs list all tables in the namespace, including non-Iceberg ones:
https://github.com/apache/iceberg-python/blob/acc934fb76aa6c6e2e32b60c8a99f9e2b2c627dd/pyiceberg/catalog/hive.py#L488-L504
https://github.com/apache/iceberg-python/blob/acc934fb76aa6c6e2e32b60c8a99f9e2b2c627dd/pyiceberg/catalog/glue.py#L584-L613

However, in Java, we apply a filter so that only Iceberg tables in the given namespace are returned: GlueCatalog.listTables, HiveCatalog.listTables
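For illustration, here is a minimal sketch of the kind of filter described above. It is not the actual PyIceberg or Java implementation. It assumes that Iceberg tables are marked in the metastore with a `table_type` parameter whose value is `iceberg` (compared case-insensitively), and it represents metastore entries as plain dicts with hypothetical `name` and `parameters` keys rather than real Hive or Glue response objects.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple

# Assumed convention: Iceberg tables carry this parameter in the metastore.
TABLE_TYPE = "table_type"
ICEBERG = "iceberg"


def is_iceberg_table(parameters: Optional[Dict[str, Any]]) -> bool:
    """Return True if the table parameters mark the table as an Iceberg table."""
    value = (parameters or {}).get(TABLE_TYPE)
    # Compare case-insensitively so "ICEBERG" and "iceberg" both match.
    return str(value or "").lower() == ICEBERG


def list_iceberg_tables(
    namespace: str, tables: Iterable[Dict[str, Any]]
) -> List[Tuple[str, str]]:
    """Keep only Iceberg tables; each entry is a hypothetical dict
    with a "name" key and an optional "parameters" key."""
    return [
        (namespace, table["name"])
        for table in tables
        if is_iceberg_table(table.get("parameters"))
    ]


# Hypothetical metastore listing mixing Iceberg and plain Hive tables.
example_tables = [
    {"name": "events", "parameters": {"table_type": "ICEBERG"}},
    {"name": "legacy_hive", "parameters": {"EXTERNAL": "TRUE"}},
    {"name": "no_params"},
]
iceberg_only = list_iceberg_tables("analytics", example_tables)
```

With a filter like this, `list_tables` on a namespace shared by Hive and Iceberg tables would return only the entries that can actually be loaded as Iceberg tables.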

I don't remember whether we discussed this before: why did we choose to include non-Iceberg tables in the result in Python?

cc @Fokko

HonahX avatar Jan 29 '24 00:01 HonahX

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions[bot] avatar Jul 28 '24 00:07 github-actions[bot]

> Why do we choose to include non-iceberg tables in the result in python?

I don't think we should. Using HMS for both Hive and Iceberg tables is pretty common, so we should filter the result to return only Iceberg tables.

kevinjqliu avatar Aug 09 '24 16:08 kevinjqliu

I'd like to work on this, if it's possible

mark-major avatar Sep 06 '24 13:09 mark-major

@mark-major sure thing, assigned to you

kevinjqliu avatar Sep 06 '24 16:09 kevinjqliu