Fix spurious warnings and bogus index when reflecting Iceberg tables
Description
Previously, during reflection, TrinoDialect.get_indexes() called _get_partitions(), which executed
SELECT * FROM {schema}."{table_name}$partitions"
and assumed that, if this query succeeded, the table at hand was a Hive table. The issue (#518) was that this same query also succeeds for Iceberg tables, resulting in a spurious index (containing no columns) being added to the table's metadata, and a series of SQLAlchemy warnings about index keys not being found among the table's columns.
This PR adds a check to TrinoDialect.get_indexes() to ensure that the catalog at hand is a Hive catalog before calling _get_partitions().
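A minimal sketch of the guard's logic follows. The helper names (`is_hive_catalog`, `get_partitions`) and the return shape are illustrative stand-ins, not the exact trino-python-client internals:

```python
def partition_index(is_hive_catalog, get_partitions):
    """Return reflected index metadata, probing "$partitions" only for Hive.

    Querying {table}$partitions succeeds on both Hive and Iceberg, so the
    catalog check must come first; otherwise Iceberg tables get a spurious
    empty index and SQLAlchemy warns about unknown index columns.
    """
    if not is_hive_catalog():
        return []  # Iceberg (and other non-Hive) catalogs: no partition index
    partition_columns = get_partitions()
    if not partition_columns:
        return []
    return [{"name": "partition", "column_names": partition_columns, "unique": False}]
```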
I looked at creating a test method, but I couldn't see how to do so. Instead, I bench-tested the fix against Hive and Iceberg catalogs with the test app from #518. The output for an Iceberg table is:
Calling reflect()
Listing tables:
Table name: drivestats
For a Hive table it is:
Calling reflect()
Listing tables:
Table name: drivestats
Index name: partition
Column name: drivestats.year
Column name: drivestats.month
Non-technical explanation
This PR fixes spurious warnings and a bogus index being added to the metadata when reflecting Iceberg tables.
Release notes
( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text:
* Fix spurious warnings and a bogus index being added to the metadata when reflecting Iceberg tables. ({issue}`518`)
Fixes https://github.com/trinodb/trino-python-client/issues/518
I'll see if I can figure out a way to make it work on 351.
The 351 check passes now. This is the query that the code uses to check whether the catalog's connector is Hive. If it returns 1, then it's Hive.
SELECT
COUNT(*)
FROM "system"."metadata"."table_properties"
WHERE "catalog_name" = :catalog_name
AND "property_name" = 'bucketing_version'
If there's a better way to do this that passes the 351 test, I'd be happy to change the code!
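For illustration, here is a hedged sketch of how that check could be wired up in Python. `run_scalar_query` is an assumed helper (not part of the PR) that executes parameterized SQL and returns the single result value:

```python
# Hive-only table property used as a connector fingerprint, per the query above.
HIVE_CHECK_SQL = (
    "SELECT COUNT(*) "
    'FROM "system"."metadata"."table_properties" '
    "WHERE \"catalog_name\" = ? AND \"property_name\" = 'bucketing_version'"
)

def is_hive_catalog(run_scalar_query, catalog_name):
    """Return True if the catalog's connector exposes 'bucketing_version'."""
    return run_scalar_query(HIVE_CHECK_SQL, (catalog_name,)) == 1
```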
For reference, the earlier alternative was reading from "system"."metadata"."catalogs". I think for this case it's fine to let the warnings happen for versions which don't have system.metadata.catalogs. I don't like relying on table properties since nothing guarantees the property won't get added to Iceberg, for example.
For the test, you can then make it "pass" with @pytest.mark.skipif.
cc: @damian3031 @ebyhr @dungdm93 what do you think?
@hashhar I realized I could query "system"."information_schema"."columns" to see if the connector_name column is there. So, the warning is shown on old versions and all tests pass without any additional @pytest.mark.skipif annotations.
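A sketch of that version probe, under the assumption that `connector_name` belongs to "system"."metadata"."catalogs" and that `fetch_rows` is a hypothetical helper returning result rows (first element = column_name):

```python
def supports_connector_name(fetch_rows):
    """Check whether this Trino version exposes a connector_name column,
    by inspecting "system"."information_schema"."columns"."""
    sql = (
        'SELECT "column_name" '
        'FROM "system"."information_schema"."columns" '
        "WHERE \"table_schema\" = 'metadata' AND \"table_name\" = 'catalogs'"
    )
    return any(row[0] == "connector_name" for row in fetch_rows(sql))
```

On older versions the check simply returns False, so the code can fall back gracefully (with the warning) instead of needing test skips.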
Python 3.10 should be happy now... We'll see!
All tests pass, @hashhar - ready for your renewed attention 🙂
Thanks @metadaddy, I'll test locally once today and then merge. Sorry for the delays and long back and forth.
Hi @hashhar - no problem at all - as much of the delay was on my side as yours, and I think we arrived at the correct solution!
Hi @hashhar - just bumping this so it doesn't get forgotten 😉