PyHive
PyHive copied to clipboard
metadata.reflect() not working causing `DESCRIBE default` because of possible bug in get_table_names()
The following snippet (assuming hive.thriftserver running on 10000 port)
engine = create_engine('hive://localhost:10000/default')
metadata = MetaData(bind=engine)
metadata.reflect()
causes
sqlalchemy.exc.OperationalError: (pyhive.exc.OperationalError) TExecuteStatementResp(status=TStatus(statusCode=3,
...
errorMessage="Error running query: org.apache.spark.sql.AnalysisException: Table or view not found: default; line 1 pos 9;\n'DescribeRelation false, [col_name#7203, data_type#7204, comment#7205]\n+- 'UnresolvedTableOrView [default], DESCRIBE TABLE, true\n"), operationHandle=None)
[SQL: DESCRIBE default]
This is because of possibly incorrect implementation of get_table_names
in sqlalchemy_hive.py::376
(v0.7.1.dev0)
def get_table_names(self, connection, schema=None, **kw):
query = 'SHOW TABLES'
...
return [row[0] for row in connection.execute(text(query))]
while in row[0] in spark sql is schema name
SHOW TABLES;
+-----------+------------+--------------+--+
| database | tableName | isTemporary |
+-----------+------------+--------------+--+
| default | sam | false |
| default | sam1 | false |
| default | suj | false |
+-----------+------------+--------------+--+
so proposed fix is
def get_table_names(self, connection, schema=None, **kw):
query = 'SHOW TABLES'
if schema:
query += ' IN ' + self.identifier_preparer.quote_identifier(schema)
return [row[1] for row in connection.execute(text(query))]
Same issue here with pyhive 7.0 and sqlalchemy 2.0.