feat(backends): enable cross-schema and cross-database table access where the backend supports it
Is your feature request related to a problem?
I can't perform operations on tables that live in different schemas or databases (catalogs in some backends).
Describe the solution you'd like
I'd like to be able to operate on tables that exist in different catalogs, like doing a cross-catalog join.
What version of ibis are you running?
N/A
What backend(s) are you using, if any?
Trino
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
This is also crucial for Impala. In Impala both "SHOW SCHEMAS" and "SHOW DATABASES" return the same result.
In Impala's raw SQL you'd usually do cross-schema joins by simply prepending schema name to the table name:
SELECT t1.*, t2.*
FROM schema1.tableX t1
JOIN schema2.tableY t2
ON t1.id=t2.id
My naive approach to resolve this problem was to create two clients, each referring to the schemas from which I want to query.
schema1 = ibis.impala.connect(database="schema1", ...)
schema2 = ibis.impala.connect(database="schema2", ...)
t1 = schema1.tables["table1"]
t2 = schema2.tables["table2"].left_join(t1, _.id==t1.id)
And that gives me: IbisError: Multiple backends found for this expression :(
Cross-schema access now works for Snowflake and duckdb using the schema attribute to table as described in #7289 🥳 This is awesome! I also tested cross-database on Snowflake where it fails:
con.table("CLASSES", schema="INFORMATION_SCHEMA", database="SNOWFLAKE")
File [~/analytics-env/lib/python3.11/site-packages/ibis/backends/base/sql/alchemy/__init__.py:623](https://vscode-remote+ssh-002dremote-002bg7eip0-002ddebist01.vscode-resource.vscode-cdn.net/mnt/analytics/Personal%20workspaces/Stefan%20Binder/~/analytics-env/lib/python3.11/site-packages/ibis/backends/base/sql/alchemy/__init__.py:623), in BaseAlchemyBackend.table(self, name, database, schema)
619 raise com.IbisTypeError(
620 f"`database` must be a string; got {type(database)}"
621 )
622 if database != self.current_database:
--> 623 return self.database(name=database).table(name=name, schema=schema)
625 sqla_table = self._get_sqla_table(name, schema=schema)
627 schema = self._schema_from_sqla_table(
628 sqla_table, schema=self._schemas.get(name)
629 )
TypeError: Database.table() got an unexpected keyword argument 'schema'
No rush, just wanted to update this issue.
Thanks @binste!
I think to address this issue fully we're going to need to avoid sqlalchemy here, as it doesn't provide a dialect-agnostic notion of a top-level namespace (database, catalog, etc), only one for schema. Some dialects roll this into their implementations but of course that doesn't work for all of our sqlalchemy-based backends that support two levels of namespacing.
You should be able to use con.table("t", schema="database.schema") (I know it's a hack 😄)
:) Nice! Can confirm that I can access another database on Snowflake with this.
A bit off topic and feel free to ignore this question if it takes up too much time: Out of curiosity, why are you moving away from SQLAlchemy to sqlglot?
Glad you asked! I lay out the primary motivations here: https://github.com/ibis-project/ibis/discussions/7213
Even though the issue is still open, I already am able to do cross-schema joins on version 9.0 using Impala. Thanks a lot! Really helpful.