ibis icon indicating copy to clipboard operation
ibis copied to clipboard

feat(backends): enable cross-schema and cross-database table access where the backend supports it

Open cpcloud opened this issue 2 years ago • 7 comments

Is your feature request related to a problem?

I can't perform operations on tables that live in different schemas or databases (catalogs in some backends).

Describe the solution you'd like

I'd like to be able to operate on tables that exist in different catalogs, like doing a cross-catalog join.

What version of ibis are you running?

N/A

What backend(s) are you using, if any?

Trino

Code of Conduct

  • [X] I agree to follow this project's Code of Conduct

cpcloud avatar Jun 20 '23 15:06 cpcloud

This is also crucial for Impala. In Impala both "SHOW SCHEMAS" and "SHOW DATABASES" return the same result.

In Impala's raw SQL you'd usually do cross-schema joins by simply prepending schema name to the table name:

SELECT t1.*, t2.*
FROM schema1.tableX t1
JOIN schema2.tableY t2 
   ON t1.id=t2.id

My naive approach to resolve this problem was to create two clients, each referring to the schemas from which I want to query.

schema1 = ibis.impala.connect(database="schema1", ...)
schema2 = ibis.impala.connect(database="schema2", ...)

t1 = schema1.tables["table1"]
t2 = schema2.tables["table2"].left_join(t1, _.id==t1.id)

And that gives me: IbisError: Multiple backends found for this expression :(

contang0 avatar Sep 09 '23 23:09 contang0

Cross-schema access now works for Snowflake and duckdb using the schema attribute to table as described in #7289 🥳 This is awesome! I also tested cross-database on Snowflake where it fails:

con.table("CLASSES", schema="INFORMATION_SCHEMA", database="SNOWFLAKE")
File [~/analytics-env/lib/python3.11/site-packages/ibis/backends/base/sql/alchemy/__init__.py:623](https://vscode-remote+ssh-002dremote-002bg7eip0-002ddebist01.vscode-resource.vscode-cdn.net/mnt/analytics/Personal%20workspaces/Stefan%20Binder/~/analytics-env/lib/python3.11/site-packages/ibis/backends/base/sql/alchemy/__init__.py:623), in BaseAlchemyBackend.table(self, name, database, schema)
    619         raise com.IbisTypeError(
    620             f"`database` must be a string; got {type(database)}"
    621         )
    622     if database != self.current_database:
--> 623         return self.database(name=database).table(name=name, schema=schema)
    625 sqla_table = self._get_sqla_table(name, schema=schema)
    627 schema = self._schema_from_sqla_table(
    628     sqla_table, schema=self._schemas.get(name)
    629 )

TypeError: Database.table() got an unexpected keyword argument 'schema'

No rush, just wanted to update this issue.

binste avatar Oct 04 '23 07:10 binste

Thanks @binste!

I think to address this issue fully we're going to need to avoid sqlalchemy here, as it doesn't provide a dialect-agnostic notion of a top-level namespace (database, catalog, etc), only one for schema. Some dialects roll this into their implementations but of course that doesn't work for all of our sqlalchemy-based backends that support two levels of namespacing.

cpcloud avatar Oct 04 '23 11:10 cpcloud

You should be able to use con.table("t", schema="database.schema") (I know it's a hack 😄)

cpcloud avatar Oct 04 '23 11:10 cpcloud

:) Nice! Can confirm that I can access another database on Snowflake with this.

A bit off topic and feel free to ignore this question if it takes up too much time: Out of curiosity, why are you moving away from SQLAlchemy to sqlglot?

binste avatar Oct 04 '23 12:10 binste

Glad you asked! I lay out the primary motivations here: https://github.com/ibis-project/ibis/discussions/7213

cpcloud avatar Oct 04 '23 12:10 cpcloud

Even though the issue is still open, I already am able to do cross-schema joins on version 9.0 using Impala. Thanks a lot! Really helpful.

contang0 avatar May 03 '24 16:05 contang0