snowflake-sqlalchemy
Unable to specify database when declaring tables in sqlalchemy
Please answer these questions before submitting your issue. Thanks!
- What version of Python are you using (`python --version`)? Python 3.7.3
- What operating system and processor architecture are you using (`python -c 'import platform; print(platform.platform())'`)? Linux-4.15.0-1047-aws-x86_64-with-debian-9.11
- What are the component versions in the environment (`pip list`)? Relevant list:
  - snowflake-connector-python 2.0.3
  - snowflake-sqlalchemy 1.1.16
  - SQLAlchemy 1.3.10
- What did you do?

When using Snowflake, our org often joins tables across databases. For example, each user may have their own `dev_${USERNAME}` database instance where they test their ETL jobs, but for efficiency reasons we keep a single copy of our prod data in the single `dev` database. In this case, a user may join `dev_akravetz.etl_schema.etl_table` to `dev.prod_replica_schema.prod_table`.

In order to use SQLAlchemy's `join` function, we must define the `Table` objects for the two tables above. The snowflake-sqlalchemy documentation does not show any examples of defining tables WITH the database; in all cases, a default database is specified. It is unclear how to specify two tables from different databases.
When defining the `Table` objects in SQLAlchemy, there are two options:

- Full table specification (i.e. create a `Table` object, specifying all of the `Column`s). Here there is no way to specify the database where the table is stored. One can use the hack of specifying `Table(..., schema="database.schema", ...)`, but this seems like a bad idea?
- Table reflection using `autoload=True`. Here there is no way to specify a database, given the way the connector queries the `information_schema.columns` table (see here).
What is the preferred way to join tables across databases?
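To make option 1 concrete, here is a minimal sketch of the `schema="database.schema"` workaround, compiled with SQLAlchemy's generic dialect so no Snowflake connection is needed. All table, schema, and column names below are illustrative, not from any real deployment:

```python
from sqlalchemy import Column, Integer, MetaData, Table, select

metadata = MetaData()

# The database name is smuggled into the schema argument (the hack in
# question) -- names here are made up for illustration.
etl_table = Table(
    "etl_table",
    metadata,
    Column("id", Integer, primary_key=True),
    schema="dev_akravetz.etl_schema",
)

prod_table = Table(
    "prod_table",
    metadata,
    Column("id", Integer, primary_key=True),
    schema="dev.prod_replica_schema",
)

stmt = select(etl_table.c.id).select_from(
    etl_table.join(prod_table, etl_table.c.id == prod_table.c.id)
)

# With the generic dialect, the dotted schema is quoted as a single
# identifier, which is part of why this feels fragile; whether the
# Snowflake dialect renders it usably depends on the dialect version.
print(stmt)
```

Whether Snowflake accepts the resulting qualified names is exactly the open question in this issue; the sketch only shows what SQLAlchemy itself emits.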
- What did you expect to see? N/A
- What did you see instead? N/A
- Can you set logging to DEBUG and collect the logs? N/A
Hi @akravetz,

If you are looking to use multiple databases, here is a way to configure it with SQLAlchemy: https://flask-sqlalchemy.palletsprojects.com/en/2.x/binds/. `__bind_key__ = 'MY_DATABASE'` specifies the database you are using for your model.

Dusty
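For context on what the binds mechanism amounts to: Flask-SQLAlchemy essentially keeps one engine per `__bind_key__` and routes each model's statements to that engine. A rough plain-SQLAlchemy analogue, using in-memory SQLite engines as stand-ins for the Snowflake databases (all names here are illustrative):

```python
from sqlalchemy import Column, Integer, MetaData, Table, create_engine

# One engine per "bind key" -- a hand-rolled stand-in for what
# Flask-SQLAlchemy's binds mechanism manages internally. SQLite stands
# in for Snowflake so the sketch runs without a connector.
engines = {
    "default": create_engine("sqlite://"),
    "MY_DATABASE": create_engine("sqlite://"),
}

metadata = MetaData()
etl_table = Table("etl_table", metadata, Column("id", Integer, primary_key=True))

# A table tied to a bind key is created on, and queried through, that
# bind's engine only.
metadata.create_all(engines["MY_DATABASE"])
with engines["MY_DATABASE"].connect() as conn:
    rows = conn.execute(etl_table.select()).fetchall()
print(rows)
```

Note that each bind key maps to its own engine and therefore its own connection, so a single statement can only run against one bind at a time.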
Hi Dusty, thank you for the quick reply! From reading through the document you linked and the underlying source code, it looks like Flask-SQLAlchemy uses some custom code to create a new engine for each database. Since tables from different databases are actually bound to different engines, you can't actually use tables created with different `__bind_key__`s together. Let me know if I'm missing something!
Hey all--it seems like sqlalchemy-bigquery resolved this issue by allowing a "fully qualified" schema name, like `<database>.<schema>`.

See these docs: https://github.com/googleapis/python-bigquery-sqlalchemy#table-names

Here's the relevant code:

```python
# If neither dataset nor project are the default
sample_table_1 = Table('natality', schema='bigquery-public-data.samples')
```
It appears the snowflake dialect just prepends the default database to the specified schema, so it's impossible to point at the non-default database...
@akravetz @machow Were either of you ever able to resolve this or find a workaround? I'd love to be able to access multiple DBs within one query using sqlalchemy to define the query.
I was not; we opted to avoid SQLAlchemy since migrating to Snowflake.
To clean up and re-prioritize bugs and feature requests, we are closing all issues older than 6 months as of Apr 1, 2023. If there are any issues or feature requests that you would like us to address, please re-create them. For urgent issues, opening a support case via this link (Snowflake Community) is the fastest way to get a response.