python-bigquery-pandas
python-bigquery-pandas copied to clipboard
Running a parameterized query via a SQLAlchemy Selectable
Is your feature request related to a problem? Please describe.
I'm back again, after submitting #474 and #476, as we're trying to shift from using pd.read_sql_query to using pandas_gbq.read_gbq .
Current issue is that when we construct our SQLAlchemy Selectable, we may toss a row limit onto it based on user input, e.g:
sql: Select = select(literal_column("*")).select_from(...)
sql = sql.limit(50)
pands_gbq.read_gbq wants a str as its query argument. So we're trying to compile the Selectable and serialize it as a string:
query = str(sql.compile(self.engine))
However, SQLAlchemy serializes its named parameters in a way that I can't get to work with pandas_gbq (and more generally, that won't work with BigQuery):
SELECT *
FROM `some_dataset.some_table`
LIMIT %(param_1:INT64)s
I found an example in the pandas_gbq tests of how to configure query/queryParameters in the "configuration" dict that can be passed to read_gbq, but that of course doesn't help because BigQuery has its own way of naming parameters.
Describe the solution you'd like
Generally, I need a way (hopefully it exists already and I'm just missing it!) of serializing a SQLAlchemy Selectable into a string query that can be the input to read_gbq. For now, just being able to handle the limit() parameter above would suffice. I of course am trying to avoid string concatenation and am hoping there's some way I can configure the "configuration" dict to make this work.
Describe alternatives you've considered
N/A
Additional context
N/A
Not sure if this is relevant - using SQLAlchemy 1.4.25 and sqlalchemy-bigquery 1.3.0, when I run this:
from sqlalchemy_bigquery import BigQueryDialect
str(sql.compile(dialect=BigQueryDialect))
I get this error:
AttributeError: type object 'BigQueryDialect' has no attribute 'positional'
I tried using "setattr" to fake-fix that, but I get several other errors about attributes missing. Not sure if that would work, and so I've stuck with just doing compile(self.engine)
or compile(bind=self.engine)
.
If this is really an issue for https://github.com/googleapis/python-bigquery-sqlalchemy , I'm happy to move it there too.
This is a tough one. I think it does belong here, as the SQLAlchemy connector needs to continue to generate SQL that can be understood by the google.cloud.bigquery.dbapi
modules.
I would like to allow this library to take SQLAlchemy-generated SQL somehow, possibly via an optional dependency on SQLAlchemy so pandas-gbq can run the compile itself.
I assume it's built this way because DBAPI doesn't include a BigQuery-compatible parameter substitution style, but the sqlalchemy-bigquery
dialect compiles SQL into the pyformat style, then at runtime transforms the pyformatted query into a BigQuery compatible one and builds params that can be executed against a bigquery.Client
. Pretty confusing, but I pulled out the relevant pieces into a function that does the conversion
from google.cloud.bigquery.dbapi._helpers import to_query_parameters
from google.cloud.bigquery.dbapi.cursor import _format_operation
query = select(...) # SQLAlchemy query
compiled = query.compile(engine) # engine using BigQueryDialect or dialect=BigQueryDialect
parameters = compiled.params
bq_query, parameter_types = _format_operation(str(compiled), parameters)
job_config = bigquery.QueryJobConfig(
query_parameters=to_query_parameters(parameters, parameter_types)
)
# client = bigquery.Client(...)
query_job = client.query(bq_query, job_config)
This is pretty boiled down and doesn't address any edge cases, so YMMV, but it works for the relatively small scope of queries we expect.