python-bigquery-pandas icon indicating copy to clipboard operation
python-bigquery-pandas copied to clipboard

Running a parameterized query via a SQLAlchemy Selectable

Open rjrudin opened this issue 3 years ago • 4 comments

Is your feature request related to a problem? Please describe.

I'm back again, after submitting #474 and #476, as we're trying to shift from using pd.read_sql_query to using pandas_gbq.read_gbq .

Current issue is that when we construct our SQLAlchemy Selectable, we may toss a row limit onto it based on user input, e.g:

sql: Select = select(literal_column("*")).select_from(...)
sql = sql.limit(50)

pands_gbq.read_gbq wants a str as its query argument. So we're trying to compile the Selectable and serialize it as a string:

query = str(sql.compile(self.engine))

However, SQLAlchemy serializes its named parameters in a way that I can't get to work with pandas_gbq (and more generally, that won't work with BigQuery):

SELECT * 
FROM `some_dataset.some_table`
LIMIT %(param_1:INT64)s

I found an example in the pandas_gbq tests of how to configure query/queryParameters in the "configuration" dict that can be passed to read_gbq, but that of course doesn't help because BigQuery has its own way of naming parameters.

Describe the solution you'd like

Generally, I need a way (hopefully it exists already and I'm just missing it!) of serializing a SQLAlchemy Selectable into a string query that can be the input to read_gbq. For now, just being able to handle the limit() parameter above would suffice. I of course am trying to avoid string concatenation and am hoping there's some way I can configure the "configuration" dict to make this work.

Describe alternatives you've considered

N/A

Additional context

N/A

rjrudin avatar Jan 25 '22 21:01 rjrudin

Not sure if this is relevant - using SQLAlchemy 1.4.25 and sqlalchemy-bigquery 1.3.0, when I run this:

from sqlalchemy_bigquery import BigQueryDialect
str(sql.compile(dialect=BigQueryDialect))

I get this error:

AttributeError: type object 'BigQueryDialect' has no attribute 'positional'

I tried using "setattr" to fake-fix that, but I get several other errors about attributes missing. Not sure if that would work, and so I've stuck with just doing compile(self.engine) or compile(bind=self.engine).

rjrudin avatar Jan 26 '22 00:01 rjrudin

If this is really an issue for https://github.com/googleapis/python-bigquery-sqlalchemy , I'm happy to move it there too.

rjrudin avatar Jan 26 '22 00:01 rjrudin

This is a tough one. I think it does belong here, as the SQLAlchemy connector needs to continue to generate SQL that can be understood by the google.cloud.bigquery.dbapi modules.

I would like to allow this library to take SQLAlchemy-generated SQL somehow, possibly via an optional dependency on SQLAlchemy so pandas-gbq can run the compile itself.

tswast avatar Feb 17 '22 23:02 tswast

I assume it's built this way because DBAPI doesn't include a BigQuery-compatible parameter substitution style, but the sqlalchemy-bigquery dialect compiles SQL into the pyformat style, then at runtime transforms the pyformatted query into a BigQuery compatible one and builds params that can be executed against a bigquery.Client. Pretty confusing, but I pulled out the relevant pieces into a function that does the conversion

from google.cloud.bigquery.dbapi._helpers import to_query_parameters
from google.cloud.bigquery.dbapi.cursor import _format_operation

query = select(...) # SQLAlchemy query
compiled = query.compile(engine)  # engine using BigQueryDialect or dialect=BigQueryDialect
parameters = compiled.params
bq_query, parameter_types = _format_operation(str(compiled), parameters)
job_config = bigquery.QueryJobConfig(
    query_parameters=to_query_parameters(parameters, parameter_types)
)
# client = bigquery.Client(...)
query_job = client.query(bq_query, job_config)

This is pretty boiled down and doesn't address any edge cases, so YMMV, but it works for the relatively small scope of queries we expect.

willsthompson avatar Aug 24 '23 20:08 willsthompson