pandas icon indicating copy to clipboard operation
pandas copied to clipboard

BUG: read_sql no longer works simply with SqlAlchemy selectables and a quick fix

Open machow opened this issue 4 years ago • 10 comments

  • [x] I have checked that this issue has not already been reported.

  • [x] I have confirmed this bug exists on the latest version of pandas.

  • [x] (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

Hey--I noticed while running siuba's SQL unit tests that queries using the modulo operator are failing.

The issue is due to a recent change for issue #34211 in pandas setting the no_parameters argument before executing via a sqlalchemy engine. This was done to allow queries like SELECT 1 % 2, but causes SqlAlchemy expressions handling % to not always work with read_sql.

Solution. Rather than executing in a special way (no_parameters), I think you want to wrap a string query in SqlAlchemy.sql.text (see here). This will allow both queries with % and the full range of SqlAlchemy expressions. WDYT?

from sqlalchemy import sql, create_engine
import pandas as pd

engine = create_engine('postgresql://postgres:@localhost:5433/postgres', echo=False)
#engine = create_engine('postgresql://USERNAME:PASSWORD@localhost:PORT/DBNAME', echo=False)

# doesn't work, original issue in pandas: 'dict' object does not support indexing
engine.execute("SELECT 1 % 2")

# works, ideal solution
engine.execute(sql.text("SELECT 1 % 2" ))

# queries below broken by no_parameters change ----
pd.read_sql(sql.text("SELECT 1 % 2"), engine)

pd.read_sql(sql.select([sql.literal(1) % sql.literal(2)]), engine)

Here's a gist of the error for the last two queries...

ProgrammingError: (psycopg2.errors.UndefinedFunction) operator does not exist: integer %% integer
LINE 1: SELECT 1 %% 2
                 ^
HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.

[SQL: SELECT 1 %% 2]
(Background on this error at: http://sqlalche.me/e/13/f405)

Full traceback in the details

``` --------------------------------------------------------------------------- UndefinedFunction Traceback (most recent call last) ~/Dropbox/Repo/siuba/env/lib/python3.6/site-packages/sqlalchemy/engine/base.py in _execute_context(self, dialect, constructor, statement, parameters, *args) 1267 self.dialect.do_execute_no_params( -> 1268 cursor, statement, context 1269 )

~/Dropbox/Repo/siuba/env/lib/python3.6/site-packages/sqlalchemy/engine/default.py in do_execute_no_params(self, cursor, statement, context) 595 def do_execute_no_params(self, cursor, statement, context=None): --> 596 cursor.execute(statement) 597

UndefinedFunction: operator does not exist: integer %% integer LINE 1: SELECT 1 %% 2 ^ HINT: No operator matches the given name and argument types. You might need to add explicit type casts.

The above exception was the direct cause of the following exception:

ProgrammingError Traceback (most recent call last) in ----> 1 pd.read_sql(sql.text("SELECT 1 % 2"), engine)

~/Dropbox/Repo/siuba/env/lib/python3.6/site-packages/pandas/io/sql.py in read_sql(sql, con, index_col, coerce_float, params, parse_dates, columns, chunksize) 513 coerce_float=coerce_float, 514 parse_dates=parse_dates, --> 515 chunksize=chunksize, 516 ) 517

~/Dropbox/Repo/siuba/env/lib/python3.6/site-packages/pandas/io/sql.py in read_query(self, sql, index_col, coerce_float, parse_dates, params, chunksize) 1293 args = _convert_params(sql, params) 1294 -> 1295 result = self.execute(*args) 1296 columns = result.keys() 1297

~/Dropbox/Repo/siuba/env/lib/python3.6/site-packages/pandas/io/sql.py in execute(self, *args, **kwargs) 1160 """Simple passthrough to SQLAlchemy connectable""" 1161 return self.connectable.execution_options(no_parameters=True).execute( -> 1162 *args, **kwargs 1163 ) 1164

~/Dropbox/Repo/siuba/env/lib/python3.6/site-packages/sqlalchemy/engine/base.py in execute(self, statement, *multiparams, **params) 2236 2237 connection = self._contextual_connect(close_with_result=True) -> 2238 return connection.execute(statement, *multiparams, **params) 2239 2240 def scalar(self, statement, *multiparams, **params):

~/Dropbox/Repo/siuba/env/lib/python3.6/site-packages/sqlalchemy/engine/base.py in execute(self, object_, *multiparams, **params) 1012 ) 1013 else: -> 1014 return meth(self, multiparams, params) 1015 1016 def _execute_function(self, func, multiparams, params):

~/Dropbox/Repo/siuba/env/lib/python3.6/site-packages/sqlalchemy/sql/elements.py in _execute_on_connection(self, connection, multiparams, params) 296 def _execute_on_connection(self, connection, multiparams, params): 297 if self.supports_execution: --> 298 return connection._execute_clauseelement(self, multiparams, params) 299 else: 300 raise exc.ObjectNotExecutableError(self)

~/Dropbox/Repo/siuba/env/lib/python3.6/site-packages/sqlalchemy/engine/base.py in _execute_clauseelement(self, elem, multiparams, params) 1131 distilled_params, 1132 compiled_sql, -> 1133 distilled_params, 1134 ) 1135 if self._has_events or self.engine._has_events:

~/Dropbox/Repo/siuba/env/lib/python3.6/site-packages/sqlalchemy/engine/base.py in _execute_context(self, dialect, constructor, statement, parameters, *args) 1316 except BaseException as e: 1317 self._handle_dbapi_exception( -> 1318 e, statement, parameters, cursor, context 1319 ) 1320

~/Dropbox/Repo/siuba/env/lib/python3.6/site-packages/sqlalchemy/engine/base.py in handle_dbapi_exception(self, e, statement, parameters, cursor, context) 1510 elif should_wrap: 1511 util.raise( -> 1512 sqlalchemy_exception, with_traceback=exc_info[2], from_=e 1513 ) 1514 else:

~/Dropbox/Repo/siuba/env/lib/python3.6/site-packages/sqlalchemy/util/compat.py in raise_(failed resolving arguments) 176 177 try: --> 178 raise exception 179 finally: 180 # credit to

~/Dropbox/Repo/siuba/env/lib/python3.6/site-packages/sqlalchemy/engine/base.py in _execute_context(self, dialect, constructor, statement, parameters, *args) 1266 if not evt_handled: 1267 self.dialect.do_execute_no_params( -> 1268 cursor, statement, context 1269 ) 1270 else:

~/Dropbox/Repo/siuba/env/lib/python3.6/site-packages/sqlalchemy/engine/default.py in do_execute_no_params(self, cursor, statement, context) 594 595 def do_execute_no_params(self, cursor, statement, context=None): --> 596 cursor.execute(statement) 597 598 def is_disconnect(self, e, connection, cursor):

ProgrammingError: (psycopg2.errors.UndefinedFunction) operator does not exist: integer %% integer LINE 1: SELECT 1 %% 2 ^ HINT: No operator matches the given name and argument types. You might need to add explicit type casts.

[SQL: SELECT 1 %% 2] (Background on this error at: http://sqlalche.me/e/13/f405)


</details>

machow avatar Jul 30 '20 22:07 machow

@machow want to make a PR for this?

jbrockmendel avatar Sep 02 '20 23:09 jbrockmendel

@jbrockmendel sure! I have the changes running and passing tests in the pandas docker image. I'm noticing that the tests cover only either queries with parameters or using % as an operator.

I'll try adding two new kinds of tests:

  • queries with %% (should pass after changes)
  • queries with a combination of parameters and % as operator. (should fail before and after changes?).

Will add tests and open a PR tomorrow.

machow avatar Sep 10 '20 03:09 machow

might be closed by https://github.com/pandas-dev/pandas/pull/37534 (e.g. working on master).

jreback avatar Dec 28 '20 00:12 jreback

Hey--that PR (#37534) just changed the code in PR #34211 back, so reverted back to pandas v1.05 behavior that I listed in my PR (#36275).

Here is the table I made in that PR for reference

query v1.05 v.1.1 my PR #36275 issue
no params, % #34211 (merged)
no params, %% #35871
params, % https://github.com/psycopg/psycopg2/issues/827
params, %%
sqla declarative, % #35484

That means this code worked in v1.1, but fails now (as it did in v1.05; it's the no params, % case):

import pandas as pd
from sqlalchemy import create_engine

# 1.3.5
pd.__version__

engine = create_engine("postgresql://postgres:@localhost:5432/postgres", echo=False)

# TypeError: dict is not a sequence
pd.read_sql("SELECT 1 % 2", engine)

I'm not sure whether y'all consider this a bug or not, but pandas' behavior as it exists now IMO aligns well with sqlalchemy's API, since technically you shouldn't pass a sql string to engine.execute, but wrap it in sqlalchemy.sql.text (which is how you can support people passing SQL as a string).

machow avatar Jan 13 '22 04:01 machow

@machow I would like to take this issue, I am a new contributor. Probably need some guidance.

adejumoridwan avatar Jul 03 '23 10:07 adejumoridwan

Hey! I haven't looked at this issue for a while, and am not sure if it's intended behavior in pandas or not. I wonder if another issue with the label "good first issue" might be a bit easier to pick up?

machow avatar Jul 03 '23 15:07 machow

take

VedangPokharkar avatar Nov 21 '23 04:11 VedangPokharkar

hello, @machow , is this issue still being pursued?

VedangPokharkar avatar Nov 22 '23 05:11 VedangPokharkar

I just encountered this too. This bug prevents executing SELECT ... WHERE col like '...%...' statements.

sandervh14 avatar Jan 31 '24 08:01 sandervh14

I just encountered this too. This bug prevents executing SELECT ... WHERE col like '...%...' statements.

The fix for this is to use two %, example:

SELECT ... WHERE fullname like 'John%%' instead of SELECT ... WHERE fullname like 'John%'

Read about it here: https://stackoverflow.com/questions/64252764/sql-case-when-x-like-t-in-python-script-resulting-in-typeerror-dict-is-not

emilenstrom avatar Feb 14 '24 08:02 emilenstrom