databricks-sql-python icon indicating copy to clipboard operation
databricks-sql-python copied to clipboard

Sqlalchemy in statement is incorrectly compiled for the in clause

Open bkyryliuk opened this issue 1 year ago • 3 comments

Version 2.9.6

Repro:

from sqlalchemy import create_engine, MetaData, Table, select, column

# Assuming you have a working database connection string
engine = create_engine('...'). # make sure to specify the catalog

metadata = MetaData()
engine.execute("create table if not exists bogdankyryliuk.bool_test as select 1 as one_val, True as bool_val")
test_table = Table('bool_test', metadata, autoload_with=engine, schema='bogdankyryliuk')
stmt = select(test_table).where(test_table.c.bool_val.in_([True]))

compiled_stmt = stmt.compile(compile_kwargs={"literal_binds": True})
print(compiled_stmt)

with engine.connect() as conn:
    result = conn.execute(stmt)
    for row in result:
        print(row)

Compiled statement is printed as expected:

SELECT bogdankyryliuk.bool_test.one_val, bogdankyryliuk.bool_test.bool_val
FROM bogdankyryliuk.bool_test
WHERE bogdankyryliuk.bool_test.bool_val IN (true)

However conn.execute(stmt) executes different statement:

DatabaseError: (databricks.sql.exc.ServerOperationError) [DATATYPE_MISMATCH.DATA_DIFF_TYPES] Cannot resolve "(bool_val IN (1))" due to data type mismatch: Input to `in` should all be the same type, but it's ["BOOLEAN", "INT"]. SQLSTATE: 42K09; line 3 pos 40
[SQL: SELECT bogdankyryliuk.bool_test.one_val, bogdankyryliuk.bool_test.bool_val
FROM bogdankyryliuk.bool_test
WHERE bogdankyryliuk.bool_test.bool_val IN (%(bool_val_1_1)s)]
[parameters: {'bool_val_1_1': 1}]
(Background on this error at: https://sqlalche.me/e/14/4xp6)

That leads to the exception

bkyryliuk avatar Sep 24 '24 20:09 bkyryliuk

It also looks like it may have been fixed in the newer versions: https://github.com/databricks/databricks-sql-python/pull/357/files it would be nice to backport it to 2.9.6

bkyryliuk avatar Sep 24 '24 20:09 bkyryliuk

Tested that fix works via setting DatabricksDialect.supports_native_boolean = True

bkyryliuk avatar Sep 24 '24 20:09 bkyryliuk

Pro-tip: one of the reasons why the compile statement looks correct is because it's using the default statement compiler, rather than the one employed by this connector. If you want to truly see what the rendered query will look like you, need to pass the Databricks dialect to your compile() call.

stmt.compile(compile_kwargs={"literal_binds": True}, dialect=DatabricksDialect())

This doesn't fix your underlying issue, but hopefully helps you chase down issues in the future.

susodapop avatar Oct 29 '24 00:10 susodapop