ibis bug: MSSQL syntax error

What happened?

Nested mutate() generates invalid SQL query. See reprex adapted from https://github.com/posit-dev/pointblank/blob/main/pointblank/_interrogation.py#L91-L152:

# %%
import ibis
import pandas as pd

ibis.options.interactive = True

# dataframe
data = {"col1": [1, 2, 3], "col2": ["A", "B", "C"]}
df = pd.DataFrame(data)

# mssql con
con = ibis.mssql.connect(
    user=input("User: "),
    password=input("Password: "),
    host=input("Host: "),
    driver="SQL Server",
    database=input("db: ")
)

con.create_table("test_table", df)
t = con.table("test_table")

na_pass = False
column = "col1"
compare = 0

# %%
tbl = t.mutate(
    pb_is_good_1=t[column].isnull() & ibis.literal(na_pass),
    pb_is_good_2=t[column] > ibis.literal(compare),
)

tbl = tbl.mutate(pb_is_good_2=ibis.ifelse(tbl.pb_is_good_2.notnull(), tbl.pb_is_good_2, False))

result_tbl = tbl.mutate(pb_is_good_=tbl.pb_is_good_1 | tbl.pb_is_good_2).drop(
    "pb_is_good_1", "pb_is_good_2"
)

sql_query = ibis.to_sql(result_tbl)
print(sql_query)
# %%

Is this a bug or misuse of mutate()?

What version of ibis are you using?

10.3.1

What backend(s) are you using, if any?

MSSQL

Relevant log output

Here's SQL generated:


SELECT
  [t1].[col1],
  [t1].[col2],
  IIF([t1].[pb_is_good_1] <> 0 OR [t1].[pb_is_good_2] <> 0, 1, 0) AS [pb_is_good_]
FROM (
  SELECT
    [t0].[col1],
    [t0].[col2],
    (
      [t0].[col1] IS NULL -------------------------------------- Invalid syntax
    ) AND (1 = 0) AS [pb_is_good_1],---------------------------- Invalid syntax
    IIF((
      [t0].[col1] > 0
    ) IS NOT NULL, [t0].[col1] > 0, (1 = 0)) AS [pb_is_good_2]
  FROM [test_table] AS [t0]
) AS [t1]

Code of Conduct

[x] I agree to follow this project's Code of Conduct

Mar 19 '25 19:03 albersonmiranda

Hey @albersonmiranda -- can you confirm that Ibis version number? ~3.3.0 is quite old~ I don't believe there is a version 3.3.0

Mar 19 '25 21:03 gforsyth

@gforsyth I'm sorry, user mistake (I've previously installed ibis). Version is 2.1.1. Error still occurs in a fresh venv.

Mar 19 '25 23:03 albersonmiranda

@albersonmiranda -- that version is several years old. Current release is 10.3.1 -- is there something in your environment preventing an upgrade?

Mar 20 '25 12:03 gforsyth

Took from wrong venv again... @gforsyth here's correct output:

(.venv) PS C:\Users\030075675\Documents\bees_projects\ghm_banestes> python -V
Python 3.12.7
(.venv) PS C:\Users\030075675\Documents\bees_projects\ghm_banestes> pip show ibis-framework
Name: ibis-framework
Version: 10.3.1
Summary: The portable Python dataframe library
Home-page: https://ibis-project.org
Author:
Author-email: Ibis Maintainers <[email protected]>
License: Apache-2.0
Location: C:\Users\030075675\Documents\bees_projects\ghm_banestes\.venv\Lib\site-packages
Requires: atpublic, parsy, python-dateutil, sqlglot, toolz, typing-extensions, tzdata
Required-by:

And query again:

print(sql_query)

SELECT
  [t1].[col1],
  [t1].[col2],
  IIF([t1].[pb_is_good_1] <> 0 OR [t1].[pb_is_good_2] <> 0, 1, 0) AS [pb_is_good_]
FROM (
  SELECT
    [t0].[col1],
    [t0].[col2],
    (
      [t0].[col1] IS NULL
    ) AND (1 = 0) AS [pb_is_good_1],
    IIF((
      [t0].[col1] > 0
    ) IS NOT NULL, [t0].[col1] > 0, (1 = 0)) AS [pb_is_good_2]
  FROM [test_table] AS [t0]
) AS [t1]

Mar 20 '25 12:03 albersonmiranda

Thanks @albersonmiranda ! This does look like a bug and a moderately gnarly one.

I don't think your usage is incorrect.

A simplified example shows that we're initially doing the right thing:

[ins] In [42]: t = ibis.table({'a': 'str', 'b': 'str'})

[ins] In [43]: expr = t.mutate(c=t.a.isnull() & ibis.literal(False))

[nav] In [44]: ibis.to_sql(expr, dialect="mssql")
Out[44]: 
SELECT
  [t0].[a],
  [t0].[b],
  IIF((
    [t0].[a] IS NULL
  ) AND (1 = 0), 1, 0) AS [c]
FROM [unbound_table_3] AS [t0]

but when we then use an ifelse and generate a subquery, but the syntax inside the subquery is incorrect.

[ins] In [45]: expr2 = expr.mutate(d=ibis.ifelse(expr.c.notnull(), expr.c, False))

[ins] In [46]: ibis.to_sql(expr2, dialect="mssql")
Out[46]: 
SELECT
  [t1].[a],
  [t1].[b],
  IIF([t1].[c] <> 0, 1, 0) AS [c],
  IIF(IIF([t1].[c] IS NOT NULL, [t1].[c], (1 = 0)), 1, 0) AS [d]
FROM (
  SELECT
    [t0].[a],
    [t0].[b],
    (
      [t0].[a] IS NULL
    ) AND (1 = 0) AS [c]
  FROM [unbound_table_3] AS [t0]
) AS [t1]

Mar 20 '25 13:03 gforsyth

@gforsyth I see you've got a pretty big backlog. Any pointers on which file the bug might be in, so I can take a shot at fixing it?

Mar 20 '25 15:03 albersonmiranda

@albersonmiranda -- I think this is likely a pretty deep rabbit hole. I took a poke at it, and in the MSSQL compiler, we call _to_sqlglot to convert the ibis expression to a sqlglot expression:

https://github.com/ibis-project/ibis/blob/main/ibis/backends/sql/compilers/mssql.py#L159-L178

as a part of that, we run this conversion code:

        conversions = {
            name: ibis.ifelse(table_expr[name], 1, 0).cast(dt.boolean)
            for name, typ in table_expr.schema().items()
            if typ.is_boolean()
        }

to convert any boolean types to explicit ifelse because mssql has terrible handling for booleans.

I don't think the issue is actually here, but if we compare the table expression after those conversions are applied, there's an indication of the problem:

r0 := UnboundTable: unbound_table_0
  a string
  b string

r1 := Project[r0]
  a: r0.a
  b: r0.b
  c: IsNull(r0.a) & False

r2 := Project[r1]
  a: r1.a
  b: r1.b
  c: r1.c
  d: IfElse(bool_expr=NotNull(r1.c), true_expr=r1.c, false_null_expr=False)

Project[r2]
  a: r2.a
  b: r2.b
  c: Cast(IfElse(bool_expr=r2.c, true_expr=1, false_null_expr=0), to=boolean)
  d: Cast(IfElse(bool_expr=r2.d, true_expr=1, false_null_expr=0), to=boolean)

Note that the final projection (r2) includes Cast(..., to=boolean) around the IfElse and that is being correctly transpiled -- however, the IfElse inside the subquery (r1) is not explicitly cast to a bool.

I'm fairly sure that's the problem -- I suspect we may need an explicit rule in the mssql compiler to handle the IfElse node and ensure that it is wrapped in an explicit cast.

Mar 20 '25 23:03 gforsyth

@gforsyth Thanks for the detailed instructions! I'm getting close but not there yet:

def visit_IfElse(self, op, *, bool_expr, true_expr, false_null_expr):
        if isinstance(bool_expr, sge.Boolean):
            processed_bool_expr = bool_expr
        else:
            is_comparison = isinstance(bool_expr, (sge.Is, sge.EQ, sge.NEQ, 
                                                sge.GT, sge.LT, sge.GTE, sge.LTE))
            processed_bool_expr = bool_expr if is_comparison else bool_expr.neq(0)

        if op.true_expr.dtype.is_boolean():
            processed_true_expr = true_expr if isinstance(true_expr, (sge.Literal, int)) else self.if_(true_expr, 1, 0)
        else:
            processed_true_expr = true_expr

        if false_null_expr is not None and op.false_null_expr is not None and op.false_null_expr.dtype.is_boolean():
            processed_false_expr = false_null_expr if isinstance(false_null_expr, (sge.Literal, int)) else self.if_(false_null_expr, 1, 0)
        else:
            processed_false_expr = false_null_expr

        return sge.If(
            this=processed_bool_expr,
            true=processed_true_expr,
            false=processed_false_expr,
        )
    
    def visit_And(self, op, *, left, right):
        result = super().visit_And(op, left=left, right=right)
        return self.if_(result, 1, 0)

    def visit_Or(self, op, *, left, right):
        result = super().visit_Or(op, left=left, right=right)
        return self.if_(result, 1, 0)

    def visit_IsNull(self, op, *, arg):
        return self.if_(arg.is_(NULL), 1, 0)
    
    def visit_NotNull(self, op, *, arg):
        return self.if_(arg.is_(NULL).not_(), 1, 0)

For my example, this generates:

SELECT
  [t1].[col1],
  [t1].[col2],
  IIF(IIF([t1].[pb_is_good_1] <> 0 OR [t1].[pb_is_good_2] <> 0, 1, 0) <> 0, 1, 0) AS [pb_is_good_]
FROM (
  SELECT
    [t0].[col1],
    [t0].[col2],
    IIF(IIF([t0].[col1] IS NULL, 1, 0) AND (1 = 0), 1, 0) AS [pb_is_good_1],
    IIF(
      IIF(NOT (
        [t0].[col1] > 0
      ) IS NULL, 1, 0) <> 0,
      IIF([t0].[col1] > 0, 1, 0),
      IIF((1 = 0), 1, 0)
    ) AS [pb_is_good_2]
  FROM [test_table] AS [t0]
) AS [t1]

Mar 24 '25 03:03 albersonmiranda

@albersonmiranda -- another option that might work is to add a rewrite rule to the mssql compiler

Mar 25 '25 12:03 gforsyth

ibis ibis copied to clipboard

bug: MSSQL syntax error

What happened?

What version of ibis are you using?

What backend(s) are you using, if any?

Relevant log output

Code of Conduct

ibis
ibis copied to clipboard