ibis icon indicating copy to clipboard operation
ibis copied to clipboard

bug: when the start and end of window boundaries span across data types, ibis forces casting

Open chloeh13q opened this issue 1 year ago • 1 comments
trafficstars

What happened?

When the start and end of window boundaries span across data types, ibis forces casting in the window boundaries: https://github.com/ibis-project/ibis/blob/17be43a1081c34582408d4c627191263ec948ada/ibis/expr/builders.py#L177-L182 and this is not executable in 5 backends (with the addition of Flink streaming, which I am working on).

Steps to reproduce this error: https://github.com/ibis-project/ibis/blob/764c835598944c89cfca55802101d7effd1a6c78/ibis/backends/tests/test_window.py#L489-L520

Example:

alltypes.mutate(val=alltypes.double_col.sum().over(ibis.window(preceding=1000, following=0, group_by=[t.string_col], order_by=[t.id]))

The window boundaries here span across integer data types, and the IR generated for this involves a casting. The generated SQL query looks something like

SELECT
  SUM(`t0`.`double_col`) OVER (ORDER BY `t0`.`id` ASC NULLS LAST ROWS BETWEEN 1000 preceding AND CAST(0 AS SMALLINT) following) AS `Sum(double_col)`
FROM `alltypes` AS `t0`

and the CAST[...] following syntax is not recognized on certain backends.

The same happens when using range intervals.

What version of ibis are you using?

8.0

What backend(s) are you using, if any?

All backends

Relevant log output

# MySQL
pymysql.err.ProgrammingError: (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'CAST(0 AS SMALLINT) following) AS `val`\nFROM (\n  SELECT\n    `t0`.`id`,\n    `t...' at line 15")
# RisingWave
sql parser error: Expected literal int, found: CAST at line:15, column:122
Near " BETWEEN 1000 preceding AND CAST"
# Impala
impala.error.HiveServer2Error: AnalysisException: For ROWS window, the value of a PRECEDING/FOLLOWING offset must be a constant positive integer: CAST(0 AS SMALLINT) FOLLOWING
# MS SQL
pyodbc.ProgrammingError: ('42000', "[42000] [FreeTDS][SQL Server]Incorrect syntax near 'CAST'. (102) (SQLExecDirectW)")

Code of Conduct

  • [X] I agree to follow this project's Code of Conduct

chloeh13q avatar Feb 15 '24 19:02 chloeh13q

Wondering if the ideal solution here is to get rid of the force casting step in building windows. If we want to be more failure-proof, we can add a check in individual backends that don't support this in addition, so that an error is thrown before engine runtime

chloeh13q avatar Feb 15 '24 19:02 chloeh13q