sqlglot icon indicating copy to clipboard operation
sqlglot copied to clipboard

Incorrect Parsing Between Spark SQL and DuckDB

Open dor-bernstein opened this issue 1 year ago • 0 comments

There are a couple of mistakes when translating from Spark to DuckDB

Fully reproducible code snippet

from sqlglot import parse_one
 query = """SELECT SHA2(str, 256), ANY_VALUE(str2, true), FIRST(str3, true), LAST(str4, true), TRANSFORM(array_col)
FROM table 
GROUP BY group_col"""
 parse_one(query, dialect="spark").sql(dialect="duckdb")

The output is:

SELECT SHA2(str, 256), ANY_VALUE(str2 IGNORE NULLS), FIRST(str3 IGNORE NULLS), LAST(str4 IGNORE NULLS), TRANSFORM(array_col) FROM "table" GROUP BY group_col

Which fails for three different reasons:

  1. SHA2 is not a function in duckdb - the relevant function is sha256
  2. TRANSFORM should be LIST_TRANSFORM
  3. When using non-window function, duckdb doesn't support RESPECT/IGNORE NULLS. The default behavior is without NULLs The expected output Official Documentation
  4. https://duckdb.org/docs/sql/functions/utility.html - SHA256
  5. https://duckdb.org/docs/sql/functions/list.html - LIST_TRANSFORM
  6. https://duckdb.org/docs/sql/functions/aggregates.html - aggregate functions

dor-bernstein avatar Aug 26 '24 16:08 dor-bernstein