sqlglot
sqlglot copied to clipboard
Incorrect Parsing Between Spark SQL and DuckDB
There are a couple of mistakes when translating from Spark to DuckDB
Fully reproducible code snippet
from sqlglot import parse_one
query = """SELECT SHA2(str, 256), ANY_VALUE(str2, true), FIRST(str3, true), LAST(str4, true), TRANSFORM(array_col)
FROM table
GROUP BY group_col"""
parse_one(query, dialect="spark").sql(dialect="duckdb")
The output is:
SELECT SHA2(str, 256), ANY_VALUE(str2 IGNORE NULLS), FIRST(str3 IGNORE NULLS), LAST(str4 IGNORE NULLS), TRANSFORM(array_col) FROM "table" GROUP BY group_col
Which fails for three different reasons:
- SHA2 is not a function in duckdb - the relevant function is sha256
- TRANSFORM should be LIST_TRANSFORM
- When using non-window function, duckdb doesn't support RESPECT/IGNORE NULLS. The default behavior is without NULLs The expected output Official Documentation
- https://duckdb.org/docs/sql/functions/utility.html - SHA256
- https://duckdb.org/docs/sql/functions/list.html - LIST_TRANSFORM
- https://duckdb.org/docs/sql/functions/aggregates.html - aggregate functions