fugue
fugue copied to clipboard
[NOTICE] Different SQL behaviors between Spark SQL and Sqlite on HAVING
These two SQLs have different behavior on HAVING
CREATE [[1, 2], [NULL, 2], [NULL, 1], [3, 4], [NULL, 4]] SCHEMA a:double,b:int
SELECT a, SUM(b) AS b GROUP BY a HAVING (b>=7) OR (a=1)
PRINT
For spark, it will include [NULL, 7]
in the result, but in SQLite, it will not
To make them consistent, here are two alternative solutions:
CREATE [[1, 2], [NULL, 2], [NULL, 1], [3, 4], [NULL, 4]] SCHEMA a:double,b:int
SELECT a, SUM(b) AS b GROUP BY a HAVING (SUM(b)>=7) OR (a=1)
PRINT
CREATE [[1, 2], [NULL, 2], [NULL, 1], [3, 4], [NULL, 4]] SCHEMA a:double,b:int
SELECT a, SUM(b) AS c GROUP BY a HAVING (c>=7) OR (a=1)
PRINT
Based on the definition of HAVING
, the spark behavior makes more sense. But Sqlite may interpret the standard in another way. QPD follows Sqlite (I am not sure if it is a good idea)).