fugue icon indicating copy to clipboard operation
fugue copied to clipboard

[NOTICE] Different SQL behaviors between Spark SQL and Sqlite on HAVING

Open goodwanghan opened this issue 3 years ago • 0 comments

These two SQLs have different behavior on HAVING

CREATE [[1, 2], [NULL, 2], [NULL, 1], [3, 4], [NULL, 4]] SCHEMA a:double,b:int
SELECT a, SUM(b) AS b GROUP BY a HAVING (b>=7) OR (a=1)
PRINT

For spark, it will include [NULL, 7] in the result, but in SQLite, it will not

To make them consistent, here are two alternative solutions:

CREATE [[1, 2], [NULL, 2], [NULL, 1], [3, 4], [NULL, 4]] SCHEMA a:double,b:int
SELECT a, SUM(b) AS b GROUP BY a HAVING (SUM(b)>=7) OR (a=1)
PRINT
CREATE [[1, 2], [NULL, 2], [NULL, 1], [3, 4], [NULL, 4]] SCHEMA a:double,b:int
SELECT a, SUM(b) AS c GROUP BY a HAVING (c>=7) OR (a=1)
PRINT

Based on the definition of HAVING, the spark behavior makes more sense. But Sqlite may interpret the standard in another way. QPD follows Sqlite (I am not sure if it is a good idea)).

goodwanghan avatar Jun 08 '21 22:06 goodwanghan