fugue [NOTICE] Different SQL behaviors between Spark SQL and Sqlite on HAVING

[NOTICE] Different SQL behaviors between Spark SQL and Sqlite on HAVING

Open goodwanghan opened this issue 3 years ago • 0 comments

These two SQLs have different behavior on HAVING

CREATE [[1, 2], [NULL, 2], [NULL, 1], [3, 4], [NULL, 4]] SCHEMA a:double,b:int
SELECT a, SUM(b) AS b GROUP BY a HAVING (b>=7) OR (a=1)
PRINT

For spark, it will include [NULL, 7] in the result, but in SQLite, it will not

To make them consistent, here are two alternative solutions:

CREATE [[1, 2], [NULL, 2], [NULL, 1], [3, 4], [NULL, 4]] SCHEMA a:double,b:int
SELECT a, SUM(b) AS b GROUP BY a HAVING (SUM(b)>=7) OR (a=1)
PRINT

CREATE [[1, 2], [NULL, 2], [NULL, 1], [3, 4], [NULL, 4]] SCHEMA a:double,b:int
SELECT a, SUM(b) AS c GROUP BY a HAVING (c>=7) OR (a=1)
PRINT

Based on the definition of HAVING, the spark behavior makes more sense. But Sqlite may interpret the standard in another way. QPD follows Sqlite (I am not sure if it is a good idea)).

Jun 08 '21 22:06 goodwanghan

fugue fugue copied to clipboard

[NOTICE] Different SQL behaviors between Spark SQL and Sqlite on HAVING

fugue
fugue copied to clipboard