datafusion
datafusion copied to clipboard
`StringConcat` gives inconsistent result with `concat` when containing `null`
Describe the bug A clear and concise description of what the bug is.
To Reproduce Steps to reproduce the behavior:
❯ select concat('a', null) from t;
+------------------------+
| concat(Utf8("a"),NULL) |
+------------------------+
| a |
+------------------------+
1 row in set. Query took 0.003 seconds.
❯ select 'a' || null from t;
+-------------------+
| Utf8("a") || NULL |
+-------------------+
| |
+-------------------+
1 row in set. Query took 0.003 seconds.
❯
Expected behavior
The two queries should return the same result.
In Spark, string_concat always return null when there is null element: (on Spark3.3.0)
scala> df.selectExpr("'a' || null ").show
+---------------+
|concat(a, NULL)|
+---------------+
| null|
+---------------+
scala> df.selectExpr("concat('a' || null)").show
+-----------------------+
|concat(concat(a, NULL))|
+-----------------------+
| null|
+-----------------------+
Additional context Add any other context about the problem here.
Oh, I see in the postgresql, concat will ignore null but || won't: https://www.postgresqltutorial.com/postgresql-string-functions/postgresql-concat-function/
This is a little weird. @alamb should we follow the behavior of Spark or Postgresql?
I prefer the pg-like style because it provides one more option for users.
This is a little weird. @alamb should we follow the behavior of Spark or Postgresql?
In datafusion I think we should follow the postgresql model unless there is some compelling reason to deviate
Thank you, @doki23 @alamb. This makes sense, and I will close this issue.