Add support for `size` expression
### What is the problem the feature request solves?
Add support for Spark SQL size expression:
https://spark.apache.org/docs/latest/api/sql/index.html#size
From the documentation:
> size(expr) - Returns the size of an array or a map. This function returns -1 for null input only if `spark.sql.ansi.enabled` is false and `spark.sql.legacy.sizeOfNull` is true. Otherwise, it returns null for null input. With the default settings, the function returns null for null input.
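A minimal sketch of those rules (a hypothetical helper, not the Comet implementation): `size_of` models the null-handling matrix quoted above, with the two config settings passed as booleans.

```rust
// Hypothetical sketch, not actual Comet/DataFusion code: models the
// null-handling rules for Spark's `size` as quoted above.
fn size_of(len: Option<usize>, ansi_enabled: bool, legacy_size_of_null: bool) -> Option<i64> {
    match len {
        // Non-null array/map input: just the element count.
        Some(n) => Some(n as i64),
        // Null input yields -1 only when ANSI mode is off and the
        // legacy flag is on; in every other case null propagates.
        None if !ansi_enabled && legacy_size_of_null => Some(-1),
        None => None,
    }
}

fn main() {
    assert_eq!(size_of(Some(3), false, false), Some(3));
    assert_eq!(size_of(None, false, true), Some(-1)); // legacy behavior
    assert_eq!(size_of(None, false, false), None);    // Spark defaults
    assert_eq!(size_of(None, true, true), None);      // ANSI takes precedence
}
```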
### Describe the potential solution
No response
### Additional context
No response
@comphead Can I pick this up?
Feel free, yes
@comphead There is a `cardinality` function in the DataFusion repository that I think we can reuse. I'm considering adding a new `size` function to `datafusion-spark` that aligns with Spark's semantics while reusing `cardinality`.
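One way that reuse could look, sketched with stand-ins (`cardinality_like` here is a hypothetical placeholder for DataFusion's `cardinality`, which returns null for null input; the real implementation would operate on Arrow arrays): `size` delegates the counting and only remaps the null case.

```rust
// Stand-in for DataFusion's `cardinality`: element count, null for null input.
// (Hypothetical helper for illustration; the real function works on Arrow arrays.)
fn cardinality_like(arr: Option<&[i64]>) -> Option<i64> {
    arr.map(|a| a.len() as i64)
}

// A Spark-compatible `size` could delegate counting to `cardinality` and
// only remap null input to -1 when the legacy flag applies.
fn spark_size(arr: Option<&[i64]>, legacy_size_of_null: bool) -> Option<i64> {
    match cardinality_like(arr) {
        Some(n) => Some(n),
        None if legacy_size_of_null => Some(-1),
        None => None,
    }
}

fn main() {
    assert_eq!(spark_size(Some(&[1, 2, 3]), false), Some(3));
    assert_eq!(spark_size(None, true), Some(-1));  // legacy Spark behavior
    assert_eq!(spark_size(None, false), None);     // default Spark behavior
}
```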
Here is a test that can be added to `CometFuzzTestSuite`:
```scala
test("select size of array") {
  val df = spark.read.parquet(filename)
  df.createOrReplaceTempView("t1")
  // size(...) only applies to array columns in this fuzz schema
  val cols = df.schema.fields.filter(_.dataType.isInstanceOf[ArrayType])
  for (col <- cols) {
    val sql = s"SELECT size(${col.name}) FROM t1 ORDER BY ${col.name}"
    if (usingDataSourceExec) {
      checkSparkAnswerAndOperator(sql)
    } else {
      checkSparkAnswer(sql)
    }
  }
}
```
Hi @dharanad. Are you still planning on working on this, or is it ok if we let someone else pick this up? Thanks!
@andygrove I overlooked this issue; I will spend some time on it over the weekend.