datafusion-comet

Add support for `size` expression

Open andygrove opened this issue 6 months ago • 6 comments

What is the problem the feature request solves?

Add support for the Spark SQL `size` expression:

https://spark.apache.org/docs/latest/api/sql/index.html#size

From the documentation:

size(expr) - Returns the size of an array or a map. This function returns -1 for null input only if spark.sql.ansi.enabled is false and spark.sql.legacy.sizeOfNull is true. Otherwise, it returns null for null input. With the default settings, the function returns null for null input.
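The null-handling rule quoted above can be read as a small decision function. The following is a minimal plain-Scala sketch of that rule, not Spark or Comet code: `SizeSemantics`, `sizeOfNull`, and the parameter names are illustrative, and `None` stands in for SQL NULL.

```scala
// Sketch of the documented null-handling rule for size().
// All names here are illustrative; none of them are Spark or Comet APIs.
object SizeSemantics {

  /** Result of size(NULL) for a given pair of settings (None models SQL NULL). */
  def sizeOfNull(ansiEnabled: Boolean, legacySizeOfNull: Boolean): Option[Int] =
    // -1 is returned only when ANSI mode is off AND the legacy flag is on.
    if (!ansiEnabled && legacySizeOfNull) Some(-1) else None

  /** Models size(expr) for an array input. */
  def size(
      input: Option[Seq[Any]],
      ansiEnabled: Boolean,
      legacySizeOfNull: Boolean): Option[Int] =
    input match {
      case Some(values) => Some(values.length) // non-null input: element count
      case None         => sizeOfNull(ansiEnabled, legacySizeOfNull)
    }
}
```

A native implementation would need to honor both settings to match Spark for null inputs; non-null inputs are unaffected by them.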

Describe the potential solution

No response

Additional context

No response

Jun 23 '25 23:06 andygrove

@comphead Can I pick this up?

Jun 26 '25 16:06 dharanad

Feel free, yes

Jun 26 '25 17:06 comphead

@comphead There is a `cardinality` function in the DataFusion repository that I think we can reuse. I'm considering adding a new `size` function to datafusion-spark that aligns with Spark's semantics while reusing `cardinality`.
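As a rough illustration of that wrapping idea, here is a plain-Scala sketch. It is not the datafusion-spark implementation: `cardinality` below is a stand-in for the DataFusion function (assumed here to return a 64-bit count and NULL for NULL input), and `sparkSize` and `legacyNullAsMinusOne` are hypothetical names.

```scala
// Sketch only: `cardinality` is a plain-Scala stand-in for the DataFusion
// function, and `sparkSize` is a hypothetical wrapper name.
object SparkSizeSketch {

  // Stand-in: element count as a 64-bit value, None for a NULL input (assumption).
  def cardinality(input: Option[Seq[Any]]): Option[Long] =
    input.map(_.length.toLong)

  // Wrapper: reuse the count, narrow it to Spark's Int result, pick the NULL result.
  def sparkSize(input: Option[Seq[Any]], legacyNullAsMinusOne: Boolean): Option[Int] =
    cardinality(input) match {
      case Some(n) => Some(Math.toIntExact(n)) // throws if the count overflows Int
      case None    => if (legacyNullAsMinusOne) Some(-1) else None
    }
}
```

The wrapper only has to adapt the result type and the null result. One thing worth checking before reusing it: if `cardinality` counts elements across nested arrays rather than only the outer dimension, the two functions would disagree on arrays of arrays.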

Aug 11 '25 06:08 dharanad

Here is a test that can be added to the CometFuzzTestSuite:

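  // Requires: import org.apache.spark.sql.types.ArrayType
  test("select size of array") {
    val df = spark.read.parquet(filename)
    df.createOrReplaceTempView("t1")
    // Run the check once for every array-typed column in the generated schema.
    val cols = df.schema.fields.filter(_.dataType.isInstanceOf[ArrayType])
    for (col <- cols) {
      val sql = s"SELECT size(${col.name}) FROM t1 ORDER BY ${col.name}"
      if (usingDataSourceExec) {
        // Compare results with Spark and also verify the plan ran on Comet operators.
        checkSparkAnswerAndOperator(sql)
      } else {
        // Compare results with Spark only.
        checkSparkAnswer(sql)
      }
    }
  }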

Nov 06 '25 17:11 andygrove

Hi @dharanad. Are you still planning on working on this, or is it ok if we let someone else pick this up? Thanks!

Nov 06 '25 18:11 andygrove

@andygrove I overlooked this issue. I will spend some time on it this weekend.

Nov 11 '25 12:11 dharanad