[SPARK-49155][SQL][SS] Use more appropriate parameter type to construct `GenericArrayData`
What changes were proposed in this pull request?
According to the `GenericArrayDataBenchmark` results, constructing `GenericArrayData` from an `Array[Any]` is far more efficient than the other constructor paths:
https://github.com/apache/spark/blob/master/sql/catalyst/benchmarks/GenericArrayDataBenchmark-results.txt
OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1018-azure
AMD EPYC 7763 64-Core Processor
constructor:                              Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
arrayOfAny                                            6              6           0       1620.1           0.6       1.0X
arrayOfAnyAsObject                                    6              6           0       1620.1           0.6       1.0X
arrayOfAnyAsSeq                                     155            155           1         64.7          15.5       0.0X
arrayOfInt                                          253            254           1         39.6          25.3       0.0X
arrayOfIntAsObject                                  252            253           1         39.7          25.2       0.0X
This PR therefore optimizes several places in the Spark code that construct `GenericArrayData`:
- In `ArraysZip#eval` and `XPathList#nullSafeEval`, the arrays that were originally declared with specific element types are now declared as `Array[AnyRef]`, which avoids an additional collection copy when constructing `GenericArrayData`. This works because an `Array[AnyRef]` also matches the `case array: Array[Any] => array` branch in the following code:
https://github.com/apache/spark/blob/af70aafd330fdbb6ce0d5b3efbcb180cda488695/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GenericArrayData.scala#L42-L48
- In `HistogramNumeric#eval`, an `IndexedSeq[InternalRow]` was originally used to construct `GenericArrayData`. Since the length of the collection is known in advance, it is refactored to use an `Array[AnyRef]` instead.
- In the other cases, the argument passed when constructing `GenericArrayData` was previously `${input}.toArray`. It is changed to `${input}.toArray[Any]` to avoid another collection copy during the construction of `GenericArrayData`.
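The dispatch behavior that the items above rely on can be illustrated with a minimal, Spark-free sketch. `ArrayDispatchSketch` and `toBacking` are hypothetical names introduced only for illustration; this is a simplified stand-in and not the actual `GenericArrayData` constructor. It assumes that object arrays are reused as-is while primitive arrays and sequences are copied into a new `Array[Any]`.

```scala
// Hypothetical stand-in for the kind of dispatch described above; not Spark code.
object ArrayDispatchSketch {
  // Returns the backing array and a flag telling whether a new array was allocated.
  def toBacking(input: Any): (Array[Any], Boolean) = input match {
    // Object arrays (including Array[AnyRef]) match here and are reused without a copy.
    case array: Array[Any] => (array, false)
    // A primitive array cannot be viewed as Array[Any]; every element is boxed into a new array.
    case ints: Array[Int] => (ints.toArray[Any], true)
    // A Seq is copied element by element into a new array.
    case seq: scala.collection.Seq[_] => (seq.toArray[Any], true)
    case other => throw new IllegalArgumentException(s"unsupported input: $other")
  }

  def main(args: Array[String]): Unit = {
    // Array[AnyRef] is reused directly: same reference, no copy.
    val refs = new Array[AnyRef](2)
    refs(0) = "a"
    refs(1) = "b"
    val (backing, copied) = toBacking(refs)
    println(s"Array[AnyRef]: copied=$copied, sameInstance=${backing eq refs}")

    // Seq(...).toArray infers Array[Int] here, so a second, boxing copy is needed.
    println(s"toArray:       copied=${toBacking(Seq(1, 2, 3).toArray)._2}")

    // Seq(...).toArray[Any] already yields an object array, which is reused.
    println(s"toArray[Any]:  copied=${toBacking(Seq(1, 2, 3).toArray[Any])._2}")
  }
}
```

In this sketch, the `Array[AnyRef]` and `toArray[Any]` inputs both reach the first branch, which is the copy-free path the changes in this PR aim for.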
Why are the changes needed?
Constructing `GenericArrayData` from an `Array[Any]` or `Array[AnyRef]` improves performance by reducing collection copying.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Pass GitHub Actions
Was this patch authored or co-authored using generative AI tooling?
No