[SPARK-49506][SQL] Optimize ArrayBinarySearch for foldable array
What changes were proposed in this pull request?
This PR aims to:
- optimize ArrayBinarySearch for a foldable array.
- fix a bug in the original implementation.
Why are the changes needed?
The changes improve the performance of the array_binary_search() function (best time 916 ms -> 164 ms, roughly 5.6x, in the benchmark below):
- create an instance of the foldable {DataType} ArrayData only once at initialization (avoiding frequent calls to ArrayData.to{DataType}Array()), and reuse it inside replacement when the array parameter is foldable.
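
The idea in the bullet above can be sketched outside of Spark. This is only an illustration under stated assumptions, not the code in this PR: the class FoldableBinarySearchSketch, the helper toIntArray (a stand-in for ArrayData.to{DataType}Array()), and CachedSearch are hypothetical names, and a List<Integer> plays the role of the constant (foldable) array.

```java
import java.util.Arrays;
import java.util.List;

public class FoldableBinarySearchSketch {

    // Hypothetical stand-in for ArrayData.to{DataType}Array():
    // copies the values into a fresh primitive array on every call.
    static int[] toIntArray(List<Integer> data) {
        int[] out = new int[data.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = data.get(i);
        }
        return out;
    }

    // Unoptimized path: the array is converted again for every row.
    static int searchPerRow(List<Integer> data, int key) {
        return Arrays.binarySearch(toIntArray(data), key);
    }

    // Optimized path: when the array argument is a constant, convert it
    // once at construction and reuse the primitive array for every row.
    static final class CachedSearch {
        private final int[] sorted;

        CachedSearch(List<Integer> foldableArray) {
            this.sorted = toIntArray(foldableArray);
        }

        int search(int key) {
            return Arrays.binarySearch(sorted, key);
        }
    }

    public static void main(String[] args) {
        List<Integer> constantArray = List.of(1, 3, 5, 7, 9);
        CachedSearch cached = new CachedSearch(constantArray);
        // Both paths return the same result; only the conversion cost differs.
        for (int key : new int[] {5, 4}) {
            System.out.println(key + ": cached=" + cached.search(key)
                + " perRow=" + searchPerRow(constantArray, key));
        }
    }
}
```

Both paths give identical answers; the cached variant simply skips the per-row copy, which is where the benchmark difference would be expected to come from.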
Before:
Running benchmark: array binary search
Running case: no foldable optimize
Stopped after 100 iterations, 93668 ms
OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Mac OS X 14.6.1
Apple M2
array binary search:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
no foldable optimize                                916            937          24         10.9          91.6       1.0X
After:
Running benchmark: array binary search
Running case: has foldable optimize
Stopped after 100 iterations, 17206 ms
OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Mac OS X 14.6.1
Apple M2
array binary search:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
has foldable optimize                               164            172          22         61.1          16.4       1.0X
Does this PR introduce any user-facing change?
No.
How was this patch tested?
- Update existing UTs.
- Pass GA.
Was this patch authored or co-authored using generative AI tooling?
No.
@zhengruifeng @cloud-fan I'm very sorry that I broke the previous PR https://github.com/apache/spark/pull/47984 and couldn't restore it, so I opened this PR.
cc @cloud-fan would you mind taking a look when you find some time? thanks
friendly ping @cloud-fan
thanks, merging to master!
Thank you so much, @panbingkun, @cloud-fan, @zhengruifeng, @LuciferYang!
Thanks all @cloud-fan @zhengruifeng @LuciferYang @dongjoon-hyun ! ❤️