
[SPARK-49506][SQL] Optimize ArrayBinarySearch for foldable array

Open panbingkun opened this issue 1 year ago • 3 comments

What changes were proposed in this pull request?

This PR aims to:

  • optimize ArrayBinarySearch for foldable array.
  • fix a bug in the original implementation.

Why are the changes needed?

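The changes improve the performance of the array_binary_search() function. In the benchmark runs below, the best time drops from 916 ms to 164 ms and the per-row cost from 91.6 ns to 16.4 ns.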

  • When the array parameter is foldable, create the foldable{DataType}ArrayData instance only once at initialization (avoiding frequent calls to ArrayData.to{DataType}Array()) and reuse it inside the replacement.
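
The idea can be illustrated with a minimal, standalone Java sketch. It is not the code from this PR: `CopyingArray`, `searchPerRow`, and `CachedSearcher` are hypothetical names, and `CopyingArray.toIntArray()` only stands in for a conversion that copies data on every call. The sketch assumes a sorted int array and uses `java.util.Arrays.binarySearch` for the lookup itself.

```java
import java.util.Arrays;

final class FoldableSearchSketch {

    // Hypothetical stand-in for an array container whose conversion to a
    // primitive array copies the data on every call.
    static final class CopyingArray {
        private final int[] values;
        int conversions = 0; // counts how often toIntArray() is invoked

        CopyingArray(int... values) {
            this.values = values;
        }

        int[] toIntArray() {
            conversions++;
            return values.clone();
        }
    }

    // Unoptimized path: the array is converted again for every row.
    static int searchPerRow(CopyingArray array, int key) {
        return Arrays.binarySearch(array.toIntArray(), key);
    }

    // Optimized path for a constant (foldable) array: convert once at
    // construction time and reuse the primitive array for every lookup.
    static final class CachedSearcher {
        private final int[] cached;

        CachedSearcher(CopyingArray constantArray) {
            this.cached = constantArray.toIntArray();
        }

        int search(int key) {
            return Arrays.binarySearch(cached, key);
        }
    }

    public static void main(String[] args) {
        int[] keys = {7, 4, 9};

        CopyingArray perRow = new CopyingArray(1, 3, 5, 7, 9);
        for (int key : keys) {
            System.out.println("per-row search(" + key + ") = " + searchPerRow(perRow, key));
        }
        System.out.println("per-row conversions: " + perRow.conversions);

        CopyingArray constant = new CopyingArray(1, 3, 5, 7, 9);
        CachedSearcher searcher = new CachedSearcher(constant);
        for (int key : keys) {
            System.out.println("cached search(" + key + ") = " + searcher.search(key));
        }
        System.out.println("cached conversions: " + constant.conversions);
    }
}
```

In this sketch the per-row path performs one conversion per lookup, while the cached path performs a single conversion regardless of how many keys are searched. That saved work is what the foldable optimization targets.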

Before:

Running benchmark: array binary search
  Running case: no foldable optimize
  Stopped after 100 iterations, 93668 ms

OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Mac OS X 14.6.1
Apple M2
array binary search:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
no foldable optimize                                916            937          24         10.9          91.6       1.0X

After:

Running benchmark: array binary search
  Running case: has foldable optimize
  Stopped after 100 iterations, 17206 ms

OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Mac OS X 14.6.1
Apple M2
array binary search:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
has foldable optimize                               164            172          22         61.1          16.4       1.0X

Does this PR introduce any user-facing change?

No.

How was this patch tested?

  • Updated the existing UTs.
  • Pass GA.

Was this patch authored or co-authored using generative AI tooling?

No.

panbingkun avatar Sep 24 '24 11:09 panbingkun

@zhengruifeng @cloud-fan I'm very sorry that I broke the previous PR https://github.com/apache/spark/pull/47984 and couldn't restore it, so I opened this PR.

panbingkun avatar Sep 24 '24 11:09 panbingkun

cc @cloud-fan would you mind taking a look when you find some time? thanks

zhengruifeng avatar Oct 09 '24 03:10 zhengruifeng

friendly ping @cloud-fan

zhengruifeng avatar Oct 18 '24 13:10 zhengruifeng

thanks, merging to master!

cloud-fan avatar Oct 29 '24 14:10 cloud-fan

Thank you so much, @panbingkun , @cloud-fan , @zhengruifeng , @LuciferYang !

dongjoon-hyun avatar Oct 29 '24 15:10 dongjoon-hyun

Thanks all @cloud-fan @zhengruifeng @LuciferYang @dongjoon-hyun ! ❤️

panbingkun avatar Oct 30 '24 00:10 panbingkun