datafusion icon indicating copy to clipboard operation
datafusion copied to clipboard

Optimization: make the most of Hint::AcceptsSingular when call make_scalar_function to Improve performance

Open JasonLi-cn opened this issue 1 year ago • 6 comments

Which issue does this PR close?

Closes https://github.com/apache/arrow-datafusion/issues/10053

Rationale for this change

Benchmark

overlay function

Gnuplot not found, using plotters backend
4args_with_3scalars/overlay/1024
                        time:   [36.540 µs 36.560 µs 36.587 µs]
                        change: [-12.476% -12.385% -12.284%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe

4args_with_3scalars/overlay/4096
                        time:   [141.54 µs 141.65 µs 141.82 µs]
                        change: [-12.584% -12.425% -12.231%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  3 (3.00%) high mild
  8 (8.00%) high severe

4args_with_3scalars/overlay/8192
                        time:   [282.77 µs 283.42 µs 284.20 µs]
                        change: [-13.658% -13.353% -12.997%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe

4args_without_scalar/overlay/1024
                        time:   [37.682 µs 37.717 µs 37.757 µs]
                        change: [+1.8225% +1.9984% +2.1754%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  4 (4.00%) high mild
  9 (9.00%) high severe

4args_without_scalar/overlay/4096
                        time:   [147.68 µs 147.77 µs 147.91 µs]
                        change: [+2.7595% +3.0052% +3.2531%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 20 outliers among 100 measurements (20.00%)
  4 (4.00%) high mild
  16 (16.00%) high severe

4args_without_scalar/overlay/8192
                        time:   [298.84 µs 299.63 µs 300.27 µs]
                        change: [+1.6336% +2.0972% +2.5438%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

Discussion🤔

According to the benchmark, there can be about a 12% performance improvement 🚀 when there are 3 scalar args; however, when all args are arrays, there is a 2% ~ 3% performance loss 😔. Does anyone have any ideas on how to avoid this loss?

What changes are included in this PR?

Currently, this approach has only been validated based on the overlay function. If everyone thinks it's appropriate, it can be applied to other functions as well.

Are these changes tested?

Are there any user-facing changes?

JasonLi-cn avatar Apr 12 '24 08:04 JasonLi-cn

Marking as a draft to make it clear this PR isn't waiting on feedback anymore (I am trying to reduce the outstanding PR review queue)

alamb avatar May 15 '24 19:05 alamb

Marking as a draft to make it clear this PR isn't waiting on feedback anymore (I am trying to reduce the outstanding PR review queue)

Is this PR not satisfactory enough for performance? Do I need to make further improvements to this PR? 🤔

JasonLi-cn avatar May 16 '24 04:05 JasonLi-cn

Sorry @JasonLi-cn -- marking as ready for review so we can give it another look

alamb avatar May 16 '24 18:05 alamb

It seems like this PR is failing a CI check: https://github.com/apache/datafusion/actions/runs/8771259487/job/24068788239?pr=10054

alamb avatar May 16 '24 18:05 alamb

It seems like this PR is failing a CI check: https://github.com/apache/datafusion/actions/runs/8771259487/job/24068788239?pr=10054

I originally intended to modify the Cargo.lock after it was approved. 😅

JasonLi-cn avatar May 17 '24 06:05 JasonLi-cn

Sorry @JasonLi-cn -- marking as ready for review so we can give it another look

It doesn't matter. For this PR, I personally think it is valuable, but in terms of code implementation, I am not sure if there is a better solution, which needs the help of friends in the community. 🙏

JasonLi-cn avatar May 17 '24 06:05 JasonLi-cn

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

github-actions[bot] avatar Jul 17 '24 01:07 github-actions[bot]

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

github-actions[bot] avatar Sep 16 '24 02:09 github-actions[bot]