Optimization: make the most of Hint::AcceptsSingular when call make_scalar_function to Improve performance
Which issue does this PR close?
Closes https://github.com/apache/arrow-datafusion/issues/10053
Rationale for this change
Benchmark
overlay function
Gnuplot not found, using plotters backend
4args_with_3scalars/overlay/1024
time: [36.540 µs 36.560 µs 36.587 µs]
change: [-12.476% -12.385% -12.284%] (p = 0.00 < 0.05)
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
2 (2.00%) high mild
5 (5.00%) high severe
4args_with_3scalars/overlay/4096
time: [141.54 µs 141.65 µs 141.82 µs]
change: [-12.584% -12.425% -12.231%] (p = 0.00 < 0.05)
Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
3 (3.00%) high mild
8 (8.00%) high severe
4args_with_3scalars/overlay/8192
time: [282.77 µs 283.42 µs 284.20 µs]
change: [-13.658% -13.353% -12.997%] (p = 0.00 < 0.05)
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
3 (3.00%) high mild
3 (3.00%) high severe
4args_without_scalar/overlay/1024
time: [37.682 µs 37.717 µs 37.757 µs]
change: [+1.8225% +1.9984% +2.1754%] (p = 0.00 < 0.05)
Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
4 (4.00%) high mild
9 (9.00%) high severe
4args_without_scalar/overlay/4096
time: [147.68 µs 147.77 µs 147.91 µs]
change: [+2.7595% +3.0052% +3.2531%] (p = 0.00 < 0.05)
Performance has regressed.
Found 20 outliers among 100 measurements (20.00%)
4 (4.00%) high mild
16 (16.00%) high severe
4args_without_scalar/overlay/8192
time: [298.84 µs 299.63 µs 300.27 µs]
change: [+1.6336% +2.0972% +2.5438%] (p = 0.00 < 0.05)
Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
Discussion🤔
According to the benchmark, there can be about a 12% performance improvement 🚀 when there are 3 scalar args; however, when all args are arrays, there is a 2% ~ 3% performance loss 😔. Does anyone have any ideas on how to avoid this loss?
What changes are included in this PR?
Currently, this approach has only been validated based on the overlay function. If everyone thinks it's appropriate, it can be applied to other functions as well.
Are these changes tested?
Are there any user-facing changes?
Marking as a draft to make it clear this PR isn't waiting on feedback anymore (I am trying to reduce the outstanding PR review queue)
Marking as a draft to make it clear this PR isn't waiting on feedback anymore (I am trying to reduce the outstanding PR review queue)
Is this PR not satisfactory enough for performance? Do I need to make further improvements to this PR? 🤔
Sorry @JasonLi-cn -- marking as ready for review so we can give it another look
It seems like this PR is failing a CI check: https://github.com/apache/datafusion/actions/runs/8771259487/job/24068788239?pr=10054
It seems like this PR is failing a CI check: https://github.com/apache/datafusion/actions/runs/8771259487/job/24068788239?pr=10054
I originally intended to modify the Cargo.lock after it was approved. 😅
Sorry @JasonLi-cn -- marking as ready for review so we can give it another look
It doesn't matter. For this PR, I personally think it is valuable, but in terms of code implementation, I am not sure if there is a better solution, which needs the help of friends in the community. 🙏
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.