presto
presto copied to clipboard
Optimize repeat function by creating runlength block
Description
Currently repeat function implementation is to write the result block multiple times. However this is not efficient, since the result block is a repeat of the same value multiple times, we can create a RunLengthBlock which is more efficient, both in memory and construction cost.
Motivation and Context
To improve performance of repeat function. Also the result block created by repeat function will have small memory size, and it will be evaluated during query compilation (otherwise it will be skipped as the result block is too big https://github.com/prestodb/presto/blob/6998a7362764a73186a13a713a651ec71314c5f7/presto-main/src/main/java/com/facebook/presto/sql/planner/RowExpressionInterpreter.java#L776)
Impact
Improve performance for repeat function. repeat(1, 1000) shows 35% performance improvement.
Current:
Benchmark (repeatArgument) Mode Cnt Score Error Units
BenchmarkRepeatFunction.benchmark 1000 avgt 20 60774.020 ± 14228.873 ns/op
After optimization
Benchmark (repeatArgument) Mode Cnt Score Error Units
BenchmarkRepeatFunction.benchmark 1000 avgt 20 39639.608 ± 8549.023 ns/op
Test Plan
Add unit tests
Contributor checklist
- [ ] Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
- [ ] PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
- [ ] Documented new properties (with its default value), SQL syntax, functions, or other functionality.
- [ ] If release notes are required, they follow the release notes guidelines.
- [ ] Adequate tests were added if applicable.
- [ ] CI passed.
Release Notes
Please follow release notes guidelines and fill in the release notes below.
== RELEASE NOTES ==
General Changes
* Improve repeat function to create RunLengthEncodedBlock to improve performance
Also can we do some similar for SEQUENCE - using literal encoder?
Nit: suggest changing the first word of the release note entry from "Optimize" to "Improve" following the Order of Changes in the [Release Note Guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines.