presto icon indicating copy to clipboard operation
presto copied to clipboard

Optimize repeat function by creating runlength block

Open feilong-liu opened this issue 1 year ago • 2 comments

Description

Currently repeat function implementation is to write the result block multiple times. However this is not efficient, since the result block is a repeat of the same value multiple times, we can create a RunLengthBlock which is more efficient, both in memory and construction cost.

Motivation and Context

To improve performance of repeat function. Also the result block created by repeat function will have small memory size, and it will be evaluated during query compilation (otherwise it will be skipped as the result block is too big https://github.com/prestodb/presto/blob/6998a7362764a73186a13a713a651ec71314c5f7/presto-main/src/main/java/com/facebook/presto/sql/planner/RowExpressionInterpreter.java#L776)

Impact

Improve performance for repeat function. repeat(1, 1000) shows 35% performance improvement.

Current:

Benchmark                          (repeatArgument)  Mode  Cnt      Score       Error  Units
BenchmarkRepeatFunction.benchmark              1000  avgt   20  60774.020 ± 14228.873  ns/op

After optimization

Benchmark                          (repeatArgument)  Mode  Cnt      Score      Error  Units
BenchmarkRepeatFunction.benchmark              1000  avgt   20  39639.608 ± 8549.023  ns/op

Test Plan

Add unit tests

Contributor checklist

  • [ ] Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • [ ] PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • [ ] Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • [ ] If release notes are required, they follow the release notes guidelines.
  • [ ] Adequate tests were added if applicable.
  • [ ] CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Improve repeat function to create RunLengthEncodedBlock to improve performance

feilong-liu avatar Feb 21 '24 23:02 feilong-liu

Also can we do some similar for SEQUENCE - using literal encoder?

kaikalur avatar Feb 22 '24 06:02 kaikalur

Nit: suggest changing the first word of the release note entry from "Optimize" to "Improve" following the Order of Changes in the [Release Note Guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines.

steveburnett avatar Feb 22 '24 19:02 steveburnett