trino
trino copied to clipboard
Use "union" in partial aggregation output
Description
Since partial aggregation can be disabled at runtime, both aggregated and the input data need to be passed through to the final step. This change extends the partial aggregation output with additional columns to use for raw input and uses those columns to send raw input in case partial aggregation is disabled. An additional column contains information which set of channels should be used by the final step.
This is rebased on top of https://github.com/trinodb/trino/pull/12101. Only last commit matters here.
This is an extension of https://github.com/trinodb/trino/pull/11011 and limited implementation of https://github.com/prestodb/presto/issues/10305.
It does not work on a per-group basis but rather per operator factory instance (i.e.all operators for a given plan node id on one worker node).
It does not use row type
because there is no support currently to operate on the row type
fields during planning.
Benchmarks
tpch/tpcds benchmarks on partitioned orc sf1000, with and without https://github.com/trinodb/trino/pull/11289 (that PR adds RLE support for partitioned exchange which makes this pr more efficient) adaptive-partial-aggregation-union-rle-hasNonNullValue-var-row.pdf
Summary
There are significant gains for TPCH, especially with partitioned exchange RLE support. Duration drops even more than CPU because of a significant drop in the data sent over the network.
For TPCDS, the improvement is only visible when combined with partitioned exchange RLE support.
Is this change a fix, improvement, new feature, refactoring, or other?
improvement
Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)
core query engine
How would you describe this change to a non-technical end user or system administrator?
Improve performance of group by queries with that produce many distinct values
Related issues, pull requests, and links
Documentation
( ) No documentation is needed. ( ) Sufficient documentation is included in this PR. ( ) Documentation PR is available with #prnumber. ( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
( ) No release notes entries required. ( ) Release notes entries required with the following suggested text:
# Section
* Fix some things. ({issue}`issuenumber`)
I ran tpch/tpcds benchmarks for unpartitioned data + again for partitioned on the same code. For unpartitioned the gains are smaller + rle support does not help that much.
orc sf1000 unpartitioned adaptive-partial-aggregation-union-rle-unpart-060522.pdf
orc sf1000 partitioned adaptive-partial-aggregation-union-rle-part-060522.pdf
please rebase
I am so glad someone finally had the time to address this!
Pushed the wrong branch, now should be fine
rebased on the master
I ran tpch/tpcds benchmark on the current version (recently rebased on the master) + with push_partial_aggregation_through_join=true
.
The results are below. The improvement from this changes alone are smaller than before (this may be because of the optimized PartitionedOutputOperator
). With the rule on, tpch improvement is still at 7%. There is no improvement for tpcds.
Benchmarks_paa_union_push_pa_through_join.pdf cc @sopel39
I ran tpch/tpcds sf1k orc part with this (paa union) on top of https://github.com/trinodb/trino/pull/13573 (decimnal serde optimization) and push_partial_aggregation_through_join=true
It seems that this PR is still worth doing but it depends on push_partial_aggregation_through_join
being enabled.
It seems that this PR is still worth doing but it depends on push_partial_aggregation_through_join being enabled.
What about dec serde opt
+ push_partial_aggregation_through_join
?
What about
dec serde opt
+push_partial_aggregation_through_join
?
see row 1 in the summary
I ran another orc part sf1k for this rebased on a master with https://github.com/trinodb/trino/pull/13573. There are noticeable tpch gains (6% CPU).
One thing that is surprising is that the baseline does have the same improvement seen in https://github.com/trinodb/trino/pull/13573 and it should as it contains this change. This may be due to benchmark variability (in this ran or the others).
:wave: @lukasz-stec - this PR has become inactive. If you're still interested in working on it, please let us know.
We're working on closing out old and inactive PRs, so if you're too busy or this has too many merge conflicts to be worth picking back up, we'll be making another pass to close it out in a few weeks.
closing as there is no interest in merging this