trino
Batch join lookup source
Handle the probe side of a join in batches rather than row by row. Benchmarks are in the comment below.
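For readers unfamiliar with the idea, here is a minimal sketch of the difference between row-by-row and batched probing. It is not the code in this PR: `BatchedProbeSketch`, `LookupSource`, `getJoinPosition`, and `getJoinPositions` are hypothetical names, and the sketch assumes unique `long` build keys stored in a simple open-addressing table.

```java
import java.util.Arrays;

// Illustrative only: a toy lookup source, not Trino's actual implementation.
final class BatchedProbeSketch {
    /** Hypothetical open-addressing table over unique long build keys. */
    static final class LookupSource {
        private final long[] keys;
        private final int[] positions; // -1 marks an empty slot
        private final int mask;

        LookupSource(long[] buildKeys) {
            // Smallest power of two that is at least twice the build size.
            int capacity = Integer.highestOneBit(Math.max(4, buildKeys.length * 2) - 1) << 1;
            keys = new long[capacity];
            positions = new int[capacity];
            Arrays.fill(positions, -1);
            mask = capacity - 1;
            for (int i = 0; i < buildKeys.length; i++) {
                int slot = slotFor(buildKeys[i]);
                while (positions[slot] != -1) {
                    slot = (slot + 1) & mask; // linear probing
                }
                keys[slot] = buildKeys[i];
                positions[slot] = i;
            }
        }

        private int slotFor(long key) {
            return Long.hashCode(key * 0x9E3779B97F4A7C15L) & mask;
        }

        private int findFrom(int slot, long probeKey) {
            while (positions[slot] != -1) {
                if (keys[slot] == probeKey) {
                    return positions[slot];
                }
                slot = (slot + 1) & mask;
            }
            return -1; // no match on the build side
        }

        /** Row-by-row probe: hashing and table access are interleaved per row. */
        int getJoinPosition(long probeKey) {
            return findFrom(slotFor(probeKey), probeKey);
        }

        /** Batched probe: hash the whole batch first, then resolve every slot. */
        void getJoinPositions(long[] probeKeys, int[] result) {
            int[] slots = new int[probeKeys.length];
            for (int i = 0; i < probeKeys.length; i++) {
                slots[i] = slotFor(probeKeys[i]);
            }
            for (int i = 0; i < probeKeys.length; i++) {
                result[i] = findFrom(slots[i], probeKeys[i]);
            }
        }
    }

    static int[] probeRowByRow(long[] buildKeys, long[] probeKeys) {
        LookupSource lookup = new LookupSource(buildKeys);
        int[] result = new int[probeKeys.length];
        for (int i = 0; i < probeKeys.length; i++) {
            result[i] = lookup.getJoinPosition(probeKeys[i]);
        }
        return result;
    }

    static int[] probeBatched(long[] buildKeys, long[] probeKeys) {
        int[] result = new int[probeKeys.length];
        new LookupSource(buildKeys).getJoinPositions(probeKeys, result);
        return result;
    }
}
```

Both paths return the same join positions. The batched variant only restructures the work into tight loops over arrays, which is the kind of change that tends to reduce per-row call overhead.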
Description
Is this change a fix, improvement, new feature, refactoring, or other?
improvement
Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)
core query engine
How would you describe this change to a non-technical end user or system administrator?
Improve performance of join operator
Related issues, pull requests, and links
Documentation
(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
( ) No release notes entries required.
(x) Release notes entries required with the following suggested text:
# Section
* Introduce batching to the probe side of a join operator
Benchmark-joinbatching.pdf
Benchmarks for unpartitioned ORC show CPU gains of more than 8% on TPC-H and more than 4% on TPC-DS. The partitioned benchmarks failed due to spot loss, but the partial results were similar. No visible regressions.
Could you fix https://github.com/trinodb/trino/pull/12618#discussion_r915926954 and https://github.com/trinodb/trino/pull/12618#discussion_r915927765 as a prerequisite?
I rebased the PR on top of https://github.com/trinodb/trino/pull/13432
benchmark-join-batching.pdf
Newest benchmarks, run on top of the latest version of #13432.
ping
Final benchmarks: Benchmarks batching.pdf
TL;DR: CPU gains of 12.5% on TPC-H and 7.5% on TPC-DS.
Can we add a test that would have caught the above issue?
I just tried to test it, and it appears that the bug would not return bad data. The only symptom is that rows containing null values may be propagated further, so more rows become eligible for the join. Those rows will not be joined anyway, since they have null values. At most this results in some performance degradation, which is not really testable.
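The reasoning can be sketched as follows. This is an illustration under assumptions, not the operator's real code: `NullProbeSketch`, `join`, and the `filterNullsEarly` flag are hypothetical, and the build side is modeled as a plain `Map`. Under SQL equality semantics a null key never matches, so skipping the early null filter changes only how many rows are considered, not which rows are joined.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative only: models the effect described above, not Trino's join operator.
final class NullProbeSketch {
    static final class Result {
        final List<Integer> matches; // build positions of joined rows
        final int rowsConsidered;    // probe rows that reached the lookup step

        Result(List<Integer> matches, int rowsConsidered) {
            this.matches = matches;
            this.rowsConsidered = rowsConsidered;
        }
    }

    static Result join(Map<Long, Integer> build, Long[] probeKeys, boolean filterNullsEarly) {
        List<Integer> matches = new ArrayList<>();
        int rowsConsidered = 0;
        for (Long key : probeKeys) {
            if (filterNullsEarly && key == null) {
                continue; // null rows are dropped before the lookup
            }
            rowsConsidered++;
            // A null key can never satisfy an equality join condition.
            Integer position = (key == null) ? null : build.get(key);
            if (position != null) {
                matches.add(position);
            }
        }
        return new Result(matches, rowsConsidered);
    }
}
```

With or without the early filter, `matches` is identical. Only `rowsConsidered` grows, which corresponds to extra work rather than incorrect output.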