trino Single column bigint join round 2

Description

This is the next approach to https://github.com/trinodb/trino/pull/13178, which has been reverted due to https://github.com/trinodb/trino/issues/13380. This time the memory consumption is not so significantly increased. The additionally allocated long array is indexed by page positions, instead of hash buckets making it significantly smaller. Benchmarks show 3x less of an increase in peak memory usage of TPC benchmarks compared to https://github.com/trinodb/trino/pull/13178. The performance gain should be smaller because of CPUs out-of-order execution is less possible with this approach. However, after merging of https://github.com/trinodb/trino/pull/13352 it should not matter.

For reviewers: The only change between this PR and https://github.com/trinodb/trino/pull/13178 is the addressing of values array in BigintPagesHash class.

Is this change a fix, improvement, new feature, refactoring, or other?

improvement

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

core query engine

How would you describe this change to a non-technical end user or system administrator?

Increase the performance of join on single bigint column

Related issues, pull requests, and links

Documentation

(x) No documentation is needed. ( ) Sufficient documentation is included in this PR. ( ) Documentation PR is available with #prnumber. ( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required. (x) Release notes entries required with the following suggested text:

# Section
* Improve performance of joins over a single BIGINT column

Aug 01 '22 04:08 skrzypo987

@arhimondr Can you check if this branch passes the verification that failed in https://github.com/trinodb/trino/issues/13380?

Aug 01 '22 04:08 skrzypo987

Does it still give good benchmarking results?

Benchmarks are still running, but common sense tells me that the results are going to be worse. However, after adding batched execution it should get better again.

Aug 01 '22 09:08 skrzypo987

I've added a cut-off. If the number of positions in a single PagesHash exceeds 2^19, values are not stored in a separate array. The good news is that the memory consumption remains low, probably eve slightly lower than previously. The bad news is that for every position there is an additional if which destroys any perf gain observed before. The good news is that this if is no longer a problem when we introduce batch processing (https://github.com/trinodb/trino/pull/13352). However, this PR itself does not present any gain. @sopel39

Aug 03 '22 09:08 skrzypo987

Ok, so it's tabled after https://github.com/trinodb/trino/pull/13352

Aug 03 '22 10:08 sopel39

This PR is a prerequisite for https://github.com/trinodb/trino/pull/13352 So we have a chicken and an egg problem

Aug 03 '22 10:08 skrzypo987

We can also concat those two PRs and merge as one

Aug 03 '22 10:08 skrzypo987

The cut-off now makes the join use the DefaultPagesHash, which means some perf gain is going to be visible after merging this PR

Aug 03 '22 13:08 skrzypo987

s this PR still being worked on?

Sorry about the mess. The benchmark results started going crazy at some point and I am still trying to figure out the cause. I will ping you once it is ready

Aug 09 '22 04:08 skrzypo987

Do you have newest benchmarks with lowered limit?

After running gazillion of benchmarks I finally figured out why the benchmark are so bad. Isolating both implementation of PagesHash prevents inlining and preformance regresses significantly. Unfortunately we do not know how many positions will be used at the time of isolation, so this totally ruins the cutoff feature as it is now.

Aug 09 '22 11:08 skrzypo987

@arhimondr Can you verify the second to last commit - ae20b9f372797f08659912745110923b8db0fd17. This is the one without the cut-off but should still use less memory than the one that got reverted.

Aug 09 '22 11:08 skrzypo987

benchmark-bigint-join-tpcds.pdf The tpcds benchmark finished. The version without the limit is slightly faster (~1%), but the limit 1M commit regresses about 5% cpu time. I am going to cherry-pick the queries that suck and try to find a common pattern

Aug 11 '22 05:08 skrzypo987

I added PartitionedLookupSource to isolated classes and the regression went away. The gain is minimal compared to master but there is no regression. benchmark-bigint-join-tpcds.pdf

Aug 11 '22 16:08 skrzypo987

Did some finishing (hopefully) touches:

The threshold is fixed, no config option anymore
Last two commits are squashed. The limit is introduced along with the actual change

The gains (for unpart orc) are:

tpch 6% cpu time gain
tpcds < 1% cpu time gain

Many small, cosmetic comments will be addressed in the follow-up PR

Aug 17 '22 05:08 skrzypo987

@arhimondr do we need to re-run your tests on this ? It looks good to land to me otherwise.

Aug 18 '22 05:08 raunaqmorarka

@raunaqmorarka The test run came out clean, no out of memory failures.

Aug 18 '22 14:08 arhimondr

trino trino copied to clipboard

Single column bigint join round 2

Description

Related issues, pull requests, and links

Documentation

Release notes

trino
trino copied to clipboard