trino icon indicating copy to clipboard operation
trino copied to clipboard

Skew join optimizer

Open damnMeddlingKid opened this issue 1 year ago • 9 comments

Overview

This PR introduces a new optimization rule to handle joins where keys have a skewed distribution (some keys are much more frequent than others) using the salted join technique.

image

here are the steps for breaking up skew in the optimizer:

  • User provides the name of the skewed column that they want to join on and the values of keys that are skewed.
  • On the left side a new column is created called randPart, for the skewed rows randPart=random(N) where N is the number of buckets to partition skew. for non-skewed rows randPart=0
  • On the right side we create N copies of rows with skewed values and assign them a skewPart of 0,1,..N respectively
  • Add an additional condition to the join clause to join on left.randPart = right.skewPart
  • Drop randPart column after the join

see https://github.com/trinodb/trino/issues/1261 for further discussion

Assumptions

  • The rule only applies to inner joins and left joins, other join types will have errors because of the duplicate rows on the right side.
  • I'm only considering partitioned joins because there won't be much benefit for replicated joins as they are fast enough.
  • For now i've made the decision to not handle the case where the skew is on the right side. but i think in the future we can generalize the optimizer to handle skew in either side.
  • The optimizer traverses the plan to find all symbols that reference skewed symbols and propagates this information through nodes. it does not propagate skewed symbols through JoinNodes, so a join on a skewed join will not be rewritten even if it references a skewed symbol. Nested joins are currently out of scope because it is possible for joins to alter the distribution of skewness. However nesting joins against a skewed fact table is a popular pattern in queries. We should tackle this in future extensions to this work.

Details of the changes

SystemSessionProperties.java

Introduces two new session variables

  • SKEWED_JOIN_METADATA: This is an array of arrays of the form Array[Array[tablename, skewedcolumn, skewedvalue, skewedvalue]]
  • SKEWED_JOIN_REPLICATION_FACTOR: The number of buckets to partition skewed rows by, defaults to 3.

SkewJoinOptimizer.java

  • SkewJoinConfig: Parses the session variables for and extracts the table, column and skewed values.
  • PlanRewriter: Traverses the plan and finds all skewed symbols that might reference the user specified skewed column. When it finds a join that uses a skewed symbol it rewrites it.
  • JoinRewriter: Rewrites skewed joins with a new join that has its left side projected with a random symbol and the right side unnested to produce duplicated rows, also modifies the join clause to include the random symbols.

Future work

  • Use a hint framework to specify the configuration rather than session variables, that way we don't need to traverse the plan to find join nodes and symbols.
  • Handle skew on the right side as well.
  • Handle nested joins.
  • We can extend a similar pattern to handle range joins, im going to be exploring this angle.

Related issues, pull requests, and links

related to: https://github.com/trinodb/trino/issues/1261

Documentation

( ) No documentation is needed. ( ) Sufficient documentation is included in this PR. ( ) Documentation PR is available with #prnumber. ( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required. ( ) Release notes entries required with the following suggested text:

# Section
* Introduced a skew join optimizer. ({issue} https://github.com/trinodb/trino/issues/1261)

cc: @amoghmargoor

damnMeddlingKid avatar Jul 29 '22 00:07 damnMeddlingKid

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

cla-bot[bot] avatar Jul 29 '22 00:07 cla-bot[bot]

Submitted the signed CLA to [email protected]

damnMeddlingKid avatar Jul 29 '22 00:07 damnMeddlingKid

FYI @martint @sopel39

hashhar avatar Jul 29 '22 05:07 hashhar

cc @raunaqmorarka @skrzypo987

findepi avatar Jul 29 '22 09:07 findepi

Skewed columns need to be derived from stats, so there should be no requirement for SKEWED_JOIN_METADATA

sopel39 avatar Jul 29 '22 13:07 sopel39

Thanks @damnMeddlingKid! I've added some initial comments

sopel39 avatar Jul 29 '22 20:07 sopel39

Skewed columns need to be derived from stats, so there should be no requirement for SKEWED_JOIN_METADATA

From the original issue. @martint suggested getting the feature in with a session property so users can start using the functionality manually and over time we can build on top of it to improve the user experience by using stats. https://github.com/trinodb/trino/issues/1261#issuecomment-524572774

I think its a good approach, we can build incrementally from here.

damnMeddlingKid avatar Aug 03 '22 16:08 damnMeddlingKid

@damnMeddlingKid A session property is good to have but unless the implementation has issues which make it not ideal to be used generally I suggest we enable it by default and have the session property as a kill switch if needed. People very rarely enable features explicitly (and this also matches what we have done for other improvements in the past).

hashhar avatar Aug 03 '22 19:08 hashhar

@hashhar, it can't be enabled by default because it requires information that is not available otherwise (e.g., which columns are skewed, what values, etc). Until such information can be provided by table metadata, a session property is a reasonable approach.

martint avatar Aug 04 '22 07:08 martint

@damnMeddlingKid Can you use broadcast join for your skewed queries or build side is too big?

sopel39 avatar Aug 11 '22 08:08 sopel39

@damnMeddlingKid Can you use broadcast join for your skewed queries or build side is too big?

We are still working on collecting data from our workload to isolate skewed joins, We won't be able to use replicated joins in all cases. This rule doesn't apply if the join is replicated.

damnMeddlingKid avatar Aug 11 '22 14:08 damnMeddlingKid

Wanted to bump this, would love to get people's thoughts on the change.

damnMeddlingKid avatar Aug 29 '22 14:08 damnMeddlingKid

cc: @sopel39 @raunaqmorarka

damnMeddlingKid avatar Aug 31 '22 14:08 damnMeddlingKid

Hey folks, was wondering how to proceed here.

damnMeddlingKid avatar Sep 22 '22 21:09 damnMeddlingKid

I've removed the array constructor and fixed all the scope issues. Will look into using the record and fixing getMetadata this week.

damnMeddlingKid avatar Nov 07 '22 02:11 damnMeddlingKid

We're gradually trying to get rid of whole-plan optimizers based on PlanOptimizer in favor of incremental optimizers based on Rule. Unfortunately, this specific optimization requires propagating "traits" up the the plan, but we don't yet have a mechanism to do that in the Rule-based optimizer framework.

I think thats a good idea. We have a PR that introduces a hint framework, if we get that in then users can configure this optimization directly on the join/select that is skewed. That will make it possible for the optimizer to become a rule. Should we hold off on this till then ?.

The skewed values in the session property are encoded are "stringly-typed". This could be a source of subtle errors and would require specific knowledge about how such strings get converted to the required type at runtime.

True, i've been relying on infering the type from the source symbols and then casting. this should work for the common key types, i could maybe also allow the user to pass a sql type. what do you think ?.

damnMeddlingKid avatar Nov 07 '22 02:11 damnMeddlingKid

I've addressed all remaining comments and have two open comments. @martint could you take a look.

I'll also try and run some tests on data this week and pull some numbers.

damnMeddlingKid avatar Nov 14 '22 00:11 damnMeddlingKid

This will help in making this into iterative optimiser rule - https://github.com/trinodb/trino/issues/9498 and also remove session property usage for specifying skewed values. However even in that case there are chances of traversing same plan subtrees multiple times for different invocation of rules - provided hints are provided for more than one join in query.

amoghmargoor avatar Nov 14 '22 10:11 amoghmargoor

:wave: @damnMeddlingKid @martint - this PR has become inactive. We hope you are still interested in working on it. Please let us know, and we can try to get reviewers to help with that.

We're working on closing out old and inactive PRs, so if you're too busy or this has too many merge conflicts to be worth picking back up, we'll be making another pass to close it out in a few weeks.

mosabua avatar Jan 12 '24 22:01 mosabua