
feat: add operators to support duplicate eliminated joins

Open pdet opened this issue 6 months ago • 20 comments

This PR implements DuplicateEliminatedGetRel and DuplicateEliminatedJoinRel. Both relations are needed to support duplicate eliminated joins, a join type required for unnesting arbitrary subqueries. They are described in depth in the unnesting arbitrary subqueries paper.
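For readers unfamiliar with the idea, here is a rough toy sketch in Python of what a duplicate eliminated join buys you. It is not the definition proposed in this PR and not DuckDB's implementation; the data and the names `delim_get`, `best_grade_per_course` and `delim_join` are made up for illustration. The correlated column is deduplicated once, the subquery side is evaluated per distinct value instead of per outer row, and the result is joined back to the outer rows.

```python
# Hypothetical toy data: (student, course, grade).
exams = [
    ("ann", "db", 90),
    ("bob", "db", 70),
    ("cat", "os", 80),
    ("dan", "db", 90),
]


def delim_get(rows, key_index):
    """Return the distinct values of the correlated column, in first-seen order."""
    return list(dict.fromkeys(row[key_index] for row in rows))


def best_grade_per_course(courses, rows):
    """Subquery side: evaluated once per distinct course, not once per outer row."""
    return {
        course: max(grade for _, row_course, grade in rows if row_course == course)
        for course in courses
    }


def delim_join(rows):
    """Keep the rows whose grade equals the best grade of their course."""
    courses = delim_get(rows, 1)                  # duplicate-eliminated correlated column
    best = best_grade_per_course(courses, rows)   # subquery result per distinct value
    return [row for row in rows if row[2] == best[row[1]]]  # join back to the outer rows


result = delim_join(exams)
```

In this sketch the subquery runs twice (once for "db", once for "os") even though there are four outer rows, which is the saving the deduplication step is meant to provide.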

I also have a POC PR for the DuckDB substrait repo, which already appropriately round-trips queries with the definitions proposed here.

The main question I have is whether it is more desirable to keep DuplicateEliminatedJoinRel as a separate relation or to merge its attributes into JoinRel.

For clarity and reference: in DuckDB, the Duplicate Eliminated Join is literally a LogicalComparisonJoin whose operator type is LogicalOperatorType::LOGICAL_DELIM_JOIN.

The join-related values of LogicalOperatorType are:

LOGICAL_JOIN = 50,
LOGICAL_DELIM_JOIN = 51,
LOGICAL_COMPARISON_JOIN = 52,
LOGICAL_ANY_JOIN = 53,
LOGICAL_CROSS_PRODUCT = 54,
LOGICAL_POSITIONAL_JOIN = 55,
LOGICAL_ASOF_JOIN = 56,
LOGICAL_DEPENDENT_JOIN = 57

LogicalComparisonJoin only uses the following types:

LOGICAL_DELIM_JOIN = 51,
LOGICAL_COMPARISON_JOIN = 52,
LOGICAL_ASOF_JOIN = 56,
LOGICAL_DEPENDENT_JOIN = 57

Hence, an alternative would be to add an enum to JoinRel that, for now, contains only LOGICAL_DELIM_JOIN and LOGICAL_COMPARISON_JOIN.
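To make the enum alternative concrete, here is a minimal sketch in Python rather than protobuf or C++. The numeric values are the ones listed above; everything else (`JoinRelKind`, `JoinRelSketch`, `to_join_rel_kind`) is a hypothetical name used only for illustration and is not part of Substrait or DuckDB.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum


class LogicalOperatorType(IntEnum):
    """Join-related operator types, with the values listed above."""
    LOGICAL_JOIN = 50
    LOGICAL_DELIM_JOIN = 51
    LOGICAL_COMPARISON_JOIN = 52
    LOGICAL_ANY_JOIN = 53
    LOGICAL_CROSS_PRODUCT = 54
    LOGICAL_POSITIONAL_JOIN = 55
    LOGICAL_ASOF_JOIN = 56
    LOGICAL_DEPENDENT_JOIN = 57


class JoinRelKind(Enum):
    """Hypothetical enum on the join relation: only the two kinds proposed for now."""
    LOGICAL_DELIM_JOIN = "delim"
    LOGICAL_COMPARISON_JOIN = "comparison"


@dataclass
class JoinRelSketch:
    """Hypothetical stand-in for a join relation carrying the extra enum field."""
    left: str
    right: str
    kind: JoinRelKind = JoinRelKind.LOGICAL_COMPARISON_JOIN


def to_join_rel_kind(op_type: LogicalOperatorType) -> JoinRelKind:
    """Map an operator type to the proposed enum, rejecting types it does not cover."""
    if op_type is LogicalOperatorType.LOGICAL_DELIM_JOIN:
        return JoinRelKind.LOGICAL_DELIM_JOIN
    if op_type is LogicalOperatorType.LOGICAL_COMPARISON_JOIN:
        return JoinRelKind.LOGICAL_COMPARISON_JOIN
    raise ValueError(f"{op_type.name} is not covered by the proposed enum")


delim = JoinRelSketch(
    left="outer",
    right="subquery",
    kind=to_join_rel_kind(LogicalOperatorType.LOGICAL_DELIM_JOIN),
)
```

With this shape, a plain comparison join keeps the default kind, and the ASOF and dependent types would each need a new enum member before they could be represented.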

pdet, Aug 28 '24 09:08