substrait
feat: add operators to support duplicate eliminated joins
This PR implements the DuplicateEliminatedGetRel and the DuplicateEliminatedJoinRel. Both relations are necessary to support duplicate eliminated joins, a join type needed for unnesting arbitrary subqueries.
They are introduced in depth in the unnesting arbitrary subqueries paper.
I also have a POC PR for the DuckDB substrait repo, which already round-trips queries correctly with the definitions proposed here.
The main question I have is whether it is more desirable to keep the DuplicateEliminatedJoinRel as a separate relation or to merge its attributes into joinrel.
For clarity/reference, in DuckDB the Duplicate Eliminated Join is literally the LogicalComparisonJoin with a LogicalOperatorType::LOGICAL_DELIM_JOIN.
The possible join types for the logical operator type are:
LOGICAL_JOIN = 50,
LOGICAL_DELIM_JOIN = 51,
LOGICAL_COMPARISON_JOIN = 52,
LOGICAL_ANY_JOIN = 53,
LOGICAL_CROSS_PRODUCT = 54,
LOGICAL_POSITIONAL_JOIN = 55,
LOGICAL_ASOF_JOIN = 56,
LOGICAL_DEPENDENT_JOIN = 57
The comparison join uses only the following types:
LOGICAL_DELIM_JOIN = 51,
LOGICAL_COMPARISON_JOIN = 52,
LOGICAL_ASOF_JOIN = 56,
LOGICAL_DEPENDENT_JOIN = 57
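To make the subset above concrete, here is a minimal C++ sketch. The enum is a trimmed copy of the values listed in this description, not DuckDB's full LogicalOperatorType, and IsComparisonJoinType is a hypothetical helper invented for illustration, not a DuckDB function.

```cpp
#include <cstdint>

// Trimmed, illustrative copy of the join-related values quoted above.
// This is NOT DuckDB's full LogicalOperatorType enum.
enum class LogicalOperatorType : uint8_t {
    LOGICAL_JOIN = 50,
    LOGICAL_DELIM_JOIN = 51,
    LOGICAL_COMPARISON_JOIN = 52,
    LOGICAL_ANY_JOIN = 53,
    LOGICAL_CROSS_PRODUCT = 54,
    LOGICAL_POSITIONAL_JOIN = 55,
    LOGICAL_ASOF_JOIN = 56,
    LOGICAL_DEPENDENT_JOIN = 57
};

// Hypothetical helper: true for the operator types that, per the list above,
// the comparison join uses.
constexpr bool IsComparisonJoinType(LogicalOperatorType type) {
    return type == LogicalOperatorType::LOGICAL_DELIM_JOIN ||
           type == LogicalOperatorType::LOGICAL_COMPARISON_JOIN ||
           type == LogicalOperatorType::LOGICAL_ASOF_JOIN ||
           type == LogicalOperatorType::LOGICAL_DEPENDENT_JOIN;
}
```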
Hence, a different possibility would be to add an enum to joinrel with only LOGICAL_DELIM_JOIN and LOGICAL_COMPARISON_JOIN for now.
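As a rough illustration of that alternative, the sketch below models a join relation carrying such an enum. Substrait's real definitions are protobuf messages, so this C++ struct only shows the shape of the idea. ComparisonJoinKind, JoinRelSketch, and IsDuplicateEliminated are all hypothetical names, and the actual join fields (inputs, condition, join type) are omitted.

```cpp
#include <cstdint>

// Hypothetical enum with just the two kinds proposed "for now".
enum class ComparisonJoinKind : uint8_t {
    COMPARISON_JOIN, // regular comparison join
    DELIM_JOIN       // duplicate eliminated join
};

// Hypothetical, heavily simplified stand-in for a join relation.
// Only the discriminating enum is modeled here.
struct JoinRelSketch {
    ComparisonJoinKind kind;
};

// A consumer would branch on the enum instead of on a separate relation type.
constexpr bool IsDuplicateEliminated(const JoinRelSketch &rel) {
    return rel.kind == ComparisonJoinKind::DELIM_JOIN;
}
```

With this shape, supporting further kinds later (for example an ASOF join) would mean adding an enum value rather than a new relation, at the cost of join-kind-specific attributes living on the shared relation.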