risingwave
risingwave copied to clipboard
Optimizer: push down predicate inferred by hash join equal condition
For queries like select * from a, b on a.x = b.x and a.y = b.x, we can push down a filter before side a with condition a.x=a.y and only keep one of the two join condition in hash join executor.
Such optimization can apply to almost all kinds of hash join with the exception of anti-join.
For anti-join we can not push down a filter or keep let join key. Instead, we need to evaluate the predicate in hash join executor. If a.x=a.y is false, we know the join condition a.x = b.x and a.y = b.x is false.
This can also solve the #7698
cc. @chenzl25 @st1page
It will little more complex when we consider the NULL-safe equal condition :thinking: We can handle them respectively but I am not sure if we can handle this condition left.a is not distinct from right.b AND left.c = right.b. can we push down the condition like left.a = left.c OR left.a is null OR left.c is null?
Maybe we need to re-confirm the query requirement from the user side and if we need to handle the NULL-safe equal condition here. IIRC, it is to handle some ring detection. c.c. @lmatz
Yeah, in the original use case, a and b are different alias of the same table that is used to represent a graph, where x and y are src and dst of an edge. So I guess we can assume that x and y are both non-null in this particular use case.
Using an optimization rule to solve a panic sounds a little bit weird to me. It would be a required step to transform plans into a preferred form, rather than an optional "optimization".
It will little more complex when we consider the NULL-safe equal condition 🤔 We can handle them respectively but I am not sure if we can handle this condition
left.a is not distinct from right.b AND left.c = right.b. can we push down the condition likeleft.a = left.c OR left.a is null OR left.c is null? Maybe we need to re-confirm the query requirement from the user side and if we need to handle the NULL-safe equal condition here. IIRC, it is to handle some ring detection. c.c. @lmatz
Once null safe equal appears, we can just push down a condition like left.a is not distinct from left.c, because it it strict enough to filter out most records, except the null one. Luckily, the upper condition left.a is not distinct from right.b AND left.c = right.b can do this favor for us.
Using an optimization rule to solve a panic sounds a little bit weird to me. It would be a required step to transform plans into a preferred form, rather than an optional "optimization".
Good point. Although this is a potential way to address the problem, we decided to fix the bug by #8377 and treat this one as an optimization.
We do something similar here: https://github.com/risingwavelabs/risingwave/blob/495763326e04d2e622b3ab844db5da72c5fc5a01/src/frontend/src/optimizer/plan_node/logical_join.rs#L837
But this is implemented only for one-sided expression.
Needs futher investigation
By applying this to eq conditions, this could turn the following condition:
select * from a, b on a.x = b.x and a.y = b.x
into
select * from a, b on a.x = a.y and a.y = a.x and a.x = b.x, and a.y = b.x
NULL safety reasoning could be extended from here: https://github.com/risingwavelabs/risingwave/pull/6895. In particular: we "can only handle pushing down to an inner side.". Not sure if it is accurate when applied reflexively to eq conditions.
Hey, any updates?
Hey, any updates?
Because we solve the https://github.com/risingwavelabs/risingwave/issues/7698, it becomes less emergent and a optimization.