More projection pushdown for HashJoinExec
Is your feature request related to a problem or challenge?
While trying to implement #6768, I found that the current projection pushdown for HashJoinExec only applies when a strict condition is satisfied.
https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/src/physical_optimizer/projection_pushdown.rs#L1144-L1157
This function requires that, in the schema of the ProjectionExec, the columns from the left table come before the columns from the right table.
So it works fine for SQL like SELECT t1.c as c_from_left, t1.b as b_from_left, t1.a as a_from_left, t2.a as a_from_right, t2.c as c_from_right FROM t1 JOIN t2 ON t1.b = t2.c WHERE t1.b - (1 + t2.a) <= t2.a + t1.c
but it won't optimize SQL like SELECT t1.c as c_from_left, t1.b as b_from_left, t2.a as a_from_right, t1.a as a_from_left, t2.c as c_from_right FROM t1 JOIN t2 ON t1.b = t2.c WHERE t1.b - (1 + t2.a) <= t2.a + t1.c (the same query with only the selected columns reordered).
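To make the ordering condition concrete, here is a minimal sketch of the kind of check described above. It is not DataFusion's actual code: the function name `left_columns_precede_right` and the index-based representation are assumptions made for illustration, and the linked function may check further conditions that are not modeled here.

```rust
/// Hypothetical helper, not part of DataFusion.
/// `projected` holds, in output order, the index of each projected column in
/// the join schema. Indices below `left_column_count` are assumed to belong
/// to the left table; all other indices belong to the right table.
fn left_columns_precede_right(left_column_count: usize, projected: &[usize]) -> bool {
    // Position of the first projected column that comes from the right table.
    let first_right = projected
        .iter()
        .position(|&index| index >= left_column_count)
        .unwrap_or(projected.len());
    // Everything before `first_right` is a left column by construction, so
    // only the remainder needs checking: no left column may appear in it.
    let (_left_part, right_part) = projected.split_at(first_right);
    right_part.iter().all(|&index| index >= left_column_count)
}
```

Assuming, purely for illustration, that t1 and t2 each have the columns a, b, c, the join schema is [t1.a, t1.b, t1.c, t2.a, t2.b, t2.c]. The first query then projects the indices [2, 1, 0, 3, 5], for which `left_columns_precede_right(3, &[2, 1, 0, 3, 5])` returns true. The reordered query projects [2, 1, 3, 0, 5], for which it returns false because t1.a follows t2.a.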
I added a unit test here: https://github.com/apache/arrow-datafusion/compare/main...my-vegetable-has-exploded:hashjoin-pushdown-test?expand=1 (the projection is not pushed down in this case).
However, the logical optimizer seems to handle this case correctly, so I don't know whether it matters.
Describe the solution you'd like
No response
Describe alternatives you've considered
No response
Additional context
No response
@berkaysynnada Please take a look when you are free. Thanks a lot.
This issue will also be resolved when I launch the new projection optimizer rule. You can see the details here: https://github.com/apache/arrow-datafusion/issues/9111. If you have any questions or suggestions, please feel free to ask.