
More projection pushdown for HashJoinExec

Open my-vegetable-has-exploded opened this issue 1 year ago • 2 comments

Is your feature request related to a problem or challenge?

While trying to implement #6768, I found that the current projection pushdown for HashJoinExec only applies under a strict condition.

https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/src/physical_optimizer/projection_pushdown.rs#L1144-L1157

This function requires that, in the schema of the ProjectionExec, every column from the left table comes before every column from the right table.

Screenshot from 2024-02-14 00-22-42

So it works fine for SQL like

SELECT t1.c as c_from_left, t1.b as b_from_left, t1.a as a_from_left, t2.a as a_from_right, t2.c as c_from_right FROM t1 JOIN t2 ON t1.b = t2.c WHERE t1.b - (1 + t2.a) <= t2.a + t1.c

but won't optimize SQL like

SELECT t1.c as c_from_left, t1.b as b_from_left, t2.a as a_from_right, t1.a as a_from_left, t2.c as c_from_right FROM t1 JOIN t2 ON t1.b = t2.c WHERE t1.b - (1 + t2.a) <= t2.a + t1.c

(the only difference is the order of the columns: t2.a now appears before t1.a).
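To make the ordering condition concrete, here is a minimal sketch in plain Rust. It is not DataFusion's actual code: the helper name `left_columns_precede_right` is hypothetical, and I am assuming for illustration that t1 and t2 each have three columns (a, b, c), so the join output schema is [t1.a, t1.b, t1.c, t2.a, t2.b, t2.c] with indices 0 to 5. Each projection is represented only by the join-output index of each projected column.

```rust
/// Hypothetical helper (not part of DataFusion): returns true when every
/// projected column from the left side appears before every projected
/// column from the right side.
///
/// `projected` holds indices into the join's output schema, where indices
/// below `left_len` belong to the left table and the rest to the right table.
fn left_columns_precede_right(projected: &[usize], left_len: usize) -> bool {
    // Position of the first right-side column in the projection, if any.
    let first_right = projected
        .iter()
        .position(|&idx| idx >= left_len)
        .unwrap_or(projected.len());
    // Once a right-side column has appeared, no left-side column may follow.
    projected[first_right..].iter().all(|&idx| idx >= left_len)
}

fn main() {
    // Assumed layout: t1 contributes indices 0..3, t2 contributes 3..6.
    let left_len = 3;

    // First query: t1.c, t1.b, t1.a, t2.a, t2.c
    let ordered = [2, 1, 0, 3, 5];
    // Second query: t1.c, t1.b, t2.a, t1.a, t2.c
    let interleaved = [2, 1, 3, 0, 5];

    assert!(left_columns_precede_right(&ordered, left_len));
    assert!(!left_columns_precede_right(&interleaved, left_len));
}
```

Under these assumptions the first projection satisfies the condition and the second does not, even though both select exactly the same set of columns.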

I added a unit test here: https://github.com/apache/arrow-datafusion/compare/main...my-vegetable-has-exploded:hashjoin-pushdown-test?expand=1 (in this test the projection is not pushed down).

However, the logical optimizer seems to handle this case correctly, so I don't know whether it matters.

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

@berkaysynnada Please take a look when you are free. Thanks a lot.

This issue will also be resolved when I land the new projection optimizer rule. You can see the details here: https://github.com/apache/arrow-datafusion/issues/9111. If you have any questions or suggestions, please feel free to ask.

berkaysynnada, Feb 13 '24 21:02