datafusion-comet icon indicating copy to clipboard operation
datafusion-comet copied to clipboard

Support BroadcastNestedLoopJoinExec

Open singhpk234 opened this issue 1 year ago • 7 comments

What is the problem the feature request solves?

Datafusion supports Cross and NestedLoop joins as well : https://docs.rs/datafusion-physical-plan/36.0.0/datafusion_physical_plan/joins/index.html

It will really nice if we can add support for it like Hash and SortMergeJoin.

Describe the potential solution

No response

Additional context

SHJ pr

  • https://github.com/apache/arrow-datafusion-comet/pull/194 SMJ pr
  • https://github.com/apache/arrow-datafusion-comet/pull/178

singhpk234 avatar Mar 12 '24 18:03 singhpk234

@viirya can i pick this up ?

singhpk234 avatar Mar 12 '24 18:03 singhpk234

Sure. I've not begun working on this yet. Which one you will work, NestedLoop or Cross Join? In Spark, they are two difference join operators, I think we should have different tickets for them instead of one.

viirya avatar Mar 12 '24 19:03 viirya

I can start with BNLJ :) !

singhpk234 avatar Mar 12 '24 19:03 singhpk234

still working on it, will publish a pr by early next week, apologies for the delay.

singhpk234 avatar Mar 28 '24 14:03 singhpk234

No problem. Thank you for working on this.

viirya avatar Mar 28 '24 16:03 viirya

Note that I found several bugs in current broadcast implementation when trying to enable broadcast by default in #213. Since BroadcastNestedLoopJoinExec uses broadcast, I suggest that you can work on the operator after #213 is merged.

viirya avatar Apr 05 '24 04:04 viirya

I suggest that you can work on the operator after https://github.com/apache/arrow-datafusion-comet/pull/213 is merged

Ack, let me rebase with this branch meanwhile

singhpk234 avatar Apr 05 '24 14:04 singhpk234