datafusion-comet

Add support for SMJ with RightSemi join

Open andygrove opened this issue 7 months ago • 1 comments

What is the problem the feature request solves?

DataFusion just added support for sort-merge join (SMJ) with the RightSemi join type in https://github.com/apache/datafusion/pull/15972, so we should be able to support this in Comet now.

Describe the potential solution

No response

Additional context

No response

andygrove avatar May 09 '25 19:05 andygrove

I would like to take up this task.

dharanad avatar May 15 '25 19:05 dharanad

@andygrove Unlike DataFusion, Spark does not natively support a RightSemi join type. This presents a challenge, and I was hoping to get your thoughts on the best way to handle it.

I was thinking of writing a rule that identifies patterns where a RightSemi join optimization can be applied.
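For reference, the equivalence such a rule would rely on can be sketched in plain Python. This is only an illustration of join semantics, not Comet or Spark code: the helper names (`left_semi`, `right_semi`, `right_semi_via_swap`) and the sample rows are hypothetical. A RightSemi join of (left, right) returns the same rows as a LeftSemi join with the inputs swapped.

```python
# Illustrative only: semi-join semantics over lists of tuples.
# All function names and data here are hypothetical, not Comet/Spark APIs.

def left_semi(left, right, key):
    """Return rows of `left` that have at least one key match in `right`."""
    right_keys = {key(r) for r in right}
    return [row for row in left if key(row) in right_keys]

def right_semi(left, right, key):
    """Return rows of `right` that have at least one key match in `left`."""
    left_keys = {key(l) for l in left}
    return [row for row in right if key(row) in left_keys]

def right_semi_via_swap(left, right, key):
    """Express RightSemi(left, right) as LeftSemi with the inputs swapped."""
    return left_semi(right, left, key)

# Sample data: the join key is the first tuple element.
left_rows = [(1, "x"), (3, "y"), (3, "z")]
right_rows = [(1, "a"), (2, "b"), (3, "c")]
join_key = lambda row: row[0]

direct = right_semi(left_rows, right_rows, join_key)
swapped = right_semi_via_swap(left_rows, right_rows, join_key)
```

Both `direct` and `swapped` contain only rows from the right side, and each matching row appears once even when several left rows share its key. Whether a planner rule can exploit this in practice depends on details not covered here, such as which side is cheaper to stream or buffer.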

dharanad avatar Jun 12 '25 19:06 dharanad

Thanks for looking at this @dharanad. You are correct, Spark does not support this. We should probably remove RightSemi from the operator.proto file to avoid confusion:

enum JoinType {
  Inner = 0;
  LeftOuter = 1;
  RightOuter = 2;
  FullOuter = 3;
  LeftSemi = 4;
  RightSemi = 5;
  LeftAnti = 6;
  RightAnti = 7;
}

andygrove avatar Jun 16 '25 14:06 andygrove

@andygrove Sure, I will raise a PR to address this.

dharanad avatar Jun 26 '25 14:06 dharanad