datafusion icon indicating copy to clipboard operation
datafusion copied to clipboard

Fix: Sort Merge Join LeftSemi issues when JoinFilter is set

Open comphead opened this issue 9 months ago • 2 comments

Which issue does this PR close?

Closes #10379 .

Rationale for this change

Fixing some existing SMJ LeftSemi bugs when join filter is set. Currently the join either crashes or giving wrong results

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

comphead avatar Apr 29 '24 23:04 comphead

I was able to fix initial query but now stuck on

Error: ArrowError(InvalidArgumentError("all columns in a record batch must have the same length"), None)

Likely related to nulls

comphead avatar Apr 29 '24 23:04 comphead

fuzztests failing...

comphead avatar May 07 '24 14:05 comphead

@viirya @alamb can I get a review on this PR please?

comphead avatar May 13 '24 14:05 comphead

I will review this today

alamb avatar May 13 '24 15:05 alamb

I'll take another look today.

viirya avatar May 13 '24 15:05 viirya

Thanks @alamb I'll add more docs and also added a task to check RightSemi join to https://github.com/apache/datafusion/issues/9846

I'll let it be opened a little longer to give @viirya more time to have a second eye on the PR

comphead avatar May 13 '24 20:05 comphead

@viirya I'm planning to merge this PR soon as it fixes the crash, and addresses your concern (please see the slt test covering this specific case). All other improvements can be in follow up PR.

comphead avatar May 20 '24 02:05 comphead

I've seen some issues in this patch. It doesn't look like a correct fix.

The tests currently in sync with what hash join returns, is there a test showing the opposite?

comphead avatar May 20 '24 15:05 comphead

I've seen some issues in this patch. It doesn't look like a correct fix.

Took another look. Looks okay to me.

viirya avatar May 20 '24 15:05 viirya

🚀

alamb avatar May 21 '24 00:05 alamb