datafusion
Fix: Sort Merge Join LeftSemi issues when JoinFilter is set
Which issue does this PR close?
Closes #10379.
Rationale for this change
This fixes some existing Sort Merge Join (SMJ) LeftSemi bugs that occur when a join filter is set. Currently the join either crashes or gives wrong results.
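For reference, the expected semantics can be sketched in plain Rust. This is a minimal illustration under stated assumptions, not DataFusion's `SortMergeJoinExec`: `Row` and `smj_left_semi` are hypothetical names, both inputs are assumed already sorted by key (with `None`, standing in for SQL NULL, sorted first), and the `filter` closure stands in for a `JoinFilter` evaluated on each candidate (left, right) pair. The key property is that a left row is emitted at most once, and only if at least one equal-key right row passes the filter.

```rust
/// Hypothetical row type: an optional join key plus one payload column.
type Row = (Option<i32>, i32);

/// Sketch of a left semi sort-merge join with a join filter.
/// Assumes both inputs are sorted by key with `None` (NULL) first,
/// which is what `Vec::sort` does for `Option<i32>` keys.
fn smj_left_semi<F>(left: &[Row], right: &[Row], filter: F) -> Vec<Row>
where
    F: Fn(&Row, &Row) -> bool,
{
    let mut out = Vec::new();
    let mut r = 0; // cursor into `right`, only ever moves forward
    for l in left {
        // A NULL key never matches anything under SQL equality.
        let Some(lk) = l.0 else { continue };
        // Skip right rows whose key is NULL or smaller than the left key.
        while r < right.len() && right[r].0.map_or(true, |rk| rk < lk) {
            r += 1;
        }
        // Scan the equal-key group without moving `r`, so that duplicate
        // left keys can re-scan the same right group.
        let mut i = r;
        while i < right.len() && right[i].0 == Some(lk) {
            if filter(l, &right[i]) {
                out.push(*l); // emit the left row once, on the first passing match
                break;
            }
            i += 1;
        }
    }
    out
}

fn main() {
    let mut left = vec![(Some(1), 10), (Some(1), 20), (None, 5), (Some(2), 30), (Some(3), 40)];
    let mut right = vec![(Some(1), 15), (Some(1), 25), (Some(2), 1), (None, 99)];
    left.sort();
    right.sort();
    // Keep a left row only if some equal-key right row has a larger payload.
    let result = smj_left_semi(&left, &right, |l, r| r.1 > l.1);
    println!("{:?}", result);
}
```

Here `(Some(1), 10)` matches two right rows that both pass the filter but still appears only once, `(Some(2), 30)` has an equal key on the right but is dropped because the filter rejects it, and the NULL-keyed rows never match.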
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?
I was able to fix the initial query, but I'm now stuck on:
Error: ArrowError(InvalidArgumentError("all columns in a record batch must have the same length"), None)
Likely related to nulls
The fuzz tests are failing...
@viirya @alamb can I get a review on this PR please?
I will review this today
I'll take another look today.
Thanks @alamb. I'll add more docs, and I've also added a task to check the RightSemi join to https://github.com/apache/datafusion/issues/9846
I'll leave it open a little longer to give @viirya more time to take a second look at the PR.
@viirya I'm planning to merge this PR soon, as it fixes the crash and addresses your concern (please see the slt test covering this specific case). All other improvements can go into a follow-up PR.
I've seen some issues in this patch. It doesn't look like a correct fix.
The tests are currently in sync with what the hash join returns. Is there a test showing otherwise?
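Checking one join implementation against another on the same inputs is the idea behind both the fuzz tests and the comparison with hash join results. Below is a minimal sketch of such a differential check in plain Rust, assuming a simplified model rather than DataFusion's actual fuzz harness: `Row`, `hash_left_semi`, `nested_loop_left_semi`, and `gen_rows` are hypothetical names, `None` stands in for a NULL key, and a hand-rolled LCG replaces a random-number crate so the block has no dependencies. In a real test, the sort-merge join would take the nested loop's place as the implementation under test.

```rust
use std::collections::HashMap;

/// Hypothetical row type: an optional join key plus one payload column.
type Row = (Option<i32>, i32);

/// Reference left semi join: hash the right side by key, then probe.
fn hash_left_semi<F: Fn(&Row, &Row) -> bool>(left: &[Row], right: &[Row], filter: F) -> Vec<Row> {
    let mut table: HashMap<i32, Vec<Row>> = HashMap::new();
    for r in right {
        if let Some(k) = r.0 {
            table.entry(k).or_default().push(*r); // NULL keys are never inserted
        }
    }
    let mut out = Vec::new();
    for l in left {
        let matched = match l.0.and_then(|k| table.get(&k)) {
            Some(rows) => rows.iter().any(|r| filter(l, r)),
            None => false, // NULL left key, or no equal key on the right
        };
        if matched {
            out.push(*l); // each left row is emitted at most once
        }
    }
    out
}

/// Naive implementation under test: compare every pair of rows.
fn nested_loop_left_semi<F: Fn(&Row, &Row) -> bool>(left: &[Row], right: &[Row], filter: F) -> Vec<Row> {
    let mut out = Vec::new();
    for l in left {
        // SQL equality: a NULL key on either side never matches.
        if right.iter().any(|r| l.0.is_some() && l.0 == r.0 && filter(l, r)) {
            out.push(*l);
        }
    }
    out
}

/// Deterministic pseudo-random rows (LCG), with roughly 1 in 5 NULL keys.
fn gen_rows(seed: &mut u64, n: usize) -> Vec<Row> {
    (0..n)
        .map(|_| {
            *seed = seed.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
            let x = (*seed >> 33) as i32; // always non-negative
            let key = if x % 5 == 0 { None } else { Some(x % 4) };
            (key, x % 100)
        })
        .collect()
}

fn main() {
    let mut seed = 42;
    for _ in 0..100 {
        let left = gen_rows(&mut seed, 20);
        let right = gen_rows(&mut seed, 20);
        let f = |l: &Row, r: &Row| r.1 > l.1;
        assert_eq!(hash_left_semi(&left, &right, f), nested_loop_left_semi(&left, &right, f));
    }
    println!("all cases agree");
}
```

Both functions preserve the left input's order, so plain `Vec` equality is enough here; comparing a sort-merge join against a hash join would usually require sorting both outputs first.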
Took another look. Looks okay to me.
🚀