
Unify join and joinEach behavior

Open norberttech opened this issue 1 year ago • 0 comments

Currently join and joinEach behave a bit differently.

join uses the HashJoin algorithm under the hood, while joinEach is based on a nested loop algorithm.

The problem is that the Nested Loop implementation forces the use of join_prefix: if we try to join two dataframes on an id column that is called id on both sides, we get a DuplicatedEntriesException coming from the Rows::merge() method.
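
To make the collision concrete, here is a minimal sketch in Python rather than PHP. It is not Flow's code: `merge_rows`, `nested_loop_join`, `DuplicatedEntriesError` and the sample data are hypothetical stand-ins, and the only assumption taken from the issue is that merging two rows with the same entry name fails unless the right side is prefixed.

```python
class DuplicatedEntriesError(Exception):
    """Hypothetical stand-in for the DuplicatedEntriesException named above."""


def merge_rows(left: dict, right: dict) -> dict:
    # Assumed analogue of Rows::merge(): refuse rows that share an entry name.
    duplicates = left.keys() & right.keys()
    if duplicates:
        raise DuplicatedEntriesError(f"duplicated entries: {sorted(duplicates)}")
    return {**left, **right}


def nested_loop_join(left_rows, right_rows, left_key, right_key, prefix=""):
    # Compare every left row with every right row (inner join semantics).
    joined = []
    for left in left_rows:
        for right in right_rows:
            if left[left_key] == right[right_key]:
                # The prefix renames every right-hand entry, including the key.
                renamed = {prefix + name: value for name, value in right.items()}
                joined.append(merge_rows(left, renamed))
    return joined


users = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
orders = [{"id": 1, "total": 30}, {"id": 2, "total": 12}]

# Without a prefix, both sides carry "id", so the merge fails.
try:
    nested_loop_join(users, orders, "id", "id")
    collided = False
except DuplicatedEntriesError:
    collided = True

# With a prefix the join works, but the key is duplicated as "joined_id".
prefixed = nested_loop_join(users, orders, "id", "id", prefix="joined_")
```

In this sketch the prefix is the only thing that keeps the merge from failing, which mirrors why join_prefix is effectively mandatory for joinEach today.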

What we should do is remove the join columns from the right dataset to avoid duplicates.
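
A rough sketch of that idea, again in Python and not Flow's actual implementation: `join_dropping_right_keys` and the sample rows are hypothetical, and the sketch assumes an inner join on a single key column.

```python
def join_dropping_right_keys(left_rows, right_rows, left_key, right_key):
    # Nested loop join that drops the right-hand join column before merging,
    # so matching on "id" = "id" no longer produces a duplicated entry.
    joined = []
    for left in left_rows:
        for right in right_rows:
            if left[left_key] == right[right_key]:
                right_without_key = {
                    name: value for name, value in right.items() if name != right_key
                }
                joined.append({**left, **right_without_key})
    return joined


users = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
orders = [{"id": 1, "total": 30}, {"id": 3, "total": 12}]

result = join_dropping_right_keys(users, orders, "id", "id")
```

Dropping the key loses no information in an equality join, since both sides hold the same value. Non-key columns that share a name on both sides would still collide, so a prefix may remain useful for those.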

norberttech avatar Sep 07 '24 10:09 norberttech