Unify join and joinEach behavior
Currently join and joinEach behave slightly differently.
join uses a hash join algorithm under the hood, whereas joinEach is based on a nested loop algorithm.
The problem is that the nested loop implementation forces the use of join_prefix: if we join two dataframes on an id column that is named id on both sides, we get a DuplicatedEntriesException from the Rows::merge() method.
What we should do is remove the join columns from the right dataset before merging, to avoid duplicates.
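
To make the proposal concrete, here is a minimal sketch of the idea in Python. It is not the Flow implementation: rows are modelled as plain dicts, and `merge`, `nested_loop_join`, `DuplicatedEntriesError` and the `drop_right_keys` flag are hypothetical names that only mimic the behavior described above.

```python
class DuplicatedEntriesError(Exception):
    """Hypothetical stand-in for the DuplicatedEntriesException mentioned above."""


def merge(left: dict, right: dict) -> dict:
    """Merge two rows, refusing to overwrite an entry that already exists."""
    duplicates = left.keys() & right.keys()
    if duplicates:
        raise DuplicatedEntriesError(f"duplicated entries: {sorted(duplicates)}")
    return {**left, **right}


def nested_loop_join(left_rows, right_rows, on, drop_right_keys=True):
    """Inner join of two lists of dict rows using nested loops."""
    joined = []
    for left in left_rows:
        for right in right_rows:
            if all(left[col] == right[col] for col in on):
                if drop_right_keys:
                    # The join columns hold equal values on both sides,
                    # so the right-hand copies can be dropped before merging.
                    right = {k: v for k, v in right.items() if k not in on}
                joined.append(merge(left, right))
    return joined


users = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
orders = [{"id": 1, "total": 10}, {"id": 1, "total": 25}]

# Both sides call the join column "id", and no prefix is needed.
result = nested_loop_join(users, orders, on=["id"])
```

With `drop_right_keys=False` the same call raises the duplicate-entry error, which is the situation described above. Note that in this sketch, non-join columns sharing a name on both sides would still collide, so a prefix would still be needed for those.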