
Joins

Open utdemir opened this issue 6 years ago • 2 comments

Implement a join function which joins two datasets based on a key/key function. Different join types (left/right/full outer, cartesian product) should be supported.
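To make the requested join types concrete, here is a minimal in-memory sketch over plain lists. It is not distributed-dataset's API: every name below (`groupByKey`, `innerJoin`, `leftOuterJoin`, `rightOuterJoin`, `fullOuterJoin`, `cartesian`) is hypothetical, and a real implementation would have to shuffle partitions by key rather than build a single `Map`.

```haskell
import qualified Data.Map.Strict as M

-- All names here are illustrative assumptions, not part of distributed-dataset.

-- | Group values by key, keeping input order within each group.
groupByKey :: Ord k => (a -> k) -> [a] -> M.Map k [a]
groupByKey f xs = M.fromListWith (flip (++)) [(f x, [x]) | x <- xs]

-- | Pairs of rows whose keys match on both sides.
innerJoin :: Ord k => (a -> k) -> (b -> k) -> [a] -> [b] -> [(a, b)]
innerJoin ka kb xs ys =
  [(x, y) | x <- xs, y <- M.findWithDefault [] (ka x) right]
  where
    right = groupByKey kb ys

-- | Every left row, paired with Nothing when the right side has no match.
leftOuterJoin :: Ord k => (a -> k) -> (b -> k) -> [a] -> [b] -> [(a, Maybe b)]
leftOuterJoin ka kb xs ys = concatMap match xs
  where
    right = groupByKey kb ys
    match x = case M.lookup (ka x) right of
      Nothing -> [(x, Nothing)]
      Just ms -> [(x, Just y) | y <- ms]

-- | Mirror image of 'leftOuterJoin'; results follow the order of the right input.
rightOuterJoin :: Ord k => (a -> k) -> (b -> k) -> [a] -> [b] -> [(Maybe a, b)]
rightOuterJoin ka kb xs ys = [(mx, y) | (y, mx) <- leftOuterJoin kb ka ys xs]

-- | Left outer join plus the right rows that matched nothing.
fullOuterJoin :: Ord k => (a -> k) -> (b -> k) -> [a] -> [b] -> [(Maybe a, Maybe b)]
fullOuterJoin ka kb xs ys =
  [(Just x, my) | (x, my) <- leftOuterJoin ka kb xs ys]
    ++ [(Nothing, Just y) | y <- ys, M.notMember (kb y) left]
  where
    left = groupByKey ka xs

-- | Cartesian product: no key involved.
cartesian :: [a] -> [b] -> [(a, b)]
cartesian xs ys = [(x, y) | x <- xs, y <- ys]
```

Taking key functions rather than requiring pre-keyed pairs covers the "key/key function" wording above; a caller with `(k, v)` tuples can simply pass `fst`.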

utdemir avatar Apr 14 '19 03:04 utdemir

The generalised joins in the discrimination package might be useful for this. Probably not directly, but it should be possible to leverage the implementation there and take advantage of its fast Grouping work.

axman6 avatar Jun 06 '19 12:06 axman6

@axman6 Thank you for the suggestion! I haven't used discrimination before, but it does look like it could be useful to distributed-dataset for several future features (joins, shuffles, sorts, ...).

I will look at it thoroughly when I get to implementing those, or you are always welcome to give it a go :)

utdemir avatar Jun 25 '19 10:06 utdemir