distributed-dataset
Joins
Implement a join function that joins two datasets on a key or a key function. It should support the different join types: left, right, and full outer joins, as well as the cartesian product.
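To make the requested semantics concrete, here is a rough sketch of the join types on plain in-memory lists. It is not the distributed-dataset API: every name below (`groupByKey`, `innerJoin`, `leftOuterJoin`, `rightOuterJoin`, `fullOuterJoin`, `cartesian`) is hypothetical, and a real implementation would presumably need to shuffle rows by key across partitions first. It only assumes the `containers` package for `Data.Map.Strict`.

```haskell
import qualified Data.Map.Strict as M

-- Hypothetical helper: index rows by the key a key function extracts.
-- Rows sharing a key keep their original relative order.
groupByKey :: Ord k => (a -> k) -> [a] -> M.Map k [a]
groupByKey f xs = M.fromListWith (flip (++)) [(f x, [x]) | x <- xs]

-- Inner join: keep only pairs of rows whose keys match.
innerJoin :: Ord k => (a -> k) -> (b -> k) -> [a] -> [b] -> [(a, b)]
innerJoin ka kb as bs =
  [(a, b) | a <- as, b <- M.findWithDefault [] (ka a) right]
  where
    right = groupByKey kb bs

-- Left outer join: every left row appears at least once;
-- left rows without a match are paired with Nothing.
leftOuterJoin :: Ord k => (a -> k) -> (b -> k) -> [a] -> [b] -> [(a, Maybe b)]
leftOuterJoin ka kb as bs =
  [(a, mb) | a <- as, mb <- maybe [Nothing] (map Just) (M.lookup (ka a) right)]
  where
    right = groupByKey kb bs

-- Right outer join: the mirror image of the left outer join.
rightOuterJoin :: Ord k => (a -> k) -> (b -> k) -> [a] -> [b] -> [(Maybe a, b)]
rightOuterJoin ka kb as bs = [(ma, b) | (b, ma) <- leftOuterJoin kb ka bs as]

-- Full outer join: the left outer join plus the unmatched right rows.
fullOuterJoin :: Ord k => (a -> k) -> (b -> k) -> [a] -> [b] -> [(Maybe a, Maybe b)]
fullOuterJoin ka kb as bs =
  [(Just a, mb) | (a, mb) <- leftOuterJoin ka kb as bs]
    ++ [(Nothing, Just b) | b <- bs, M.notMember (kb b) left]
  where
    left = groupByKey ka as

-- Cartesian product: every left row paired with every right row, no key involved.
cartesian :: [a] -> [b] -> [(a, b)]
cartesian as bs = [(a, b) | a <- as, b <- bs]
```

For example, `innerJoin fst fst users orders` would pair each user with their orders when both lists hold `(id, value)` tuples. Taking two key functions rather than one lets the two sides have different row types.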
The generalised joins in discrimination might be useful for this - probably not directly, but it should be possible to leverage the implementation there and take advantage of the fast Grouping work.
@axman6 Thank you for the suggestion! I haven't used discrimination before, but it does look like it could be useful to distributed-dataset on multiple fronts (joins, shuffles, sorts, ...).
I will look at it thoroughly when I get to implementing those, or you are always welcome to give it a go :)