streamjoin
Java 8 Stream joins
streamjoin
... for SQL-like Java 8 Stream joins, inspired by C# Enumerable.Join().
It correlates the elements of two streams and transforms each pair of matching objects with a BiFunction you pass in. Whether two objects match is decided by the values of their key functions.
Joins are applied using a fluent API:
Stream<BestFriends> bestFriends = Join.join(listOfPersons.stream())
        .withKey(Person::getName)
        .on(listOfDogs.stream())
        .withKey(Dog::getOwnerName)
        .combine((person, dog) -> new BestFriends(person, dog))
        .asStream();
This pairs every Person with every Dog whose owner name equals that person's name, creating one BestFriends object per match.
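For readers who know the plain Stream API, the inner join above produces the same pairs as the following sketch. It does not use streamjoin; `Person`, `Dog` and `BestFriends` are hypothetical minimal classes, and indexing the dogs in a map with `Collectors.groupingBy` is just one straightforward way to compute the result, not necessarily how the library does it. This version assumes every dog has a non-null owner name, because `Collectors.groupingBy` rejects null keys.

```java
import java.util.*;
import java.util.stream.*;

class InnerJoinSketch {
    // Hypothetical domain classes; only the fields needed for matching are modelled.
    static final class Person {
        final String name;
        Person(String name) { this.name = name; }
        String getName() { return name; }
    }

    static final class Dog {
        final String ownerName;
        final String dogName;
        Dog(String ownerName, String dogName) { this.ownerName = ownerName; this.dogName = dogName; }
        String getOwnerName() { return ownerName; }
    }

    static final class BestFriends {
        final Person person;
        final Dog dog;
        BestFriends(Person person, Dog dog) { this.person = person; this.dog = dog; }
        @Override public String toString() { return person.name + "+" + dog.dogName; }
    }

    // Plain-JDK equivalent of the inner join: index dogs by owner name, then emit
    // one BestFriends per (person, dog) pair whose keys are equal.
    static List<BestFriends> join(List<Person> persons, List<Dog> dogs) {
        Map<String, List<Dog>> dogsByOwner = dogs.stream()
                .collect(Collectors.groupingBy(Dog::getOwnerName));
        return persons.stream()
                .flatMap(p -> dogsByOwner.getOrDefault(p.getName(), Collections.emptyList())
                        .stream()
                        .map(d -> new BestFriends(p, d)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(new Person("Ann"), new Person("Bob"));
        List<Dog> dogs = Arrays.asList(new Dog("Ann", "Rex"), new Dog("Ann", "Fido"), new Dog("Carl", "Spot"));
        System.out.println(join(persons, dogs)); // [Ann+Rex, Ann+Fido]
    }
}
```

Bob has no dog and Carl's dog has no matching person, so neither appears in the output.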
Unmatched objects and key functions returning null
Join.join(...) defines an inner join: objects that have no match on the other side are never passed to the combiner and therefore do not appear in the result.
Key functions may return null for one or more objects. This is tolerated, but any object with a null key is treated as unmatchable.
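Both rules can be seen in a small list-based sketch. `innerJoin` is a hypothetical helper written for illustration, not part of streamjoin: objects whose key is null are filtered out on both sides before matching, so they can never reach the combiner, and left objects without a partner simply produce no output.

```java
import java.util.*;
import java.util.function.*;

class NullKeySketch {
    // Hypothetical generic inner join over lists. Unmatched objects and objects
    // with a null key never reach the combiner.
    static <L, R, K, T> List<T> innerJoin(List<L> left, Function<L, K> leftKey,
                                          List<R> right, Function<R, K> rightKey,
                                          BiFunction<L, R, T> combiner) {
        Map<K, List<R>> index = new HashMap<>();
        for (R r : right) {
            K k = rightKey.apply(r);
            if (k != null) {                 // null key: this right object can never match
                index.computeIfAbsent(k, x -> new ArrayList<>()).add(r);
            }
        }
        List<T> out = new ArrayList<>();
        for (L l : left) {
            K k = leftKey.apply(l);
            if (k == null) continue;         // null key: this left object can never match
            for (R r : index.getOrDefault(k, Collections.emptyList())) {
                out.add(combiner.apply(l, r));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> left = Arrays.asList("a1", "b1", "x");
        List<String> right = Arrays.asList("a2", "c2", "y");
        // Key = first character, except that strings starting with 'x' or 'y' get a null key.
        Function<String, Character> key =
                s -> (s.charAt(0) == 'x' || s.charAt(0) == 'y') ? null : s.charAt(0);
        System.out.println(innerJoin(left, key, right, key, (l, r) -> l + "-" + r)); // [a1-a2]
    }
}
```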
Join Types
- inner join, as shown above, with Join.join(...)
- left outer join with Join.leftOuter(...). Unmatched objects of the left side (i.e. the first stream given) are kept in the result. By default, null is passed to the combining function in place of the missing right object. An additional handler for unmatched left-side objects can be defined with
.combine((left, right) -> something(left, right))
.withLeftUnmatched(left -> someOther(left))
...
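The left outer join behaviour described above can be modelled the same way. This is a hypothetical plain-Java sketch (`leftOuterJoin` is not a streamjoin method): every left object yields at least one result, matched pairs go to the combiner, and an unmatched left object goes either to the optional unmatched handler or, if none is given, to the combiner with `null` as the right argument.

```java
import java.util.*;
import java.util.function.*;

class LeftOuterSketch {
    // Hypothetical left outer join. `leftUnmatched` may be null, in which case
    // unmatched left objects are passed to `combiner` together with a null right object.
    static <L, R, K, T> List<T> leftOuterJoin(List<L> left, Function<L, K> leftKey,
                                              List<R> right, Function<R, K> rightKey,
                                              BiFunction<L, R, T> combiner,
                                              Function<L, T> leftUnmatched) {
        Map<K, List<R>> index = new HashMap<>();
        for (R r : right) {
            K k = rightKey.apply(r);
            if (k != null) index.computeIfAbsent(k, x -> new ArrayList<>()).add(r);
        }
        List<T> out = new ArrayList<>();
        for (L l : left) {
            K k = leftKey.apply(l);
            List<R> matches = (k == null) ? Collections.<R>emptyList()
                                          : index.getOrDefault(k, Collections.<R>emptyList());
            if (matches.isEmpty()) {
                // Unmatched left object: use the handler if present, else combine with null.
                out.add(leftUnmatched != null ? leftUnmatched.apply(l) : combiner.apply(l, null));
            } else {
                for (R r : matches) out.add(combiner.apply(l, r));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> people = Arrays.asList("Ann", "Bob");
        List<String> dogs = Arrays.asList("Ann:Rex");
        Function<String, String> dogOwner = d -> d.split(":")[0];
        // Default behaviour: null is passed as the right argument.
        System.out.println(leftOuterJoin(people, p -> p, dogs, dogOwner,
                (p, d) -> p + "/" + d, null));              // [Ann/Ann:Rex, Bob/null]
        // With an explicit handler for unmatched left objects.
        System.out.println(leftOuterJoin(people, p -> p, dogs, dogOwner,
                (p, d) -> p + "/" + d, p -> p + "/none"));  // [Ann/Ann:Rex, Bob/none]
    }
}
```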
One to Many, Many to One, Many to Many
For all join types, multiple matches are handled by calling the combiner once for each matching pair. Instead of .combine(combiner), a grouping function may be defined with .group(...); it takes a left object and a stream of its matching right objects as parameters:
...
.group((left, streamOfMatchingRight) -> something(left, streamOfMatchingRight))
...
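A grouped join calls the grouping function once per left object instead of once per matching pair. The hypothetical `groupJoin` below sketches this with the JDK only; it assumes inner-join semantics, i.e. left objects with no match are skipped, since the text does not say how `.group(...)` treats them.

```java
import java.util.*;
import java.util.function.*;
import java.util.stream.*;

class GroupSketch {
    // Hypothetical grouped inner join: `grouper` receives a left object and a
    // stream of all right objects whose key equals the left object's key.
    static <L, R, K, T> List<T> groupJoin(List<L> left, Function<L, K> leftKey,
                                          List<R> right, Function<R, K> rightKey,
                                          BiFunction<L, Stream<R>, T> grouper) {
        // Index the right side by key, dropping objects with a null key.
        Map<K, List<R>> index = right.stream()
                .filter(r -> rightKey.apply(r) != null)
                .collect(Collectors.groupingBy(rightKey));
        return left.stream()
                // Assumption: left objects without any match are skipped.
                .filter(l -> index.containsKey(leftKey.apply(l)))
                .map(l -> grouper.apply(l, index.get(leftKey.apply(l)).stream()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> people = Arrays.asList("Ann", "Bob", "Cid");
        List<String> dogs = Arrays.asList("Ann:Rex", "Ann:Fido", "Bob:Spot");
        List<String> counts = groupJoin(people, p -> p, dogs, d -> d.split(":")[0],
                (p, matchingDogs) -> p + "=" + matchingDogs.count());
        System.out.println(counts); // [Ann=2, Bob=1]
    }
}
```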
Matching by other constraints
By default, a match requires the key values to be equal. Other matching constraints can be defined with .matching(...), which receives the left and right key values:
Stream<ShowAttendance> attendances = Join.join(listOfPersons.stream())
        .withKey(Person::getAge)
        .on(listOfShows.stream())
        .withKey(Show::getMinAge)
        .matching((personAge, minAge) -> personAge >= minAge)
        .combine((person, show) -> new ShowAttendance(show, person))
        .asStream();
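With an arbitrary predicate, matches can no longer be found by a hash lookup on equal keys. The following hypothetical sketch (not streamjoin's implementation) therefore compares every left key with every right key in a nested loop and, following the null-key rule above, skips objects whose key is null. The `"name:number"` strings are stand-ins for the `Person` and `Show` objects of the example.

```java
import java.util.*;
import java.util.function.*;

class MatchingSketch {
    // Hypothetical non-equi inner join: a pair matches when `matches` holds for
    // the two key values. Cost is |left| * |right| predicate evaluations.
    static <L, R, K, T> List<T> joinMatching(List<L> left, Function<L, K> leftKey,
                                             List<R> right, Function<R, K> rightKey,
                                             BiPredicate<K, K> matches,
                                             BiFunction<L, R, T> combiner) {
        List<T> out = new ArrayList<>();
        for (L l : left) {
            K lk = leftKey.apply(l);
            if (lk == null) continue;                      // null keys never match
            for (R r : right) {
                K rk = rightKey.apply(r);
                if (rk != null && matches.test(lk, rk)) {
                    out.add(combiner.apply(l, r));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> persons = Arrays.asList("Tim:12", "Sue:18");       // name:age
        List<String> shows = Arrays.asList("Cartoon:6", "Thriller:16");  // title:minAge
        Function<String, Integer> number = s -> Integer.parseInt(s.split(":")[1]);
        BiPredicate<Integer, Integer> oldEnough = (personAge, minAge) -> personAge >= minAge;
        System.out.println(joinMatching(persons, number, shows, number, oldEnough,
                (p, s) -> p.split(":")[0] + "@" + s.split(":")[0]));
        // [Tim@Cartoon, Sue@Cartoon, Sue@Thriller]
    }
}
```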
Parallel processing and performance
streamjoin supports parallel processing: simply pass parallel streams (see Collection.parallelStream() and Stream.parallel()). To guarantee correct results, the key functions and the combiner/grouper functions should be non-interfering and stateless.
The left-side stream is processed lazily and is not consumed by the join itself, i.e. no terminal operation is performed on it.
The right-side stream, in contrast, is collected when the join is finalized with .asStream(). The collected elements are held in memory until the resulting joined stream has been consumed.
Hence, when joining very large streams where memory efficiency matters, consider passing the smaller input stream as the right side. For an inner join this only requires swapping the combiner's arguments; for a left outer join, swapping the sides changes which unmatched objects are kept.
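The memory trade-off can be illustrated with a simplified plain-JDK model. `join` here is a hypothetical helper, not the library's code: the right stream is collected into an index up front (in this model, when `join` is called, playing the role of `.asStream()`), while the left stream is only wrapped with intermediate operations, so none of its elements are pulled until a terminal operation runs on the joined stream. The `peek` counter exists purely to make that visible.

```java
import java.util.*;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.*;
import java.util.stream.*;

class LazyJoinSketch {
    // Hypothetical model of the documented resource behaviour.
    static <L, R, K, T> Stream<T> join(Stream<L> left, Function<L, K> leftKey,
                                       Stream<R> right, Function<R, K> rightKey,
                                       BiFunction<L, R, T> combiner) {
        // Right side: consumed eagerly and held in memory as an index.
        Map<K, List<R>> index = right
                .filter(r -> rightKey.apply(r) != null)
                .collect(Collectors.groupingBy(rightKey));
        // Left side: only intermediate operations, so it stays lazy.
        return left.flatMap(l -> index.getOrDefault(leftKey.apply(l), Collections.<R>emptyList())
                .stream()
                .map(r -> combiner.apply(l, r)));
    }

    public static void main(String[] args) {
        AtomicInteger leftPulled = new AtomicInteger();
        Stream<String> left = Stream.of("a", "b", "c").peek(x -> leftPulled.incrementAndGet());
        Stream<String> joined = join(left, x -> x, Stream.of("a", "c"), x -> x, (l, r) -> l + r);
        System.out.println(leftPulled.get());                     // 0: left not consumed yet
        System.out.println(joined.collect(Collectors.toList()));  // [aa, cc]
        System.out.println(leftPulled.get());                     // 3
    }
}
```

Because only the right side is collected, its size determines how much extra memory the join holds.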
Get it
streamjoin is available via jcenter:
<dependency>
    <groupId>de.infonautika.streamjoin</groupId>
    <artifactId>streamjoin</artifactId>
    <version>1.0.0</version>
</dependency>
or, with Gradle:
implementation 'de.infonautika.streamjoin:streamjoin:1.0.0'