Merge-join multiple streams

Open • markpapadakis opened this issue 9 years ago • 1 comment

It should be easy to implement a TankClient utility method that joins multiple streams based on message timestamps, using a simple merge strategy.
For one or more streams, we can consume and buffer messages from each stream, 'pop' the earliest message across all of the buffers, and refill a stream's buffer when it runs out.
This should make it easy to consume many different streams as if they were a single stream while retaining time-based ordering (strict ordering isn't guaranteed, but it will almost always hold in practice). This can work for multiple partitions of the same topic, or for multiple partitions across multiple topics.
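
A minimal sketch of the merge strategy described above, in Python for brevity rather than the client's own language. Nothing here is TankClient API: `BufferedStream`, `merge_streams`, the `fetch` callback and the `(timestamp, payload)` message shape are all hypothetical names chosen for illustration. It assumes each stream delivers its own messages in timestamp order, and it treats an empty fetch as end of stream, which a live consumer would not do.

```python
import heapq
from collections import deque
from typing import Callable, Deque, Iterator, List, Optional, Tuple

# Hypothetical message shape: (timestamp, payload).
Message = Tuple[int, str]


class BufferedStream:
    """Buffers messages from one stream (e.g. one topic partition)."""

    def __init__(self, fetch: Callable[[], List[Message]]):
        # fetch() is an assumed callback returning the next batch of
        # messages; an empty list is treated as end of stream here.
        self._fetch = fetch
        self._buf: Deque[Message] = deque()
        self._done = False

    def pop(self) -> Optional[Message]:
        """Return the next message, refilling the buffer when it is empty."""
        if not self._buf and not self._done:
            batch = self._fetch()
            if batch:
                self._buf.extend(batch)
            else:
                self._done = True
        return self._buf.popleft() if self._buf else None


def merge_streams(streams: List[BufferedStream]) -> Iterator[Message]:
    """Yield messages from all streams, earliest timestamp first."""
    heap = []
    # Seed the heap with the head message of every stream.
    for idx, stream in enumerate(streams):
        msg = stream.pop()
        if msg is not None:
            heap.append((msg[0], idx, msg))
    heapq.heapify(heap)

    while heap:
        # The heap holds at most one entry per stream, so (timestamp, idx)
        # is always enough to break ties.
        _, idx, msg = heapq.heappop(heap)
        yield msg
        # Replace the popped message with the next one from the same stream.
        nxt = streams[idx].pop()
        if nxt is not None:
            heapq.heappush(heap, (nxt[0], idx, nxt))


def batches(*chunks: List[Message]) -> Callable[[], List[Message]]:
    """Test helper: a fetch callback that replays fixed batches."""
    it = iter(chunks)
    return lambda: next(it, [])
```

The output is only as ordered as the inputs: if one stream delivers a message with an older timestamp after a newer one has already been popped, the merged stream will be out of order at that point, which matches the "not strict, but almost always" caveat above.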

markpapadakis • Oct 26 '16 07:10

There is a class like that, in heavy use here, but it hasn't been merged into the client, because we'll eventually provide a Kafka Streams-like abstraction that can also be used for joins.

markpapadakis • Oct 09 '17 12:10