streamz
How to keep some sort of ordering between joined streams?
Say I have two source streams on Kafka, one updating at 1 message per second and one updating at 5 messages per second. I am using combine_latest to join the streams. What would be the most appropriate way to keep the streams from getting more than ~10s out of sync? I effectively want to push futures back up one stream, blocking until more data arrives on the second.
Or in another scenario, suppose one of the sources dies temporarily. I would really like to be able to halt any further emits until the source comes back to life.
Is there some code which says that the source died? If so then you could maybe push all the data into a buffer while waiting for the source to come back.
Similarly, for the first scenario you could have a conditional which, depending on how far out of sync the streams are, either pushes data through the normal route or into a buffer until new data comes in.
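A rough sketch of that conditional in plain Python, not tied to any streamz API. The class name LagGate, its methods, and the assumption that every message carries a numeric timestamp in seconds are all made up for illustration. It only holds messages back in an unbounded buffer; it does not apply real backpressure to the source.

```python
from collections import deque


class LagGate:
    """Hypothetical helper: hold back fast-stream messages that run too far ahead."""

    def __init__(self, max_lag=10.0):
        self.max_lag = max_lag   # allowed lead of the fast stream, in seconds
        self.slow_ts = None      # timestamp of the latest slow-stream message
        self.pending = deque()   # buffered (timestamp, value) pairs, oldest first

    def update_slow(self, ts):
        """Record a slow-stream timestamp; return buffered messages now in range."""
        self.slow_ts = ts
        released = []
        while self.pending and self.pending[0][0] - ts <= self.max_lag:
            released.append(self.pending.popleft())
        return released

    def update_fast(self, ts, value):
        """Return [(ts, value)] if it may pass now, otherwise buffer it and return []."""
        too_far_ahead = self.slow_ts is None or ts - self.slow_ts > self.max_lag
        # Queue behind anything already buffered so ordering is preserved.
        if self.pending or too_far_ahead:
            self.pending.append((ts, value))
            return []
        return [(ts, value)]


gate = LagGate(max_lag=10.0)
gate.update_slow(0.0)
passed = gate.update_fast(5.0, "a")      # within 10s of the slow stream
held = gate.update_fast(12.0, "b")       # 12s ahead, so it is buffered
released = gate.update_slow(3.0)         # slow stream catches up enough
```

If the slow source dies, update_slow is never called, so everything on the fast side simply accumulates in pending until it comes back. A real version would probably want a cap on the buffer size.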
Maybe?
Yeah I like your second idea @CJ-Wright. I will whip something up - I wonder whether others might find it useful enough for a PR.
I also just wanted to double check there wasn't some way to do something like this already.
I've used a combination of joining nodes, filter, and pluck to essentially gate certain paths. Maybe two gates would work for pushing data directly into the processing or into a buffer.