akka-stream-contrib
akka-stream-contrib copied to clipboard
Add a deduplicate stage
Migrated from: https://github.com/akka/akka/issues/19395#issuecomment-221261407
Isn't this a duplicate of #6
I think I might have this already. See discussion here: https://github.com/akka/reactive-kafka/issues/325#issuecomment-314668597 (also with @johanandren). And the initial StackOverflow question that prompted me to write it: https://stackoverflow.com/questions/44961050/at-least-once-delivery-semantics-on-a-graph-with-a-record-multiplier/44961965#44961965
The idea is a stage that allows the specification of a similarity function and a completeness function, as well as a timeout. It waits for all messages that have the same output on the similarity function (within the timeout period) and when completeness returns true, it sends an ordered sequence of all values downstream.
Motivation for such a thing comes from at-least-once guarantee requirements, wherein a set of generated payloads internal to the flow and of a-priori-unknown cardinality all need to complete processing before a commit is made (which is why I initially posted this on reactive kafka).
The discussion on reactive-kafka should be referenced for why (I think) this can't be done with eg. statefulMapConcat. If it can be done with an existing set of stages, I'd be happy to see the solution. FWIW I have this stage running now in roughly 10 places in some fairly complex flows that we will be deploying in the next week and all tests are passing.