cloudflow
cloudflow copied to clipboard
Producer delivery semantic for Flink streamlets
Is your feature request related to a problem? Please describe.
I noticed that when errors occurs in the Flink streamlet then some messages duplicates in the outlet Kafka topic. To my knowledge Flink supports exactly once message processing semantic: https://flink.apache.org/features/2018/03/01/end-to-end-exactly-once-apache-flink.html
I found that the delivery semantic parameter of Kafka producer in the Flink streamlet is specified as AT_LEAST_ONCE
:
https://github.com/lightbend/cloudflow/blob/master/core/cloudflow-flink/src/main/scala/cloudflow/flink/FlinkStreamletContextImpl.scala#L90
Is your feature request related to a specific runtime of cloudflow or applicable for all runtimes? Only in Flink runtime.
Describe the solution you'd like It would be great to implement providing delivery semantic parameter via configuration parameters (in streamlet or application.conf).
This would be really good to have indeed! I think we observed something similar while debugging today.
We kept it AT_LEAST_ONCE
to keep the delivery semantics same as the other runtimes, Akka and Spark. May be we can change it since Flink supports EXACTLY_ONCE
- WDYT @RayRoestenburg ?
@debasishg yes, we can think about changing the default. I don't think that there are any downsides to that? Configuration is definitely what we want to make possible in the future.
@RayRoestenburg I don't think there is any downside to this. Yeah, these will be easy with our upcoming configuration system.