cloudflow icon indicating copy to clipboard operation
cloudflow copied to clipboard

Producer delivery semantic for Flink streamlets

Open unit7-0 opened this issue 4 years ago • 4 comments

Is your feature request related to a problem? Please describe. I noticed that when errors occurs in the Flink streamlet then some messages duplicates in the outlet Kafka topic. To my knowledge Flink supports exactly once message processing semantic: https://flink.apache.org/features/2018/03/01/end-to-end-exactly-once-apache-flink.html I found that the delivery semantic parameter of Kafka producer in the Flink streamlet is specified as AT_LEAST_ONCE: https://github.com/lightbend/cloudflow/blob/master/core/cloudflow-flink/src/main/scala/cloudflow/flink/FlinkStreamletContextImpl.scala#L90

Is your feature request related to a specific runtime of cloudflow or applicable for all runtimes? Only in Flink runtime.

Describe the solution you'd like It would be great to implement providing delivery semantic parameter via configuration parameters (in streamlet or application.conf).

unit7-0 avatar Apr 02 '20 14:04 unit7-0

This would be really good to have indeed! I think we observed something similar while debugging today.

wjglerum avatar Apr 10 '20 13:04 wjglerum

We kept it AT_LEAST_ONCE to keep the delivery semantics same as the other runtimes, Akka and Spark. May be we can change it since Flink supports EXACTLY_ONCE - WDYT @RayRoestenburg ?

debasishg avatar Apr 10 '20 14:04 debasishg

@debasishg yes, we can think about changing the default. I don't think that there are any downsides to that? Configuration is definitely what we want to make possible in the future.

RayRoestenburg avatar Apr 10 '20 15:04 RayRoestenburg

@RayRoestenburg I don't think there is any downside to this. Yeah, these will be easy with our upcoming configuration system.

debasishg avatar Apr 10 '20 16:04 debasishg