numaflow icon indicating copy to clipboard operation
numaflow copied to clipboard

revisit BufferWriter.Write retries strategy for UDSink

Open vigith opened this issue 2 years ago • 2 comments

Today BufferWriter.Write retries on error, in udSink if the user mistakenly sets response.Failure for a message then we will end up in infinite retry.

As a data platform our strategy is not to retry on user errors but only on platform errors, the user code should handle the failures by retrying and return only if it is successful. We have done this for map, we have to implement the same for sink.

vigith avatar Jan 18 '23 19:01 vigith

response.Failure in udsink means it fails to deal with the message, which will lead to an explicit retry on it. This gives the udsink a chance to implement a strong contract (for example, exactly-once) between the real data sink store and the udsink.

Of course, the retry could be implemented in the udsink, but comparing with the retry mechanism provided by the platform, it needs extra code for the developer to handle retry, and it has issues such as gRPC timeout due to long time retrying.

I would prefer to keep the mechanism for udsink, and make it clear in the document that return a Failure will lead to retries.

whynowy avatar Jan 18 '23 21:01 whynowy

It isn't easy to do exactly once with Sinks because we cannot control how sink manages duplicates. For the same reason, most of the platforms (like Flink) don't guarantee exactly-once on sink either.

vigith avatar Jan 18 '23 21:01 vigith