Limit retries of side effects
We need a mechanism to override the invoker retry policy on a per-invocation basis from within the SDK. This is required to prevent infinite loops of side effect retries.
Another solution could be to pass the retry count to the side effect closure.
I've put some thought into this, and I have a more or less concrete proposal:
- On `StartMessage` we send a `retry_attempt`. This `retry_attempt` count is tracked in memory on the invoker side (meaning it's eventually consistent) and is reset on each new entry, so it will be `>= 1` only if the invoker invokes more than once with the same journal.
- On `ErrorMessage` we add a new optional field to specify the interval before retrying.
- With those two fields, the SDK can let users set a retry policy on side effects (and even create a custom one). When a side effect fails, we either write the `ErrorMessage` with the interval before the next retry, or we record and throw a terminal exception if the retry attempts are exhausted.
This solution requires implementing those retry policies in every SDK, but this should take only a few lines of code, and it allows users to configure custom policies, or perhaps even hook in existing libraries (e.g. resilience4j). Plus, it doesn't force the definition of retry policies into the protocol.
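To illustrate how little code the SDK side might need, here is a minimal Python sketch of the flow described above. Everything in it is hypothetical and not taken from any SDK: the names `ExponentialBackoff`, `RetryLater`, `TerminalError`, and `run_side_effect` are made up for illustration, `RetryLater` merely stands in for an `ErrorMessage` carrying the retry interval, and the sketch assumes `retry_attempt` is 0 on the first attempt.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Union


class TerminalError(Exception):
    """Hypothetical terminal failure: recorded in the journal, never retried."""


@dataclass(frozen=True)
class RetryLater:
    """Hypothetical stand-in for an ErrorMessage with the new interval field."""
    delay_ms: int
    cause: str


@dataclass(frozen=True)
class ExponentialBackoff:
    """Hypothetical user-configurable retry policy for side effects."""
    max_attempts: int            # total attempts allowed, including the first
    initial_delay_ms: int = 100
    factor: float = 2.0
    max_delay_ms: int = 10_000

    def next_delay_ms(self, retry_attempt: int) -> Optional[int]:
        # retry_attempt is assumed to come from StartMessage (0 on first try).
        # Returns None when no further attempt is allowed.
        if retry_attempt + 1 >= self.max_attempts:
            return None
        delay = self.initial_delay_ms * self.factor ** retry_attempt
        return min(int(delay), self.max_delay_ms)


def run_side_effect(
    closure: Callable[[], Any],
    retry_attempt: int,
    policy: ExponentialBackoff,
) -> Union[Any, RetryLater]:
    """Run the closure once and decide what to report on failure."""
    try:
        return closure()
    except TerminalError:
        raise  # already terminal, nothing to decide
    except Exception as exc:
        delay = policy.next_delay_ms(retry_attempt)
        if delay is None:
            # Attempts exhausted: surface a terminal error instead of retrying.
            raise TerminalError(
                f"side effect failed after {retry_attempt + 1} attempts: {exc}"
            ) from exc
        # Otherwise ask the invoker to retry after the computed interval.
        return RetryLater(delay_ms=delay, cause=str(exc))
```

Because the policy is just an object with a `next_delay_ms` method, a user could swap in a custom implementation or wrap an existing retry library without any change to the protocol.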
The caveat of this solution is that the effective retry count might be higher than the one the user configures. This can happen in a number of situations, e.g. leader election in a distributed setup or Restate crashes. However, side effects are already at-least-once, so many use cases will be fine with it. Users who want stricter guarantees can build their own solution by recording every run attempt in the journal (although even that won't fully guarantee that the retry count is exactly what they expect).
I like the proposal and the simplicity of the building blocks on the server side.
The runtime side is now implemented.
Closing this now; I've opened follow-ups for the specific SDKs.