kube
kube copied to clipboard
Add Backoff-based error policy
This adds an error_policy based on the backoff crate, which gives us the ability to use stateful error policies. It also comes with some useful default implementations, such as exponential backoff, retry immediately, and constant retry.
This also has some side effects with breaking changes. Notably:
-
error_policyis now told about successful reconciliations. This is required so that stateful error policies can reset upon successes. -
error_policyis now told the reference to the reconciled object. -
controller::Erroris now keyed by the object type. This isn't strictly required, but allows for some internal cleanup. -
Controller::runtakes aBackofffactory rather than a rawerror_policy. This provides a much cleaner interface for custom policies, and thebackoffcrate provides both constant and exponential policies. If you need the extra control that the old interface provided, use the "hard-mode"controller::applierAPI.
Fixes #297. Arguably fixes #34.
One downside of this PR is that it worsens the ergonomics of error policies that differ depending on the error case...
Sorry for the slow responses here. This is ultimately great. Have approved it so merge if you are happy, otherwise there's a bit of bikeshedding in the comments (closed all my exploratory ones).
Having thought about this some more. You seem unsure about this as well (since it is still here), and there are some downsides to this that is making me more sceptical:
- the user ergonomics, as you say, feel a bit awkward with a lambda in the signature
- it is somewhat complex internally
- backoff constitutes an extra dependency (with runtime deps) that no one but me actually asked for
- maybe this is more heavyweight than required (given that most controller-runtime based controllers always just have a flat ~300s)
- not clear what a default backoff retry should be after a first failure, 1s, 10s, 100s?
I think our lack of defining a good default start time here is the most problematic. All those values (1s/10s/100s), feel sensible depending on what needs to be created of the back of a CRD. So users probably want to set a minimum time to avoid a bunch of initial errors in logs from early retries.
However, doing so in ExponentialBackoff looks pretty awkward to me. Manually creating the instance, selecting a generic clock, possibly using its Default impl.
I'm starting to think this is probably not worth merging at the moment. Not sure what you think? Maybe this is something we put on the back-burner for now until we have more info?