kube icon indicating copy to clipboard operation
kube copied to clipboard

Add Backoff-based error policy

Open nightkr opened this issue 5 years ago • 3 comments

This adds an error_policy based on the backoff crate, which gives us the ability to use stateful error policies. It also comes with some useful default implementations, such as exponential backoff, retry immediately, and constant retry.

This also has some side effects with breaking changes. Notably:

  1. error_policy is now told about successful reconciliations. This is required so that stateful error policies can reset upon successes.

  2. error_policy is now told the reference to the reconciled object.

  3. controller::Error is now keyed by the object type. This isn't strictly required, but allows for some internal cleanup.

  4. Controller::run takes a Backoff factory rather than a raw error_policy. This provides a much cleaner interface for custom policies, and the backoff crate provides both constant and exponential policies. If you need the extra control that the old interface provided, use the "hard-mode" controller::applier API.

Fixes #297. Arguably fixes #34.

nightkr avatar Aug 30 '20 01:08 nightkr

One downside of this PR is that it worsens the ergonomics of error policies that differ depending on the error case...

nightkr avatar Aug 30 '20 02:08 nightkr

Sorry for the slow responses here. This is ultimately great. Have approved it so merge if you are happy, otherwise there's a bit of bikeshedding in the comments (closed all my exploratory ones).

clux avatar Sep 28 '20 17:09 clux

Having thought about this some more. You seem unsure about this as well (since it is still here), and there are some downsides to this that is making me more sceptical:

  • the user ergonomics, as you say, feel a bit awkward with a lambda in the signature
  • it is somewhat complex internally
  • backoff constitutes an extra dependency (with runtime deps) that no one but me actually asked for
  • maybe this is more heavyweight than required (given that most controller-runtime based controllers always just have a flat ~300s)
  • not clear what a default backoff retry should be after a first failure, 1s, 10s, 100s?

I think our lack of defining a good default start time here is the most problematic. All those values (1s/10s/100s), feel sensible depending on what needs to be created of the back of a CRD. So users probably want to set a minimum time to avoid a bunch of initial errors in logs from early retries.

However, doing so in ExponentialBackoff looks pretty awkward to me. Manually creating the instance, selecting a generic clock, possibly using its Default impl.

I'm starting to think this is probably not worth merging at the moment. Not sure what you think? Maybe this is something we put on the back-burner for now until we have more info?

clux avatar Nov 11 '20 12:11 clux