Polly icon indicating copy to clipboard operation
Polly copied to clipboard

Calculated break duration for Circuit breaker and Advance Circuit breaker

Open MattMinke opened this issue 5 years ago • 8 comments

CircuitBreaker and AdvanceCircuitBreaker currently only allow for a fixed break duration. I would like to be able to calculated the break duration based on the number of consecutive "unhealthy" requests made. This would allow the duration to be increased as the downstream system continues to be unavailable. With this I would be able to increase the duration between attempts to interact with the downstream system over time hopefully allowing it to recover. With this I would also be able to reduce log entries for failed requests to the downstream service.

The use case I see for this is a system that is going to be down for awhile do to some catastrophic issue. Like the Retry policy this allows us to wait longer each time before attempting to interact with the system, and using Math.Min allows us to set an upper limit on the wait duration.



Example Usage

Policy.Handle<Exception>().CircuitBreaker(
                exceptionsAllowedBeforeBreaking: 5,
                durationOfBreak: unhealthyAttempts => TimeSpan.FromSeconds(Math.Min(20 + Math.Pow(2, unhealthyAttempts), /* Max */400)));

MattMinke avatar Jun 12 '19 14:06 MattMinke

( Courtesy reply: commitments mean I may not get to respond to this until next week. Thanks. )

reisenberger avatar Jun 13 '19 19:06 reisenberger

(EDIT: Deleted a June response of mine with a sub-optimal suggestion for this feature.)


We'd be happy to consider a PR for this feature.

A metric suitable for both original and advanced circuit-breaker could be that the calculated break duration takes as an input parameter numberOfConsecutiveHalfOpenTestFailures

0 = the circuit has just transitioned from Closed->Open (ie no half-open test failures yet) 1 = one attempt in half-open state was made and failed, the circuit just transitioned from HalfOpen->Open again 2 = two attempts in half-open state have been made and both failed, the circuit just transitioned from HalfOpen->Open for the second time (etc)

reisenberger avatar Dec 11 '19 19:12 reisenberger

Note that I'm interested in picking this task up and submitting a PR.

One concern I have is that we'd probably want to modify all 5 "constructors" for both Sync and Sync version of the Circuit Breaker (and whatever exists for the Adv. version.

That leaves us with 3 options:

  • Break Backwards Compatibility.
  • Doubling the number of "constructor" methods available.
  • Do something awful with declaring a CBOpenDurationDefinition type, with implicit casts from TimeSpan and Func<TimeSpan> and Func<int, TimeSpan> (for your failureCount idea above) so that existing invocations work.

None of those is obviously ideal.

Preferences, @reisenberger ?

Opinions on this?

Brondahl avatar Dec 12 '19 09:12 Brondahl

Hi @Brondahl ! Please follow the existing factory overloads pattern, creating new overloads where necessary (no breaking changes). We have a separate, longer-term proposal for simplfying the number of policy-configuration overloads. Thanks!

reisenberger avatar Dec 15 '19 17:12 reisenberger

Design choice, please! There's a complication with implementing this, because there are 2 diffferent 'kinds' or 'places' of CB duration.

There's the obvious version, where requests are resolving very quickly, and the duration reset is triggered by ConsecutiveCountCircuitController.OnActionFailure, so that Controller can keep track of consecutive failures, and pass that count into Break_NeedsLock.

But there's also the scenario in which the request takes a really long time to resolve, or the nature of the failure is inherently a hang/timeout. In that second case, the duration reset is performed in CircuitStateController.PermitHalfOpenCircuitTest method, but the Interlock.CompareExchange call.

This is tricky, becuase the program context of the PermitHalfOpenCircuitTest call doesn't have a natural, neat, way to share common state with the OnActionFailure call.

I see 3 options:

  • I can force it to share state. But it will feel a little ugly.
  • I can have the timeout version of the Timer Reset just hold the duration constant (i.e. the arg of the DurationFunction is strictly "consecutive Active Failures; passive Timeouts don't count".
  • I can change the functionality, so that the duration reset in PermitHalfOpenCircuitTest is MaxValue; i.e. stay Open until that call does return.

My instinct is to go with option 2). But happy to take your steer, on your preferred option.

Brondahl avatar Dec 21 '19 08:12 Brondahl

The other point that has come up is that the "throw if TimeSpan is non-positive" constraint is more natural to evaluate at Inner-Action-Runtime, rather than Policy-Setup-Time.

Note that a negative Span doesn't actually break anything, it's just not going to achieve anything useful. i.e. the only reason we have it (AFAICT), is so that we can tell the user that they've probably accidentally mis-configured the tool.

Given the many undetectable ways that the user can screw up a Policy config, I'm inclined to simply delete the check, the associated test, and the relevant lines in the Docs.

Is that acceptable? If not, then we keep the setup-time checks on constant durations and either: a) add a runtime check that throws the same exception, but later, for the dynamic durations. b) don't have protection against the dynamic durations being non-positive.

Brondahl avatar Dec 21 '19 08:12 Brondahl

@reisenberger ☝️

Brondahl avatar Dec 21 '19 08:12 Brondahl

See PR ☝️ :) Not quite "finished", there's a bunch of Tedious "and now do it all again 7 times" stuff to do, and some better tests to write, but the meat of it is in there.

What do you guys think? (@reisenberger )

Brondahl avatar Dec 21 '19 18:12 Brondahl

@martintmk Is this use case supported in Polly v8?

martincostello avatar Jul 20 '23 07:07 martincostello

Currently not, but it's easy enough to add post V8 release.

https://github.com/App-vNext/Polly/blob/dd14ca630b7614955580e7b26e0adc56f653c694/src/Polly.Core/CircuitBreaker/CircuitBreakerStrategyOptions.cs#L34

public abstract class CircuitBreakerStrategyOptions<TResult>
{
   public Func<BreakDurationGeneratorArguments, ValueTask<TimeSpan>>? BreakDurationGenerator { get; set; }
}

It will be optional generator that can override the fixed BreakDuration.

martintmk avatar Jul 20 '23 07:07 martintmk

This issue is stale because it has been open for 60 days with no activity. It will be automatically closed in 14 days if no further updates are made.

github-actions[bot] avatar Sep 19 '23 01:09 github-actions[bot]

https://github.com/App-vNext/Polly/issues/653#issuecomment-1643422975 I will work on a PR for this issue.

atawLee avatar Oct 12 '23 13:10 atawLee

Resolved by #1776.

martincostello avatar Nov 08 '23 09:11 martincostello