Polly icon indicating copy to clipboard operation
Polly copied to clipboard

Dynamic circuit break period

Open Brondahl opened this issue 4 years ago • 34 comments

Initial work to allow for dynamically calculated CircuitBreaker break periods.

The issue or feature being addressed

See Issue: https://github.com/App-vNext/Polly/issues/653 ( @reisenberger )

Details on the issue fix or feature implementation

See Issue: https://github.com/App-vNext/Polly/issues/653 Note that there seem to be 2 contentious design choices outstanding; mentioned on that Issue.

w.r.t. Non-Positive inputs, I've gone with leaving static-duration behaviour unchanged, and having no checks for dynamic durations.

w.r.t. Delay durations when the response is slow to come back, I've gone with re-using the same delay, until one of the responses actually completes.

Several things still need to be done, before this is fully suitable for deployment, but they're all fairly rote changes, so I want to get the meat of the change approved before doing them, since if there are substantial markups the rote changes might need tweaks, which would be ... tedious :)

  • Trivially apply all the same changes to the Factory Methods for other syntaxes (CircuitBreakerAsync, CBTReslt, CBTResultAsync)
  • Figure out what (if anything?) should be done for the Advanced version of the CB. Then apply all the same boring Factory Method changes to all 4 version of that.
  • Update existing tests for all of the above.
  • Create new tests for the specific functionality for the duration to change dynamically (as opposed to the existing tests which all use constant-return functions. This ensures that the existing functionality is unchanged, and that the most basic use-case of the functionality works, but is somewhat lacking in depth.)
  • Add general documentation (the XMLDocs are updated, but I assume there's a high-level doc somewhere that I should update too.

Confirm the following

FIXED: ~Branch state on your repo currently looks weird~

^ See above, the existing tests have been updated, and crudely exercise the new functionality. Additional tests are required, and I'll create them in due course; wanted to get the meat of the change up for someone to look at while I'm doing that.

  • [x] I started this PR by branching from the head of the latest dev vX.Y branch, or I have rebased on the latest dev vX.Y branch, or I have merged the latest changes from the dev vX.Y branch
  • [x] I have targeted the PR to merge into the latest dev vX.Y branch as the base branch
  • [Y^] I have included unit tests for the issue/feature
  • [Y] I have successfully run a local build

Brondahl avatar Dec 21 '19 18:12 Brondahl

CLA assistant check
All CLA requirements met.

dnfclas avatar Dec 21 '19 18:12 dnfclas

Note that you may well find it easier to review this commit-by-commit. The first commit is entirely mechanical copy-pasting; the ones after that are more interesting.

Brondahl avatar Dec 22 '19 08:12 Brondahl

Thanks @Brondahl for the contribution! I am now on family/vacation time; I might get to look at this in the next couple days, but it could also be late December or January. Thanks.

reisenberger avatar Dec 22 '19 10:12 reisenberger

👍 Enjoy your break. Thanks for the expectation-setting :)

Brondahl avatar Dec 22 '19 10:12 Brondahl

EDIT: New status update below, as of 20/1/2020

Cool, so status update: I've applied almost all the review markups, locally.

Tasks still outstanding:

  • (@reisenberger) Sign-off on general review discussions.
  • (@reisenberger) Confirm desired Exception type for `durationFactory() returns < 0" review markup.
  • (@Brondahl) Implement ☝️ Exception throw.
  • (@reisenberger + @Brondahl) Decide on what we're going to do to the Advanced Circuit Breaker.
  • (@Brondahl) All the bits of tedious TODOs from the original PR comment.

Brondahl avatar Jan 01 '20 09:01 Brondahl

(@reisenberger + @Brondahl) Decide on what we're going to do to the Advanced Circuit Breaker.

So I see ... 3? ... options.

  • Do nothing.
    • Assert that the Proportion-based approach of Adv.C.B. is incompatible with dynamic Half-Open durations, and leave it as it is.
    • :(
  • Port identical functionality over.
    • Argue that the Advancedness of the Adv.C.B is solely about the details of when you intially call Break_NeedsLock() in the Controller, whilst in the Closed state, and the behaviour of the HalfOpen state is no different.
    • Simples!
  • Create Advanced dynamic Breaks!?
    • Try to invent something that is analogous to the proportional healthchecks of the Closed state, and apply that to the Half-Open state.
    • Uhh ... sounds a bit rough.

My vote is (probably unsurprisingly) option No.2. Opinions?

Brondahl avatar Jan 01 '20 09:01 Brondahl

@Brondahl Over to you to push all the changes so far, I think I'm done on these:

(@reisenberger) DONE Sign-off on general review discussions. (@reisenberger) DONE Confirm desired Exception type for `durationFactory() returns < 0" review markup.

(Might have small comments on ExtensionUpToMaxValue() once pushed.)


Re:

(@reisenberger + @Brondahl) Decide on what we're going to do to the Advanced Circuit Breaker.

Great overview of options ^^ 👍 . Reflecting on this.

reisenberger avatar Jan 02 '20 20:01 reisenberger

(@reisenberger + @Brondahl) Decide on what we're going to do to the Advanced Circuit Breaker My vote is (probably unsurprisingly) option No.2.

Let us implementation option 2 at this time. Proportion-based health-checks for the half-open state (option 3) is a larger build, and requires wider consideration of what, conceptually, makes sense.

reisenberger avatar Jan 03 '20 08:01 reisenberger

👍 Will update from the small comments above and push. Then start the rote changes. Will probably hold off writing completely new unit tests, and discuss them with you before I start to check that I'm approaching it correctly.

Brondahl avatar Jan 03 '20 09:01 Brondahl

@reisenberger I've committed all the tweaks to the above, and the core change for the Adv. CB. Shall I raise that as a new bnanch and new PR based off of this original branch, or just push it as a change to this branch? Which way to you prefer to view incremental changes on a PR?

Brondahl avatar Jan 04 '20 15:01 Brondahl

just push it as a change to this branch

... is fine. Thanks!

reisenberger avatar Jan 06 '20 18:01 reisenberger

Heya, sorry for the delay. work got a little hectic as a project drew to a close. Back on this now.

Have just pushed the aforementioned review markups and Implementation for Adv. CB.

Am working on the rote changes and similar.

Brondahl avatar Jan 17 '20 09:01 Brondahl

@reisenberger I've completed the rote changes (adding the new options to all the syntaxes, and extending all the tests to run with static duration and also "dynamic" duration (i.e. where the Func<int, TimeSpan> syntax is used, but it always returns the same value.

Do you want to shout when you've reviewed the previous push ☝️ , and then I'll add these on top of it.

Alternatively you might just want to review my approach to the Test file updating. I used a couple of regexes to Find'N'Replace all the changes

Make all the tests Theories, with true and false values: [Fact] => [Theory, InlineData(true), InlineData(false)] (not actually as a regex, just literals.)

Make all the tests parameterised to accept those values: ( *public (void|async Task) \w*)\(\) * => $1(bool timespanIsDynamic) (and then manually fix the Dispose() method)

Call the extensions and pass the parameter: \.(Advanced)?CircuitBreaker(Async)?\((.*)\); => .$1CircuitBreaker$2_WithConditionalTimeSpan($3, timespanIsDynamic); (plus some whitespace fixing for multi-line invocations in AdvancedCBSpecs.cs)

Notably, it turns out you've neatly got exactly one CircuitBreaker call per-Test, and the Regexes found the same number of hits as the Fact -> Theory change, so I'm reasonably confident I caught everything.

Brondahl avatar Jan 20 '20 10:01 Brondahl

EDIT: New status update below, as of 31/1/2020

Remaining actions:

  • @reisenberger to re-review original markups (currently pushed)
  • @reisenberger to shout if/when you want the rote changes pushed for review.
  • @Brondahl to prepare wiki-doc updates.
  • @Brondahl to figure out what specific testing is needed, and write up.
  • @reisenberger to review test-plan.
  • @Brondahl to implement testing, and progress from there.

Brondahl avatar Jan 20 '20 10:01 Brondahl

@Brondahl Thanks for this! I will review and get back to you (likely twds the end of the week or weekend)

reisenberger avatar Jan 21 '20 08:01 reisenberger

Heya,

Had a proper look at the tests, and I don't think the tests for dynamic duration are too complex. As a precursor, I'm going to factor out the most common, repetitive lines of the tests, after which the dynamic duration tests will look a bit like this:

        [Fact]
        public void Should_open_circuit_for_appropriate_dynamic_duration_when_call_raises_exception_repeatedly_in_sequence()
        {
            var startTime = 1.January(2000);
            var currentTime = startTime;
            SystemClock.UtcNow = () => currentTime;

            _breaker = Policy
                .Handle<DivideByZeroException>()
                .CircuitBreaker(2, (numberOfErrors) => TimeSpan.FromMinutes(numberOfErrors + 1));

            ExpectFailureCall_ToResultInPassThroughError____EndingUpInState(CircuitState.Closed);
            ExpectFailureCall_ToResultInPassThroughError____EndingUpInState(CircuitState.Open);

            ExpectFailureCall_ToResultInCircuitBreakerError_EndingUpInState(CircuitState.Open);

            // when initial duration has not passed, circuit still open
            currentTime = startTime.AddMinutes(1).AddMilliseconds(-1);
            ExpectState(CircuitState.Open);

            // when initial duration has passed, circuit now half-open
            currentTime = startTime.AddMinutes(1);
            ExpectState(CircuitState.HalfOpen);


            // Further error opens it back up, starting a new duration
            ExpectFailureCall_ToResultInPassThroughError____EndingUpInState(CircuitState.Open);
            var reopenTime = currentTime;


            // after initial duration has passed, circuit is *still* open, because the new duration is longer.
            currentTime = reopenTime.AddMinutes(2).AddMilliseconds(-1);
            ExpectState(CircuitState.Open);

            // once newer duration has passed, circuit is now half-open again.
            currentTime = reopenTime.AddMinutes(2);
            ExpectState(CircuitState.HalfOpen);
        }

Any concerns? Any particular aspects of the functionality that you want me to ensure are being tested, which you think I might forget?

Brondahl avatar Jan 28 '20 18:01 Brondahl

Thanks @Brondahl ! I'll aim to join thinking on specific test plans for dynamic-circuit-break-period either Wed or Thurs.

reisenberger avatar Jan 28 '20 22:01 reisenberger

Hi @Brondahl

Proving existing behaviour unaffected

It looks like most of the existing tests on circuit-break duration are in a section marked Circuit-breaker open->half-open->open/closed tests. There is also a separate test in the callback-delegates section about the break-duration being passed to the callback delegate: Should_call_onbreak_with_the_correct_timespan.

For these existing tests, we would want variants checking that existing circuit-break-duration behaviour is not affected. IE If the tests do not already cover it, we should extend existing tests/add new tests confirming that when using policy configuration overloads with constant durationOfBreak, the duration of break indeed remains constant when the circuit repeatedly transitions back to open from half-open.

New tests

For any of the existing tests ^ which focused on break-duration, I guess we would want counterparts for the dynamic case.

reisenberger avatar Jan 29 '20 08:01 reisenberger

Tests on orthogonal matters

If I've understood the suggestion correctly:

Make all the tests Theories, with true and false values: [Fact] => [Theory, InlineData(true), InlineData(false)] (not actually as a regex, just literals.)

I don't think we need to do this. Where a circuit-breaker tests is testing something entirely orthogonal to the changes you have made (that it breaks on the right exceptions; whatever), there's no need to duplicate these for both static and dynamic break durations. If we'd taken that approach/took that approach going forward for each new feature added to circuit-breaker, we'd soon end up with a 2 ^ N explosion of tests where N is the number of new features.

reisenberger avatar Jan 29 '20 08:01 reisenberger

I don't think we need to do this. Where a circuit-breaker tests is testing something entirely orthogonal to the changes you have made (that it breaks on the right exceptions; whatever), there's no need to duplicate these for both static and dynamic break durations.

I can see that argument.

I took my cue from the fact that each different variant on the *CircuitBreaker*Syntax, does have a largely duplicated collection of tests (i.e. CircuitBreakerSpecs, CircuitBreakTResultSpecs, CircuitBreakerAsyncSpecs.) Most of the tests in those files aren't testing things that are unique to the differences from the base implementation; they're just testing "this feature also works in the context of there being a return value in the policy", etc.

Similarly, all the tests are re-run for all the .NET versions.

(Plus, I assumed you were in favour since you didn't comment on it in the initial review ;) )

I agree that the majority of that duplication is highly unlikely to excercise any new edge-cases.

For the avoidance of doubt, given that that change has been coded up for all Test Files and all tests pass, do you want me to remove those changes? Happy to do so, if you'd still like me to, given the above :)

Brondahl avatar Jan 29 '20 09:01 Brondahl

It looks like most of the existing tests on circuit-break duration are in a section marked Circuit-breaker open->half-open->open/closed tests. [snip] For these existing tests, we would want variants checking that existing circuit-break-duration behaviour is not affected. IE If the tests do not already cover it, we should extend existing tests/add new tests confirming that when using policy configuration overloads with constant durationOfBreak, the duration of break indeed remains constant when the circuit repeatedly transitions back to open from half-open.

For any of the existing tests ^ which focused on break-duration, I guess we would want counterparts for the dynamic case.

Yup, agree with all that.

Brondahl avatar Jan 29 '20 10:01 Brondahl

There is also a separate test in the callback-delegates section about the break-duration being passed to the callback delegate: Should_call_onbreak_with_the_correct_timespan.

Ah hah! Yes, that's the sort of thing I think I would have missed :) Good catch; thanks!

Brondahl avatar Jan 29 '20 10:01 Brondahl

LMK about removing the duplication of existing tests, and then I'll implement the extensions and new tests.

Brondahl avatar Jan 29 '20 10:01 Brondahl

Thanks @Brondahl ! Yes, I think we should remove the dual-condition running of existing tests modelled with [Theory, InlineData(true), InlineData(false)]. But thanks for suggesting it; I understand the motivation :+1:

reisenberger avatar Jan 29 '20 18:01 reisenberger

Thanks @Brondahl for everything on this :+1: . Re:

@reisenberger to re-review original markups (currently pushed)

DONE - all good.

@reisenberger to shout if/when you want the rote changes pushed for review.

READY

@reisenberger to review test-plan.

DONE - as above

reisenberger avatar Jan 29 '20 22:01 reisenberger

Rote changes to other syntaxes pushed. As previously the first commit (1974636a) is duplicating the existing methods verbatim, so that the second commit (fc80d1368) has the interesting diffs. So you might want to exclude that commit from the diff you're looking at.

Brondahl avatar Jan 31 '20 11:01 Brondahl

Updates to tests are not currently included since I need to revert the test duplication, as above. Once that's complete I'll push those changes, but since they're independant files, I don't think that should block review of the Syntax files.

Brondahl avatar Jan 31 '20 11:01 Brondahl

Actions:

  • @reisenberger to review rote syntax changes.
  • @Brondahl to revert dynamic/static test duplication.
  • @Brondahl to implement test refactoring across remaining test files.
  • @Brondahl to implement new tests as per test plan above.
  • @Brondahl to push all above test changes
  • @reisenberger to review test changes.
  • @Brondahl to prepare wiki-doc updates.

Brondahl avatar Jan 31 '20 11:01 Brondahl

@reisenberger to review rote syntax changes.

DONE :+1: All good save minor parameter-naming things prev ^ mentioned. Thanks @Brondahl for all the great attention to detail on this PR!

reisenberger avatar Feb 03 '20 18:02 reisenberger

Heya, Sorry about the radio silence; things got busy at work and in life in general :)

I've been gradually applying the general refactoring to the tests, to commonise the repetitive bits. That's pretty slow going though. :(

So I'm going to put that down, and just apply the testing for ☝️ without that commonisation. I'll raise a separate PR for the test commonisation (Like the param renaming), since I think that's worth cleaning up, still.

Hopefully I'll be able to get the Tests up by the end of the weekend.

Brondahl avatar Feb 29 '20 13:02 Brondahl