Allow circuit to close even when open with failures < threshold
- We observed an issue recently where the circuit breakers for our application got stuck open and had to be manually closed by editing the fuse database.
- We believe that multiple threads moving through the breaker simultaneously triggered a race condition, in which the breaker recorded a failure while it was also opening the fuse. This left the fuse in a state where it was open with a failure count below the failure threshold.
- At this point (as demonstrated by the test we've added) the circuit will stay open forever: 'tripped' will always return 'false' because the failure count is below the threshold, so the circuit will never enter the half-open state and allow a successful test request to close it.
- By also sending test requests when the circuit is open but not tripped (which we think should only ever happen in this error state) the circuits will be able to close again once the system that they guard against returns to normal, even if request volume during an outage is high enough to put them into this state.
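
The stuck state and the fix described above can be sketched roughly as follows. This is a minimal illustration, not the library's actual code: `Fuse`, `CircuitOpenError`, `run`, and the two `allows_test_request_*` helpers are hypothetical names, the fuse is held in memory rather than in a database, and the retry timeout that normally gates the half-open state is left out for brevity.

```python
from dataclasses import dataclass


class CircuitOpenError(Exception):
    """Raised when the circuit refuses to run a request (hypothetical name)."""


@dataclass
class Fuse:
    """Hypothetical in-memory stand-in for the persisted fuse record."""
    is_open: bool = False
    failure_count: int = 0
    failure_threshold: int = 3


def tripped(fuse: Fuse) -> bool:
    # True only once the recorded failures reach the threshold.
    return fuse.failure_count >= fuse.failure_threshold


def allows_test_request_before(fuse: Fuse) -> bool:
    # Old behaviour: only an open *and* tripped circuit may go half-open.
    return fuse.is_open and tripped(fuse)


def allows_test_request_after(fuse: Fuse) -> bool:
    # New behaviour: an open circuit that is not tripped (the error state)
    # also lets a test request through.
    return fuse.is_open


def run(fuse: Fuse, request, allows_test_request):
    # Reject the call when the circuit is open and no test request is allowed.
    if fuse.is_open and not allows_test_request(fuse):
        raise CircuitOpenError("circuit is open")
    try:
        result = request()
    except Exception:
        # Record the failure and open the fuse once the threshold is reached.
        fuse.failure_count += 1
        if tripped(fuse):
            fuse.is_open = True
        raise
    # A successful request closes the circuit and resets the failure count.
    fuse.is_open = False
    fuse.failure_count = 0
    return result
```

With a fuse in the error state, for example `Fuse(is_open=True, failure_count=1, failure_threshold=3)`, the old check rejects every call, so nothing can ever close the circuit. The new check lets a request through, and a success closes the fuse again.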
Signed-off-by: Natalie Bennett [email protected]
Signed-off-by: Tom Viehman [email protected]