Feature Request: Percentage Based Error Thresholds

Open · michaelkipper opened this issue 5 years ago · 1 comment

What

Currently, we express error thresholds as a number of failures (error_threshold) within a certain time period (error_timeout). Once that threshold is reached, we open the circuit, and only close it again after a certain number of successful requests (success_threshold) is reached.

This requires intimate knowledge of your request patterns. A more flexible model is to use an error percentage threshold to determine when to open the circuit. Instead of saying "3 failures in 5 seconds", one might say "over 10% of requests failed".
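
To make the dependence on request patterns concrete, here is a rough illustration. The helper name and the traffic figures are made up for this example and are not part of Semian. It shows that the same fixed threshold of 3 failures in 5 seconds corresponds to very different error rates depending on throughput.

```ruby
# Hypothetical helper (not part of Semian): the error rate, in percent, at
# which a fixed count-based threshold trips for a given steady request rate.
def implied_error_percent(error_threshold:, error_timeout:, requests_per_second:)
  total_requests = requests_per_second * error_timeout
  100.0 * error_threshold / total_requests
end

# Same threshold, three assumed traffic levels.
[1, 10, 1_000].each do |rps|
  pct = implied_error_percent(error_threshold: 3, error_timeout: 5, requests_per_second: rps)
  puts format("%5d req/s -> circuit opens at %.2f%% errors", rps, pct)
end
```

At low traffic the circuit tolerates a majority of requests failing, while at high traffic it opens on a tiny fraction of failures. A percentage threshold would behave the same in both cases.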

How

Either add a new parameter, error_percent_threshold, or allow error_threshold to be expressed as a percentage (e.g. "10%").
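
A minimal sketch of the second option, assuming error_threshold would accept either an Integer count (as today) or a String such as "10%". The method name `parse_error_threshold` and the returned hash shape are hypothetical, chosen only for illustration.

```ruby
# Hypothetical parser (not Semian's API): accepts an Integer failure count
# or a percentage String such as "10%" and tags which kind it found.
def parse_error_threshold(value)
  case value
  when Integer
    { kind: :count, value: value }
  when /\A(\d+(?:\.\d+)?)%\z/
    percent = Regexp.last_match(1).to_f
    raise ArgumentError, "percentage must be between 0 and 100" unless percent.between?(0.0, 100.0)
    { kind: :percent, value: percent }
  else
    raise ArgumentError, "unsupported error_threshold: #{value.inspect}"
  end
end

parse_error_threshold(3)     # count-based, as today
parse_error_threshold("10%") # percentage-based, as proposed
```

Overloading one parameter keeps existing configurations working unchanged, whereas a separate error_percent_threshold would make the two modes easier to tell apart in a config file.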

Maintain either a large sliding window of successes and errors to compute percentages, or perhaps a set of counters to reduce the overall size of the windows.
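
The "set of counters" idea could look roughly like the following. This is a sketch under assumptions, not Semian's implementation: the class name `BucketedErrorRate`, its parameters, and the one-second bucket size are all invented for the example. It keeps one pair of counters per time bucket in a ring buffer, so memory use does not grow with traffic.

```ruby
# Sketch only, not Semian's implementation. Tracks successes and errors in
# fixed-size time buckets so memory stays constant regardless of traffic.
class BucketedErrorRate
  Bucket = Struct.new(:slot, :successes, :errors)

  def initialize(window_seconds:, bucket_seconds: 1,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @bucket_seconds = bucket_seconds
    @bucket_count = (window_seconds.to_f / bucket_seconds).ceil
    @buckets = Array.new(@bucket_count) { Bucket.new(nil, 0, 0) }
    @clock = clock
  end

  def record_success
    current_bucket.successes += 1
  end

  def record_error
    current_bucket.errors += 1
  end

  # Percentage of failed requests within the window, or 0.0 with no traffic.
  def error_percent
    oldest_slot = current_slot - @bucket_count + 1
    live = @buckets.select { |b| b.slot && b.slot >= oldest_slot }
    errors = live.sum(&:errors)
    total = errors + live.sum(&:successes)
    total.zero? ? 0.0 : 100.0 * errors / total
  end

  private

  def current_slot
    (@clock.call / @bucket_seconds).floor
  end

  # Reuses the ring-buffer entry for the current time slot, resetting it if
  # it still holds counts from an earlier pass around the ring.
  def current_bucket
    slot = current_slot
    bucket = @buckets[slot % @bucket_count]
    unless bucket.slot == slot
      bucket.slot = slot
      bucket.successes = 0
      bucket.errors = 0
    end
    bucket
  end
end
```

A circuit breaker would compare `error_percent` against the configured percentage after each recorded error. A real implementation would likely also want a minimum request count before acting on the percentage, so that 1 failure out of 2 requests does not open the circuit.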

michaelkipper · Jul 02 '19 15:07

> This requires intimate knowledge of your request patterns

On top of that, it assumes a happy path. I like a percentage threshold because it could adapt to a shift in traffic that wasn't anticipated (e.g., a flash sale). This would work well with our current setup where bulkheads trip circuits, because bulkhead timeouts are more likely during high RPS, which in turn makes it more likely that we'd open circuits in high-traffic situations (which may not be desirable).

thegedge · Jul 04 '19 19:07