semian
calculate http error_timeout based upon capacity option
This PR proposes an intuitive way to configure :error_timeout for Semian HTTP configurations. The user configures a :capacity option as a fraction (for example, 0.75), and :error_timeout is calculated from the request's :read_timeout.
Reasoning
The following diagram assumes the circuit starts open and the requested endpoint is not recovering. This means the worker will alternate between the open and half-open states.
t=0        t=1         t=2        t=3         t=4        t=5
|----------|-----------|----------|-----------|----------|
open       half        open       half        open       half
free       busy        free       busy        free       busy
Whenever the circuit is in an open state, the worker is able to do work for other resources. But when the worker is in a half open state, the worker cannot do other work because it's stuck hanging until the request times out.
We're calling the fraction of time the worker spends in the free state the worker's :capacity.
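As a rough illustration of that definition, the free share of a timeline like the one above can be measured directly. This is only a sketch: `free_fraction` and the segment durations are made up for the example and are not part of Semian.

```ruby
# Hypothetical helper: given [state, duration_in_seconds] pairs,
# return the share of total time spent in the :open (free) state.
def free_fraction(segments)
  total = segments.sum { |_state, duration| duration }
  free  = segments.sum { |state, duration| state == :open ? duration : 0 }
  free.to_f / total
end

# Three open/half-open cycles where both states last one time unit,
# matching the diagram above.
timeline = [[:open, 1], [:half_open, 1]] * 3
puts free_fraction(timeline) # prints 0.5
```

With equal open and half-open durations the worker is free half of the time; lengthening the open segments raises that share.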
The High School Math
For Semian HTTP requests we can calculate capacity based on this equation:
capacity = error_timeout / (error_timeout + request_timeout)
In words, the capacity of a given worker is the fraction of its time that is not spent hanging on a single request.
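Since the user supplies :capacity and the request timeout is known, the equation has to be solved for error_timeout. A short derivation, writing c for capacity, E for error_timeout and R for request_timeout (these symbols are shorthand introduced here), and assuming 0 <= c < 1:

```latex
\begin{aligned}
c &= \frac{E}{E + R} \\
c\,(E + R) &= E \\
c\,R &= E\,(1 - c) \\
E &= \frac{c\,R}{1 - c}
\end{aligned}
```

The denominator 1 - c is why E grows without bound as c approaches 1.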
Examples
- A :capacity of 0.5 would set :error_timeout equal to the request timeout.
- As :capacity approaches 1, :error_timeout approaches infinity.
- With a :capacity of 0.75 and a 60 second request timeout, the :error_timeout would be 180 seconds.
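The examples above can be checked with a small sketch of the calculation. The method name `error_timeout_for` and its keyword arguments are hypothetical, chosen for illustration; this is not Semian's API and not necessarily how the PR implements it.

```ruby
# Hypothetical helper (not part of Semian): solve
#   capacity = error_timeout / (error_timeout + read_timeout)
# for error_timeout, given a capacity in the range [0, 1).
def error_timeout_for(capacity:, read_timeout:)
  unless capacity.is_a?(Numeric) && capacity >= 0 && capacity < 1
    raise ArgumentError, "capacity must be in [0, 1), got #{capacity.inspect}"
  end

  capacity * read_timeout / (1.0 - capacity)
end

puts error_timeout_for(capacity: 0.5,  read_timeout: 60) # prints 60.0
puts error_timeout_for(capacity: 0.75, read_timeout: 60) # prints 180.0
```

A capacity of exactly 1 is rejected in this sketch because it would require an infinite error_timeout.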
Isn't this capacity stuff meant to be handled by bulkheads?
- This PR addresses the capacity of a lone worker and doesn't require shared state between workers.
- Bulkheads require a semaphore per resource which is expensive when dealing with a large number of resources. (For example a large number of HTTP requests.)
Concerns
- This idea should be verified by a number of trained experts in high school math.
- Just because the idea makes sense does not mean it's worth adding to Semian.
- Do we care that the default :read_timeout being 60 seconds will lead to values of :error_timeout greater than a minute when :capacity > 0.5?