`error_threshold_timeout_enabled` can open too aggressively
This option was introduced in https://github.com/Shopify/semian/pull/418, which is meant to open on consecutive errors, regardless of time window. However, its implementation does not really take "consecutiveness" into account, causing it to open incorrectly.
For example, with an error_threshold of 2, and error_threshold_timeout_enabled set to false, the following might cause the circuit to open:
error -> success -> success -> success -> success -> ....10000 more successes -> error
This is not the desired behaviour
Is it correct to assume the success criteria here, for the example above, error_threshold_timeout_enabled = false and error_threshold = 2 should only open if there are 2 consecutive (nothing between) errors, regardless of the window between the 2 events?
Is it correct to assume the success criteria here, for the example above,
error_threshold_timeout_enabled = falseanderror_threshold = 2should only open if there are 2 consecutive (nothing between) errors, regardless of the window between the 2 events?
Yes, I believe this is the documented behaviour, however at first glance this "consecutiveness" property does not seem to be implemented in the current Semian.
Seems like our current test case verifies that this documented behaviour is not implemented in the current Semian:
https://github.com/Shopify/semian/pull/652/commits/814dac799a802dc8acc0245c0ce5961f5ad47839
For context: line 735 in the commit @abishanan-shopify shared fails.
This means that you could have a single error, then 100s of successes (for any period of time), and then a single error, and your circuit will end up opening up
We are a bit hesitant about fixing this, because we don't know who's using it out there and in what form. However, we might introduce a different option, and flag this one in the documentation as a problematic one.