semian icon indicating copy to clipboard operation
semian copied to clipboard

`error_threshold_timeout_enabled` can open too aggressively

Open AbdulRahmanAlHamali opened this issue 7 months ago • 5 comments

This option was introduced in https://github.com/Shopify/semian/pull/418, which is meant to open on consecutive errors, regardless of time window. However, its implementation does not really take "consecutiveness" into account, causing it to open incorrectly.

For example, with an error_threshold of 2, and error_threshold_timeout_enabled set to false, the following might cause the circuit to open:

error -> success -> success -> success -> success -> ....10000 more successes -> error

This is not the desired behaviour

AbdulRahmanAlHamali avatar Jun 03 '25 20:06 AbdulRahmanAlHamali

Is it correct to assume the success criteria here, for the example above, error_threshold_timeout_enabled = false and error_threshold = 2 should only open if there are 2 consecutive (nothing between) errors, regardless of the window between the 2 events?

celso-shopify avatar Jun 03 '25 20:06 celso-shopify

Is it correct to assume the success criteria here, for the example above, error_threshold_timeout_enabled = false and error_threshold = 2 should only open if there are 2 consecutive (nothing between) errors, regardless of the window between the 2 events?

Yes, I believe this is the documented behaviour, however at first glance this "consecutiveness" property does not seem to be implemented in the current Semian.

abishanan-shopify avatar Jun 19 '25 14:06 abishanan-shopify

Seems like our current test case verifies that this documented behaviour is not implemented in the current Semian:

https://github.com/Shopify/semian/pull/652/commits/814dac799a802dc8acc0245c0ce5961f5ad47839

abishanan-shopify avatar Jun 25 '25 15:06 abishanan-shopify

For context: line 735 in the commit @abishanan-shopify shared fails.

This means that you could have a single error, then 100s of successes (for any period of time), and then a single error, and your circuit will end up opening up

AbdulRahmanAlHamali avatar Jul 08 '25 14:07 AbdulRahmanAlHamali

We are a bit hesitant about fixing this, because we don't know who's using it out there and in what form. However, we might introduce a different option, and flag this one in the documentation as a problematic one.

AbdulRahmanAlHamali avatar Aug 22 '25 14:08 AbdulRahmanAlHamali