anomaly-detection

Max retry attempt should be different among detectors

Open kaituo opened this issue 4 years ago • 0 comments

Is your feature request related to a problem? Please describe.

Currently, when we encounter an EndRunException, we retry 3 times before terminating the detector. This is not ideal, because the retry budget is the same regardless of detector interval. For example, if the detector interval is 1 minute and the memory circuit breaker is open, three retries span only about 3 minutes, and we may not be able to recover within that time.

Describe the solution you'd like

We should set the maximum number of retry attempts based on the detector interval. In addition, the number of allowed retries should be settable per exception object: the EndRunException would carry a field for the number of retries, which can be specific to the detector (interval), the exception type (limit exceeded, other end-run causes), or the root cause. Setting "stop now" to true would then just be the special case where the number of retries is 0.
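To make the proposal concrete, here is a minimal sketch of what a per-exception retry budget could look like. Everything in it is hypothetical and not taken from the plugin's code: the class `RetryPolicySketch`, the stand-in `EndRunException` with a `maxRetries` field, the helpers `maxRetriesFor` and `shouldRetry`, and the 30-minute recovery window are all assumptions chosen for illustration.

```java
import java.time.Duration;

class RetryPolicySketch {

    /** Hypothetical stand-in for the end-run exception, carrying its own retry budget. */
    public static class EndRunException extends RuntimeException {
        private final int maxRetries;

        public EndRunException(String message, int maxRetries) {
            super(message);
            if (maxRetries < 0) {
                throw new IllegalArgumentException("maxRetries must be >= 0");
            }
            this.maxRetries = maxRetries;
        }

        public int getMaxRetries() {
            return maxRetries;
        }

        /** "Stop now" is just the special case of a zero retry budget. */
        public boolean isStopNow() {
            return maxRetries == 0;
        }
    }

    /**
     * Derives a retry budget from the detector interval so that retries cover
     * roughly the same wall-clock recovery window for every detector.
     * The recovery window is an assumed tuning parameter.
     */
    public static int maxRetriesFor(Duration detectorInterval, Duration recoveryWindow) {
        if (detectorInterval.isZero() || detectorInterval.isNegative()) {
            throw new IllegalArgumentException("detectorInterval must be positive");
        }
        long intervalMillis = detectorInterval.toMillis();
        long windowMillis = recoveryWindow.toMillis();
        // Ceiling division: number of intervals needed to span the window.
        long retries = (windowMillis + intervalMillis - 1) / intervalMillis;
        // Allow at least one retry and stay within int range.
        return (int) Math.min(Integer.MAX_VALUE, Math.max(1L, retries));
    }

    /** Returns true while the retries already made are below the exception's budget. */
    public static boolean shouldRetry(EndRunException e, int retriesSoFar) {
        return retriesSoFar < e.getMaxRetries();
    }

    public static void main(String[] args) {
        Duration window = Duration.ofMinutes(30); // assumed recovery window
        int fast = maxRetriesFor(Duration.ofMinutes(1), window);
        int slow = maxRetriesFor(Duration.ofMinutes(10), window);
        System.out.println("1-minute detector retries: " + fast);
        System.out.println("10-minute detector retries: " + slow);

        EndRunException stopNow = new EndRunException("unrecoverable", 0);
        System.out.println("stop now: " + stopNow.isStopNow());
    }
}
```

With this shape, a 1-minute detector gets more retries than a 10-minute detector for the same recovery window, and code that throws the exception can still override the budget for a specific exception type or root cause by passing a different `maxRetries`.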

kaituo, Jun 22 '20 20:06