Self-healing can forever "miss" broker failures that occur while it is disabled
Background
cruise-control provides self-healing for a variety of anomalies.
When an anomaly is detected and the corresponding self-healing boolean is enabled, cruise-control can generate a proposal on its own to correct the anomaly, and then execute that proposal.
When these anomalies happen while the corresponding self-healing boolean is disabled, cruise-control (as expected) does not generate proposals to correct the anomaly.
However, when an anomaly (e.g. a broker failure) occurs while self-healing is disabled, that same anomaly event will not cause cruise-control to generate a proposal when self-healing is later enabled.
Problem
Anomalies that occur while self-healing is disabled can essentially be "lost" to cruise-control, and some other actor has to explicitly tell cruise-control to act. Re-enabling self-healing is insufficient to cause cruise-control to heal the anomaly.
Proposal
When cruise-control's anomaly-detection is re-enabled, any outstanding (i.e. not stale) anomalies should be acted on.
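One way to read this proposal is sketched below as a toy Python model. It is not cruise-control's code: `Anomaly`, `SelfHealer`, and the `max_age_s` staleness bound are hypothetical names chosen for illustration, and "stale" is assumed to mean "older than a fixed maximum age".

```python
from dataclasses import dataclass, field


@dataclass
class Anomaly:
    kind: str            # e.g. "broker_failure"
    detected_at: float   # seconds on any consistent clock


@dataclass
class SelfHealer:
    """Hypothetical model of the proposal; not cruise-control's classes."""
    enabled: bool = False
    max_age_s: float = 3600.0                      # assumed staleness bound
    pending: list = field(default_factory=list)    # seen while disabled
    healed: list = field(default_factory=list)     # acted on

    def on_anomaly(self, anomaly: Anomaly) -> None:
        if self.enabled:
            self.healed.append(anomaly)
        else:
            # Remember the anomaly instead of dropping it.
            self.pending.append(anomaly)

    def enable(self, now: float) -> None:
        self.enabled = True
        # Act on every remembered anomaly that is not stale; drop the rest.
        for anomaly in self.pending:
            if now - anomaly.detected_at <= self.max_age_s:
                self.healed.append(anomaly)
        self.pending.clear()
```

With `max_age_s=100.0`, anomalies recorded at times 0 and 950 while disabled, and `enable(now=1000.0)`, only the anomaly from time 950 is acted on; the older one is treated as stale.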
The issue description is a little off. This is only relevant for a broker_failure anomaly, under the following conditions:
- A broker fails at time t,
- Anomaly detector starts its first grace period before sending a notification at time t' > t,
- At time t', anomaly detector notifier sends a notification regarding broker failure (this is independent of whether self-healing is enabled or not),
- Anomaly detector starts its second grace period before attempting to start self-healing at t" > t',
- At time t", if self-healing is enabled, anomaly detector starts self-healing, otherwise ignores the broker failure,
- If self-healing was disabled at time t", but later enabled at time t"', this will not make CC start fixing the broker failure that was ignored at time t".
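The sequence above can be modeled in a few lines of Python. This is a simplified sketch rather than cruise-control's implementation: the function and parameter names are hypothetical, the two grace periods are assumed to be plain additive delays, and self-healing is assumed to be off until a single switch-on time.

```python
def broker_failure_timeline(t_fail, notify_grace, fix_grace, healing_enabled_from):
    """Model the one-shot decision described in the list above.

    healing_enabled_from is the time self-healing is switched on (t''' in the
    list); it is treated as disabled before that. All names are hypothetical.
    """
    t_notify = t_fail + notify_grace   # t' : notification is sent regardless
    t_decide = t_notify + fix_grace    # t'': the only point where healing may start
    # The decision is made once, at t''. Enabling self-healing later changes nothing.
    heals = healing_enabled_from <= t_decide
    return {"t_notify": t_notify, "t_decide": t_decide, "heals": heals}
```

For a failure at time 0 with grace periods of 10 and 20, the decision point is time 30. Enabling self-healing at time 5 leads to a fix, while enabling it at time 50 does not, which is the "missed" case in the last bullet.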
Ideally, CC broker failure self-healing should act on a broker failure that happened before self-healing was enabled.
Ah; will update the title