observability-workshop Identify the less observable fault

Open abangser opened this issue 5 years ago • 0 comments

Identified the fault we want to inject

Requirements for issue:
- Not able to be debugged from UI
- Not able to be identified in metrics, logs, or traces
- A pretty clear gap in observability
- Probably fairly obscure, since that is when observability is most necessary (but it needs to be common enough that it’s not a “works on my machine” problem)
Possible ideas:
- Can we create a service that turns into a black hole fairly frequently? The idea is that it emits no diagnostics and fails in a frustrating manner. For example, it accepts incoming requests but never returns a response.
- Issue is exacerbated on a single node/VM
- Docker chaos degrading traffic between a couple of nodes
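
To make the black hole idea concrete, here is a minimal sketch in Python using only the standard library. It is an illustration, not the workshop's implementation: the names `start_black_hole` and `probe` are hypothetical, and it assumes a plain TCP listener is a good enough stand-in for the real service. The listener accepts every connection, then never reads, writes, or logs anything.

```python
import socket
import threading


def start_black_hole(host="127.0.0.1", port=0):
    """Start a listener that accepts connections and then never replies.

    Hypothetical helper for illustration. port=0 lets the OS pick a free port.
    Returns the listening socket and the port it is bound to.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen()
    held = []  # keep references so accepted sockets are never closed

    def accept_forever():
        while True:
            try:
                conn, _ = server.accept()
            except OSError:
                return  # listener was closed
            held.append(conn)  # no read, no write, no log line

    threading.Thread(target=accept_forever, daemon=True).start()
    return server, server.getsockname()[1]


def probe(port, timeout=0.5, host="127.0.0.1"):
    """Send a request and report what the caller sees.

    Returns "timeout" if the service accepted the connection but
    never answered within `timeout` seconds.
    """
    with socket.create_connection((host, port), timeout=timeout) as client:
        client.sendall(b"GET / HTTP/1.1\r\nHost: localhost\r\n\r\n")
        try:
            data = client.recv(1024)
        except socket.timeout:
            return "timeout"
        return "closed" if not data else "response"
```

From the caller's side the connection succeeds and the request is sent, so nothing looks wrong until a client-side timeout fires. In this sketch the service itself produces no error, log line, or response to observe, which is the gap the requirements above describe.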
Documented way to create it
Documented way to fix it
Example graphs/alerts to identify the issue
Example of how to increase observability to track the issue

Apr 28 '19 17:04 abangser