observability-workshop

Identify the less observable fault

Open abangser opened this issue 5 years ago • 0 comments

Identified the fault we want to inject

  • Requirements for issue:

    • Cannot be debugged from the UI
    • Cannot be identified in metrics, logs, or traces
    • A pretty clear gap in observability
    • Probably fairly obscure, since that is when observability is most necessary (but it needs to be common enough that it’s not a “works on my machine” problem)
  • Possible ideas:

    • Can we create a service that turns into a black hole fairly frequently? The idea is that it emits no diagnostics and fails in a frustrating manner. For example, it accepts incoming requests but never returns a response.
    • An issue that is exacerbated on a single node/VM
    • Docker chaos degrading traffic between a couple of nodes
  • Documented way to create it

  • Documented way to fix it

  • Example graphs/alerts to identify the issue

  • Example of how to increase observability to track the issue
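
To make the "black hole" idea above concrete, here is a minimal sketch in Python, assuming a plain TCP listener is an acceptable stand-in for the real service. The names `start_black_hole` and `probe` are hypothetical and not part of the workshop code. The listener accepts every connection and then does nothing with it, so it emits no logs, metrics, or traces. The probe is one possible external check that classifies the outcome from the caller's side.

```python
import socket
import threading


def start_black_hole(host="127.0.0.1", port=0):
    """Start a TCP listener that accepts connections but never replies.

    Hypothetical helper for illustration. Returns (server_socket, bound_port).
    Port 0 lets the OS pick a free port.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen()
    held = []  # keep references so accepted sockets are never closed

    def accept_forever():
        while True:
            try:
                conn, _addr = server.accept()
            except OSError:
                return  # listener was closed; stop accepting
            held.append(conn)  # no read, no write, no log line

    threading.Thread(target=accept_forever, daemon=True).start()
    return server, server.getsockname()[1]


def probe(host, port, timeout=0.5):
    """Hypothetical external check: send a request and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as client:
            client.sendall(b"GET / HTTP/1.0\r\n\r\n")
            data = client.recv(1024)
            return "replied" if data else "closed"
    except socket.timeout:
        return "hung"  # connected fine, but no response within the timeout
    except OSError:
        return "refused"


if __name__ == "__main__":
    server, port = start_black_hole()
    print(probe("127.0.0.1", port))  # prints "hung"
```

The connection succeeds and the request is sent, so a simple "is the port open" health check would pass. Only a check that waits for a response with a deadline would notice the fault, which is the kind of observability gap the requirements describe.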

abangser, Apr 28 '19 17:04