observability-wg
Observability Guidelines
What should the Observability Guidelines include?
For me the basics are:
- Tutorial: how to add telemetry to your library
  - best practices for event naming
  - best practices for metadata usage
  - what to measure and track, and how
    - gathering VM metrics
    - gathering OS metrics (via the os_mon application)
- Tutorial: how to integrate OTel into your release
  - adding OTel to an application
  - configuration from the ground up
  - attaching it to telemetry
  - displaying gathered data (is there a web UI available already?)
  - how to prepare a release to use OTel
- Given Phoenix's popularity, an example Phoenix application that uses all of the above
  - Cowboy
  - Ecto
  - Plug
  - Absinthe (?)
  - Some custom data
- Grafana dashboards for OTel metrics (?)
  - What we see on each dashboard
  - How to read data
  - How to analyse data
  - How to react to changes in data (what it means when we see a spike in the number of reductions, etc.)
I do not know whether anything more needs to be added, but I am open to suggestions. I think we could use GitBook to publish it in a book-like format.
I agree that the points mentioned above should be covered (it's a lot, but we'll get there). Apart from describing the tools that can be used, I think we should also organize the guidelines around different objectives, e.g.:
- how to instrument a library
- how to record VM metrics
- how to visualize data
In addition, we could explain why systems should be monitored in the first place, why structured logging is important, and why labelled, multi-dimensional metrics are more useful than flat ones.
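
To make the last point concrete, here is a minimal sketch of the difference between flat and labelled metrics. It is written in plain Python only for illustration. The metric name `http_requests`, the labels `method` and `status`, the sample values, and the helpers `sum_by` and `flat_sum_by_position` are all made up for this example and do not come from telemetry, OTel, or any other library.

```python
from collections import defaultdict

# Flat metrics: every label combination is baked into the metric name.
# (Hypothetical names and values, for illustration only.)
flat = {
    "http_requests.GET.200": 120,
    "http_requests.GET.500": 3,
    "http_requests.POST.200": 45,
    "http_requests.POST.500": 7,
}

# Labelled metrics: one metric name plus key/value labels per sample.
labelled = [
    ("http_requests", {"method": "GET", "status": "200"}, 120),
    ("http_requests", {"method": "GET", "status": "500"}, 3),
    ("http_requests", {"method": "POST", "status": "200"}, 45),
    ("http_requests", {"method": "POST", "status": "500"}, 7),
]


def sum_by(samples, name, label):
    """Aggregate a labelled metric along any single dimension."""
    totals = defaultdict(int)
    for metric, labels, value in samples:
        if metric == name:
            totals[labels[label]] += value
    return dict(totals)


def flat_sum_by_position(metrics, prefix, position):
    """Aggregate flat metrics by parsing the name.

    This only works while every producer keeps the same segment order,
    which is the fragility that labels avoid.
    """
    totals = defaultdict(int)
    for name, value in metrics.items():
        parts = name.split(".")
        if parts[0] == prefix:
            totals[parts[position]] += value
    return dict(totals)


by_status = sum_by(labelled, "http_requests", "status")
by_method = sum_by(labelled, "http_requests", "method")
flat_by_status = flat_sum_by_position(flat, "http_requests", 2)

print(by_status)
print(by_method)
print(flat_by_status)
```

With labels, grouping by a different dimension is a matter of passing a different label name. With flat names, the consumer has to know the position of each segment, and adding a new dimension changes every existing metric name.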