An open access guide to Cloud-Native Observability.

The Complete Guide to Cloud-Native Observability

About This Project

What is the goal of this project? To provide a clear, concise, and unbiased overview of cloud-native observability.

Why is that important? Despite untold millions of marketing dollars spent in and around observability (or maybe because of the untold millions of marketing dollars spent), cloud-native observability practice today is often inefficient, ineffective, or both. It doesn't need to be that way.

Who is this for? Observability and DevOps practitioners, SREs, end-users of observability systems, and anyone involved in building and running cloud-native software at scale.

How can you help? You can contribute in several ways:

  • Adding your stories and learnings to the "Real World Examples" directory
  • Helping refine and shape this repository through contributions
  • Tackling a good first issue
  • Giving us a star and sharing the repository!

You can also view the project milestones.

What We're Looking For

Currently, we would appreciate feedback (either via issues or pull requests) in the following areas:

  • Overall structure and flow of the document.
  • Overall comprehension of the text/themes.
  • Undefined, under-defined, or over-defined themes, concepts, or terms.
  • General feedback on the ideas.
  • Illustrations and explanatory diagrams.

Table of Contents

  • README
  • Foreword
  • Introduction
    • What's the point of software development, anyway?
    • Why Cloud-Native Matters
  • End-users and Engineers, Transactions and Resources
    • End-users
    • Engineers
    • Transactions
    • Resources
    • SLIs and SLOs
  • The Anatomy of Observability
    • Telemetry
    • Persistence
    • Workflows
  • Telemetry Creation and OpenTelemetry
    • Instrumentation and Granularity
    • OpenTelemetry and Commodity Telemetry
  • Effective Monitoring
    • Effective Dashboards
    • Effective Alerting
    • Fantastic SLOs and Where To Find Them
  • Effective Investigation
    • Context -- What Ties Everything Together
    • Guided Analysis vs. Data Exploration
    • Tagging and Cardinality
  • Effective SLOs
    • Counting the Uncountable
    • Pushing the Envelope
  • Telemetry ROI -- The Elephant in the Room
    • Costs Go Up, ROI Goes Down -- You Can Explain That
    • The Predictive Value of Telemetry
    • Cost Reduction Strategies
  • Organizational Concerns
    • Combating the 'Three Pillars' ideology
    • Build, Buy, and Everything In Between
    • Organizing Effective Observability Through Centralization
  • Glossary

Contributing

We gladly accept pull requests! Please see CONTRIBUTING.md for more.

Special Thanks

This guide is made possible thanks to salaries paid by Lightstep.

We'd love for you to check out our report on how OpenTelemetry drives the future of observability as a companion to this whitepaper.


This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
