devops-course
Monitoring, tracing, observability in DevOps
- https://en.wikipedia.org/wiki/Tracing_(software)
- https://en.wikipedia.org/wiki/System_monitoring
- https://en.wikipedia.org/wiki/Network_monitoring
- https://en.wikipedia.org/wiki/Crash_reporter
- https://en.wikipedia.org/wiki/Application_performance_management
- https://en.wikipedia.org/wiki/Website_monitoring
- https://en.wikipedia.org/wiki/Provenance#Computer_science
- https://en.wikipedia.org/wiki/Log_analysis
See also Icinga (thanks to @henriklb for the suggestion).
Log analysis at Eclipse (Trace Compass): https://projects.eclipse.org/projects/tools.tracecompass
We've found Istio (https://istio.io/) to be increasingly useful in this context. KubeSpy (https://github.com/pulumi/kubespy) is an excellent tool for troubleshooting and diagnosing Kubernetes deployments.
- Prometheus
- Sensu https://sensu.io/
- Zipkin
- The ELK stack (Elasticsearch, Logstash, Kibana)
+1 for Prometheus
Sentry for error reporting: https://sentry.io/welcome/
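Error trackers such as Sentry group incoming events into issues, largely based on the stack trace. The Python sketch below shows that idea in its simplest form, assuming a fingerprint made of the exception type plus the file and function name of each frame (line numbers are left out so unrelated edits do not split a group). Sentry's real grouping is more elaborate; `fingerprint`, `report`, `GROUPS` and `parse_port` are hypothetical names used only for illustration.

```python
import hashlib
import traceback
from collections import Counter

# Hypothetical in-memory store: fingerprint -> number of occurrences.
GROUPS: Counter = Counter()

def fingerprint(exc: BaseException) -> str:
    """Derive a grouping key from the exception type and its stack frames."""
    frames = traceback.extract_tb(exc.__traceback__)
    # File and function names only; line numbers shift with every edit.
    parts = [type(exc).__qualname__] + [f"{f.filename}:{f.name}" for f in frames]
    return hashlib.sha1("|".join(parts).encode()).hexdigest()[:12]

def report(exc: BaseException) -> str:
    """Record one occurrence of an exception and return its group key."""
    key = fingerprint(exc)
    GROUPS[key] += 1
    return key

def parse_port(raw: str) -> int:  # hypothetical application code
    return int(raw)

for raw in ["80", "http", "8o"]:
    try:
        parse_port(raw)
    except ValueError as exc:
        report(exc)

# Two failures with different messages but the same stack land in one group.
print(len(GROUPS), sum(GROUPS.values()))  # prints: 1 2
```

Leaving the exception message out of the key is the design choice that makes "http" and "8o" count as the same issue.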
- OpenZipkin
- Jaeger https://github.com/jaegertracing/jaeger
- https://medium.com/@rakyll/cpdd-critical-path-driven-development-6c2592fb8ea4
(from https://github.com/KTH/devops-course/issues/16#issue-371440053)
See also runtime application self-protection (RASP): https://github.com/KTH/devops-course/issues/18#issuecomment-435888119
Analytics
Tools and Benchmarks for Automated Log Parsing. http://arxiv.org/abs/1811.03509
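Automated log parsers turn raw log lines into event templates by separating constant text from variable fields. As a much simpler illustration than the parsers benchmarked in the paper, the sketch below masks a few assumed variable patterns (IPv4 addresses with an optional port, hex literals, numbers) with regular expressions and groups lines by the resulting template. The patterns and the names `MASKS`, `template` and `group_by_template` are assumptions for illustration, not one of the evaluated tools.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Assumed variable-field patterns; order matters (IPs before plain numbers).
MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+(?:\.\d+)?\b"), "<NUM>"),
]

def template(line: str) -> str:
    """Replace variable fields with placeholder tokens."""
    for pattern, token in MASKS:
        line = pattern.sub(token, line)
    return line

def group_by_template(lines: List[str]) -> Dict[str, List[str]]:
    """Group raw log lines by their masked template."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        groups[template(line.strip())].append(line)
    return dict(groups)

logs = [
    "Connection from 10.0.0.5:5432 closed after 120 ms",
    "Connection from 10.0.0.7:5432 closed after 98 ms",
    "Disk /dev/sda1 at 91% capacity",
]
for tpl, members in group_by_template(logs).items():
    print(len(members), tpl)
# 2 Connection from <IP> closed after <NUM> ms
# 1 Disk /dev/sda1 at <NUM>% capacity
```

Note that `sda1` is left alone because the number is glued to letters; real parsers need far richer rules (or learned structure) to cope with such cases.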
Does the Fault Reside in a Stack Trace? Assisting Crash Localization by Predicting Crashing Fault Residence https://www.sciencedirect.com/science/article/pii/S0164121218302401
Good dashboards are essential in DevOps; see Kibana, for example.
Sentinel, made at Alibaba: https://github.com/alibaba/Sentinel
JVM Profiler (Uber), which sends metrics to Kafka (https://kafka.apache.org/), to console output, or to a custom reporter: https://github.com/uber-common/jvm-profiler
https://github.com/madflojo/automatron
https://github.com/apache/incubator-skywalking
Time-series databases for storing monitoring data: https://en.wikipedia.org/wiki/Time_series_database
Prometheus - Monitoring system & time series database https://prometheus.io/
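Prometheus collects metrics by periodically scraping an HTTP endpoint that serves samples in a plain-text exposition format, and stores them in its time-series database. The standard-library sketch below renders one counter in that text format and serves it at `/metrics`; in practice an official client library would do this. The metric name `http_requests_total`, port 8000 and all helper names are assumptions for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict, Tuple

Labels = Tuple[Tuple[str, str], ...]  # e.g. (("method", "GET"), ("code", "200"))

def escape_label_value(value: str) -> str:
    """Escape backslash, double quote and newline, as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

def render_counter(name: str, help_text: str, samples: Dict[Labels, float]) -> str:
    """Render one counter family in the Prometheus text exposition format."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples.items():
        if labels:
            inner = ",".join(f'{k}="{escape_label_value(v)}"' for k, v in labels)
            lines.append(f"{name}{{{inner}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"

# Hypothetical in-process counter store, keyed by label pairs.
REQUESTS: Dict[Labels, float] = {}

def count_request(method: str, code: int) -> None:
    key = (("method", method), ("code", str(code)))
    REQUESTS[key] = REQUESTS.get(key, 0) + 1

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_counter("http_requests_total", "Total HTTP requests.", REQUESTS).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def serve(port: int = 8000) -> None:
    """Block forever, serving http://127.0.0.1:<port>/metrics."""
    HTTPServer(("127.0.0.1", port), MetricsHandler).serve_forever()
```

After two `count_request("GET", 200)` calls, the endpoint would return a `# HELP` line, a `# TYPE` line and `http_requests_total{method="GET",code="200"} 2`, which a Prometheus scrape job pointed at that URL could ingest. The counter only ever increases; rates are computed on the Prometheus side at query time.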
Netflix Zuul is a gateway service that provides dynamic routing, monitoring, resiliency, security, and more. https://github.com/Netflix/zuul
OpenTracing https://opentracing.io/
Nagios https://en.wikipedia.org/wiki/Nagios
Sensu is a free and open-source monitoring tool that handles cloud environments. It can monitor servers, services, application health, and business KPIs. https://xebialabs.com/technology/sensu/
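Nagios runs checks as plugins: small programs whose exit code reports the status (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and whose output line is a human-readable message, optionally followed by `|` and performance data. Sensu checks are interpreted with the same exit-code convention. The sketch below is a minimal disk-usage check in that style; the 80/90% thresholds, the `used` perfdata label and the function names are assumptions for illustration.

```python
import shutil
import sys
from typing import Tuple

# Nagios plugin exit codes; Sensu interprets check exit codes the same way.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

def classify(used_pct: float, warn: float = 80.0, crit: float = 90.0) -> int:
    """Map a usage percentage onto a check status using the thresholds."""
    if used_pct >= crit:
        return CRITICAL
    if used_pct >= warn:
        return WARNING
    return OK

def check_disk(path: str = "/", warn: float = 80.0, crit: float = 90.0) -> Tuple[int, str]:
    """Return (exit code, one-line output) for disk usage at `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    used_pct = 100.0 * usage.used / usage.total
    status = classify(used_pct, warn, crit)
    # Text after '|' is performance data: label=value[unit];warn;crit
    perfdata = f"used={used_pct:.1f}%;{warn};{crit}"
    return status, f"DISK {LABELS[status]} - {used_pct:.1f}% used on {path} | {perfdata}"

def main() -> None:
    """Entry point for the monitoring system: print one line, exit with the status."""
    status, message = check_disk(sys.argv[1] if len(sys.argv) > 1 else "/")
    print(message)
    sys.exit(status)
```

Saved as a script ending in `if __name__ == "__main__": main()`, it could be registered as a Nagios command or a Sensu check. Mapping an unreadable path to UNKNOWN rather than CRITICAL keeps "the check itself failed" distinct from "the disk is full".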
Provenance analysis tools
- SPADE: https://github.com/ashish-gehani/spade
- CamFlow: http://camflow.org/
Framework for instruction-level tracing and analysis of program executions http://static.usenix.org/event/vee06/full_papers/p154-bhansali.pdf
DevOps Metrics https://queue.acm.org/detail.cfm?id=3182626
Dapper, a large-scale distributed systems tracing infrastructure at Google http://research.google.com/pubs/pub36356.html
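Dapper models a request as a tree of spans that share a trace id, where each span records its own id and its parent's id; the ids travel with RPCs so that spans created in different processes can be stitched back together. The sketch below assumes that model and, for the header encoding, borrows the W3C Trace Context `traceparent` format (`version-traceid-parentid-flags`) rather than anything Dapper-specific. `Span`, `SpanContext`, `start_span`, `inject`, `extract` and `COLLECTED` are hypothetical names, and the "remote" call is simulated in one process.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass(frozen=True)
class SpanContext:
    trace_id: str  # 32 hex chars, shared by every span of one request
    span_id: str   # 16 hex chars, unique per span

@dataclass
class Span:
    name: str
    context: SpanContext
    parent_id: Optional[str]  # None marks the root span
    start: float = field(default_factory=time.monotonic)
    end: Optional[float] = None

COLLECTED: List[Span] = []  # stand-in for a trace collector

def start_span(name: str, parent: Optional[SpanContext] = None) -> Span:
    """Start a child of `parent`, or a new trace when there is no parent."""
    trace_id = parent.trace_id if parent else secrets.token_hex(16)
    parent_id = parent.span_id if parent else None
    return Span(name, SpanContext(trace_id, secrets.token_hex(8)), parent_id)

def finish(span: Span) -> None:
    span.end = time.monotonic()
    COLLECTED.append(span)

def inject(ctx: SpanContext) -> Dict[str, str]:
    """Encode the context into outgoing request headers (flags 01 = sampled)."""
    return {"traceparent": f"00-{ctx.trace_id}-{ctx.span_id}-01"}

def extract(headers: Dict[str, str]) -> Optional[SpanContext]:
    """Decode an incoming traceparent header; None if absent or malformed."""
    parts = headers.get("traceparent", "").split("-")
    if len(parts) != 4 or len(parts[1]) != 32 or len(parts[2]) != 16:
        return None
    return SpanContext(trace_id=parts[1], span_id=parts[2])

def print_tree(spans: List[Span]) -> None:
    """Print spans indented under their parents, with durations."""
    children: Dict[Optional[str], List[Span]] = {}
    for s in spans:
        children.setdefault(s.parent_id, []).append(s)

    def walk(parent_id: Optional[str], depth: int) -> None:
        for s in children.get(parent_id, []):
            print(f"{'  ' * depth}{s.name}: {1000 * (s.end - s.start):.2f} ms")
            walk(s.context.span_id, depth + 1)

    walk(None, 0)

# Simulated request: the frontend calls a payment service.
root = start_span("GET /checkout")
headers = inject(root.context)                       # attached to the outgoing call
child = start_span("charge_card", extract(headers))  # created on the callee side
finish(child)
finish(root)
print_tree(COLLECTED)
```

Only the small context (trace id plus the caller's span id) crosses the process boundary; timings stay local and are reassembled by the collector using the parent links.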
Chaos Engineering & Observability https://www.infoq.com/news/2019/03/chaos-engineering-observability
Humio: All of your data: logs, metrics, traces. Search, analyze and visualize instantly. Live system observability. https://humio.com/
The OpenTracing project https://opentracing.io/
Papers:
- Stardust: Tracking activity in a distributed storage system (2006)
- X-Trace: A pervasive network tracing framework (2007)
- Fay: Extensible distributed tracing from kernels to clusters (2012)
- So, you want to trace your distributed system? Key design insights from years of practical experience (2014)
- Pivot Tracing: Dynamic causal monitoring for distributed systems (2015)