Monitoring, tracing, observability in DevOps

Open monperrus opened this issue 6 years ago • 70 comments

  • https://en.wikipedia.org/wiki/Tracing_(software)
  • https://en.wikipedia.org/wiki/System_monitoring
  • https://en.wikipedia.org/wiki/Network_monitoring
  • https://en.wikipedia.org/wiki/Crash_reporter
  • https://en.wikipedia.org/wiki/Application_performance_management
  • https://en.wikipedia.org/wiki/Website_monitoring
  • https://en.wikipedia.org/wiki/Provenance#Computer_science
  • https://en.wikipedia.org/wiki/Log_analysis

monperrus avatar Jul 02 '18 11:07 monperrus

See also Icinga (thanks to @henriklb for the suggestion)

monperrus avatar Aug 20 '18 18:08 monperrus

Log analysis @Eclipse https://projects.eclipse.org/projects/tools.tracecompass

monperrus avatar Sep 18 '18 14:09 monperrus

We've found Istio ( https://istio.io/ ) to be increasingly useful in this context. KubeSpy ( https://github.com/pulumi/kubespy ) is an excellent tool for troubleshooting and diagnosing Kubernetes deployments.

MatsJonsson avatar Oct 11 '18 08:10 MatsJonsson

  • Prometheus
  • Sensu https://sensu.io/
  • Zipkin
  • the ELK stack.

lsc avatar Oct 11 '18 09:10 lsc

+1 for Prometheus

MatsJonsson avatar Oct 11 '18 09:10 MatsJonsson

Sentry for Error Reporting. https://sentry.io/welcome/

bittermandel avatar Oct 18 '18 09:10 bittermandel

(from https://github.com/KTH/devops-course/issues/16#issue-371440053)

monperrus avatar Oct 26 '18 09:10 monperrus

See also Runtime application self-protection https://github.com/KTH/devops-course/issues/18#issuecomment-435888119

monperrus avatar Nov 05 '18 14:11 monperrus

Analytics

monperrus avatar Nov 08 '18 13:11 monperrus

Tools and Benchmarks for Automated Log Parsing. http://arxiv.org/abs/1811.03509

monperrus avatar Nov 12 '18 20:11 monperrus

Does the Fault Reside in a Stack Trace? Assisting Crash Localization by Predicting Crashing Fault Residence https://www.sciencedirect.com/science/article/pii/S0164121218302401

monperrus avatar Nov 12 '18 21:11 monperrus

Having good dashboards is essential in DevOps, see Kibana, etc.

monperrus avatar Dec 10 '18 13:12 monperrus

Made in Alibaba: https://github.com/alibaba/Sentinel

monperrus avatar Jan 22 '19 20:01 monperrus

JVM Profiler that sends metrics to Kafka (https://kafka.apache.org/), console output, or a custom reporter: https://github.com/uber-common/jvm-profiler

monperrus avatar Feb 22 '19 10:02 monperrus

https://github.com/madflojo/automatron

monperrus avatar Mar 05 '19 10:03 monperrus

https://github.com/apache/incubator-skywalking

monperrus avatar Mar 05 '19 10:03 monperrus

Time-series database to store monitoring data https://en.wikipedia.org/wiki/Time_series_database

monperrus avatar Mar 05 '19 10:03 monperrus

Prometheus - Monitoring system & time series database https://prometheus.io/

monperrus avatar Mar 05 '19 10:03 monperrus

Netflix Zuul is a gateway service that provides dynamic routing, monitoring, resiliency, security, and more. https://github.com/Netflix/zuul

monperrus avatar Mar 05 '19 10:03 monperrus

OpenTracing https://opentracing.io/

monperrus avatar Mar 05 '19 10:03 monperrus

Nagios https://en.wikipedia.org/wiki/Nagios

monperrus avatar Mar 05 '19 10:03 monperrus

Sensu is a free and open source monitoring tool that handles cloud environments. Sensu allows you to monitor servers, services, application health, and business KPIs. https://xebialabs.com/technology/sensu/

monperrus avatar Mar 05 '19 10:03 monperrus

Provenance analysis tools

  • SPADE : https://github.com/ashish-gehani/spade
  • CamFlow : http://camflow.org/

bbaudry avatar Mar 05 '19 10:03 bbaudry

Framework for instruction-level tracing and analysis of program executions http://static.usenix.org/event/vee06/full_papers/p154-bhansali.pdf

monperrus avatar Mar 07 '19 14:03 monperrus

DevOps Metrics https://queue.acm.org/detail.cfm?id=3182626

monperrus avatar Mar 22 '19 09:03 monperrus

Dapper, a large-scale distributed systems tracing infrastructure at Google http://research.google.com/pubs/pub36356.html

monperrus avatar Mar 22 '19 09:03 monperrus

Chaos Engineering & Observability https://www.infoq.com/news/2019/03/chaos-engineering-observability

monperrus avatar Mar 29 '19 07:03 monperrus

Humio: All of your data: logs, metrics, traces. Search, analyze and visualize instantly. Live system observability. https://humio.com/

monperrus avatar Mar 29 '19 07:03 monperrus

The OpenTracing project https://opentracing.io/

monperrus avatar Mar 29 '19 07:03 monperrus

Papers:

  • Stardust: tracking activity in a distributed storage system 2006
  • X-trace: A pervasive network tracing framework 2007
  • Fay: extensible distributed tracing from kernels to clusters 2012
  • So, you want to trace your distributed system? key design insights from years of practical experience 2014
  • Pivot tracing: Dynamic causal monitoring for distributed systems 2015

monperrus avatar Apr 05 '19 09:04 monperrus