reliability-engineering topic
litmus
Litmus helps SREs and developers practice chaos engineering in a Cloud-native way. Chaos experiments are published at the ChaosHub (https://hub.litmuschaos.io). Community notes are at https://hackmd....
OpenShift-Guide
OpenShift Guide. Learn about the Red Hat OpenShift Container Platform, Data Science, CodeReady Containers, Podman, Buildah, and Kubernetes.
sre-tools
A collection of SRE tools
SurPyval
A Python package for survival analysis. The most flexible survival analysis package available. SurPyval can work with arbitrary combinations of observed, censored, and truncated data. SurPyval can als...
sre-checklist
A checklist for anyone practicing Site Reliability Engineering
deep_cox_mixtures
Code for the paper "Deep Cox Mixtures for Survival Regression", Machine Learning for Healthcare Conference 2021
stable-systems-checklist
An opinionated list of attributes and policies that need to be met in order to establish a stable software system.
sreworkbook-templates-md
A collection of templates ported from the SRE Workbook
paas-docker-cloudfoundry-tools