operations
The sig-operations repository.
Site Reliability Engineering (SRE) Support
This repository contains the SRE principles and guidelines for managing the Operate First services.
What is SRE?
SRE is a software engineering approach to managing operations for systems, applications, and services. We use software as a tool to manage systems, solve problems, and automate operations tasks.
Get started
If you'd like to learn SRE practices and get hands-on experience with them, but aren't sure where or how to start, let us help!
- Follow this link to find beginner-friendly issues.
- Tag yourself in the issue.
- Join the Slack and let us know that you're interested in helping by posting a short introduction of yourself, and a link to the issue you'd like to complete, in the #support channel.
To learn more, check out the incident management procedures and the GitHub receiver setup, learn to configure Prometheus alerts, or browse the GitHub repo.