Results 33 runbooks issues

**What does this PR / Why do we need it?** - Fixed a couple of typos

Added a new runbook for the Kubernetes API server latency alert, which reports the 99th percentile request latency ($value, in seconds) for a resource. The alert expression is `histogram_quantile(0.99, sum(rate(apiserver_request_latencies_bucket{subresource!="log",verb!~"^(?:CONNECT|WATCHLIST|WATCH|PROXY)$"} [10m])) WITHOUT (instance, resource)) / 1e+06 > 1`
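To make the alert expression above easier to reason about, here is a minimal Python sketch of the bucket interpolation that `histogram_quantile` performs on classic histograms, followed by the `/ 1e+06 > 1` threshold check. It is a simplified illustration, not the Prometheus implementation: the function names and the sample bucket values are hypothetical, and the assumption that the buckets are in microseconds is inferred only from the division by `1e+06` in the expression.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Assumes 0 < q <= 1 and that `buckets` is sorted by upper bound and
    ends with the +Inf bucket, as classic Prometheus histograms do.
    """
    if not 0 < q <= 1:
        raise ValueError("this sketch only handles 0 < q <= 1")
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        return math.nan
    total = buckets[-1][1]
    if total == 0:
        return math.nan

    rank = q * total
    # First bucket whose cumulative count reaches the target rank.
    idx = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if idx == len(buckets) - 1:
        # The quantile falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]

    upper, count = buckets[idx]
    lower, below = (0.0, 0.0) if idx == 0 else buckets[idx - 1]
    # Linear interpolation inside the selected bucket.
    return lower + (upper - lower) * (rank - below) / (count - below)


def latency_alert_fires(buckets_us, threshold_seconds=1.0):
    """Mirror the `/ 1e+06 > 1` part of the rule (buckets assumed in microseconds)."""
    p99_seconds = histogram_quantile(0.99, buckets_us) / 1e6
    return p99_seconds, p99_seconds > threshold_seconds


# Hypothetical per-second bucket rates, standing in for sum(rate(...[10m])).
sample_buckets_us = [
    (100_000, 50.0),
    (500_000, 90.0),
    (2_000_000, 99.5),
    (math.inf, 100.0),
]
p99_seconds, fires = latency_alert_fires(sample_buckets_us)
```

With these made-up rates the 99th percentile lands in the 0.5 s to 2 s bucket, so the estimate depends heavily on where the bucket boundaries sit. That is worth keeping in mind when reading `$value` in the alert.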

Hi, to gracefully update my cluster's node groups, I mark old nodes as `NoSchedule`. This generates alerts of type `KubeDaemonSetMisScheduled`, which I had to disable, but it also generates `KubeDaemonSetRolloutStuck`,...

fixed typo: "tine tuned" -> "fine tuned"

When I had `PrometheusTSDBCompactionsFailing` alerts, I had corrupted WAL files (with error messages in the logs like this: `WAL truncation in Compact: create checkpoint: read segments: corruption in segment...

## Description Add documentation to diagnose NodeFilesystemAlmostOutOfFiles ## Type of change What type of changes does your code introduce to the Prometheus operator? Put an x in the box that...

Small improvements to the page, fixing the link and using a list for "see also" links, for better display.

The runbook needs updating because the alert was renamed in kubernetes-mixin from `KubeJobCompletion` to `KubeJobNotCompleted`: https://github.com/kubernetes-monitoring/kubernetes-mixin/pull/746/

A quick runbook to mitigate the `PrometheusMissingRuleEvaluations` alert. References: - https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/#rule_group - https://www.robustperception.io/rule-groups-for-hierarchical-aggregation/