loki
loki copied to clipboard
feat(helm): Allow configurable severity and thresholds for alerts
What this PR does / why we need it:
The Loki helm chart comes with hardcoded PrometheusRules where users are unable to change the e.g., severity or threshold to best fit their environment's behavior.
This PR allows users to set some overrides.
For example, in my environment, a critical severity may mean a phone call in the middle of the night. So I have a values.yaml with the following override:
monitoring:
rules:
configs:
LokiRequestPanics:
# warn me if there are more than 3 panics over the last hour
lookbackPeriod: 60m
severity: warning
threshold: 3
Which issue(s) this PR fixes: Fixes https://github.com/grafana/loki/issues/11219
Special notes for your reviewer:
- This is simply pulling the default attributes of the PrometheusRule expressions into
values.yaml - This is backwards compatible with the
monitoring.rules.disabledkeys (marking that as deprecated and to use configs for enabling/disabling). But we keep the deprecated keys for backwards compatibility - Testing: Ran
helm template test --set loki.useTestSchema=true --set monitoring.rules.enabled=true --debug . --validateand confirmed not rendered diff before and after.
Checklist
- [x] Reviewed the
CONTRIBUTING.mdguide (required) - [ ] Documentation added
- [ ] Tests updated
- [x] Title matches the required conventional commits format, see here
- Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such,
featPRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
- Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such,
- [ ] Changes that require user attention or interaction to upgrade are documented in
docs/sources/setup/upgrade/_index.md - [x] For Helm chart changes bump the Helm chart version in
production/helm/loki/Chart.yamland updateproduction/helm/loki/CHANGELOG.mdandproduction/helm/loki/README.md. Example PR - [ ] If the change is deprecating or removing a configuration option, update the
deprecated-config.yamlanddeleted-config.yamlfiles respectively in thetools/deprecated-config-checkerdirectory. Example PR
Hi @JStickler @DylanGuedes @chaudum @vlad-diachenko : I'm wondering how we can get a reviewer/approval for this? I don't see any guidance in the Contributors doc or guidelines.
Tagged you as I saw the last couple feat(helm) approvals came from you (apologies in advance if it should be someone else, or if I missed something in CONTRIBUTING.md).
This is to address/resolve https://github.com/grafana/loki/issues/11219
Adding @trevorwhitney and @paul1r also for approval of the CI workflows and possible review/approve/merge?
@schahal Hello from new maintainer team! 🖐️
Do we still need this PR? If yes, please update it and add a CHANGELOG entry and remove the version bump. Thanks!
Hi @jkroepke , yeah I think this is still needed (as I've simply worked around it by disabling the rules altogether).
I've updated the branch to (a) remove the Chart version bump, (b) updated the CHANGELOG, (c) additionalRuleAnnotations has been added since I last worked on this branch, updated its rendering.
LGTM
Thanks @jkroepke . Are there any additional steps I need to take to get this merged?