loki icon indicating copy to clipboard operation
loki copied to clipboard

feat(helm): Allow configurable severity and thresholds for alerts

Open schahal opened this issue 1 year ago • 2 comments

What this PR does / why we need it:

The Loki helm chart comes with hardcoded PrometheusRules where users are unable to change the e.g., severity or threshold to best fit their environment's behavior.

This PR allows users to set some overrides.

For example, in my environment, a critical severity may mean a phone call in the middle of the night. So I have a values.yaml with the following override:

monitoring:
  rules:
    configs:
      LokiRequestPanics:
        # warn me if there are more than 3 panics over the last hour
        lookbackPeriod: 60m
        severity: warning
        threshold: 3

Which issue(s) this PR fixes: Fixes https://github.com/grafana/loki/issues/11219

Special notes for your reviewer:

  • This is simply pulling the default attributes of the PrometheusRule expressions into values.yaml
  • This is backwards compatible with the monitoring.rules.disabled keys (marking that as deprecated and to use configs for enabling/disabling). But we keep the deprecated keys for backwards compatibility
  • Testing: Ran helm template test --set loki.useTestSchema=true --set monitoring.rules.enabled=true --debug . --validate and confirmed not rendered diff before and after.

Checklist

  • [x] Reviewed the CONTRIBUTING.md guide (required)
  • [ ] Documentation added
  • [ ] Tests updated
  • [x] Title matches the required conventional commits format, see here
    • Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such, feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
  • [ ] Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • [x] For Helm chart changes bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR
  • [ ] If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

schahal avatar Aug 01 '24 00:08 schahal

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Aug 01 '24 00:08 CLAassistant

Hi @JStickler @DylanGuedes @chaudum @vlad-diachenko : I'm wondering how we can get a reviewer/approval for this? I don't see any guidance in the Contributors doc or guidelines.

Tagged you as I saw the last couple feat(helm) approvals came from you (apologies in advance if it should be someone else, or if I missed something in CONTRIBUTING.md).

This is to address/resolve https://github.com/grafana/loki/issues/11219

schahal avatar Aug 28 '24 16:08 schahal

Adding @trevorwhitney and @paul1r also for approval of the CI workflows and possible review/approve/merge?

schahal avatar Sep 05 '24 15:09 schahal

@schahal Hello from new maintainer team! 🖐️

Do we still need this PR? If yes, please update it and add a CHANGELOG entry and remove the version bump. Thanks!

jkroepke avatar Oct 16 '25 18:10 jkroepke

Hi @jkroepke , yeah I think this is still needed (as I've simply worked around it by disabling the rules altogether).

I've updated the branch to (a) remove the Chart version bump, (b) updated the CHANGELOG, (c) additionalRuleAnnotations has been added since I last worked on this branch, updated its rendering.

schahal avatar Oct 16 '25 23:10 schahal

LGTM

jkroepke avatar Oct 17 '25 19:10 jkroepke

Thanks @jkroepke . Are there any additional steps I need to take to get this merged?

schahal avatar Oct 18 '25 00:10 schahal