loki
loki copied to clipboard
feat: add missing cluster label to mixins
What this PR does / why we need it:
This PR enhance the loki alert mixins by adding the cluster label to all summary.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Checklist
- [ ] Reviewed the
CONTRIBUTING.md
guide (required) - [ ] Documentation added
- [ ] Tests updated
- [ ] Title matches the required conventional commits format, see here
-
Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such,
feat
PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
-
Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such,
- [ ] Changes that require user attention or interaction to upgrade are documented in
docs/sources/setup/upgrade/_index.md
- [ ] For Helm chart changes bump the Helm chart version in
production/helm/loki/Chart.yaml
and updateproduction/helm/loki/CHANGELOG.md
andproduction/helm/loki/README.md
. Example PR - [ ] If the change is deprecating or removing a configuration option, update the
deprecated-config.yaml
anddeleted-config.yaml
files respectively in thetools/deprecated-config-checker
directory. Example PR
Oh you're right :( Btw, i've just discovered a tool call pint that could detect such issues using ci validation. Do you think it would make sense to add it to the Mimir generated rules?
Oh you're right :( Btw, i've just discovered a tool call pint that could detect such issues using ci validation. Do you think it would make sense to add it to the Mimir generated rules?
It's worth looking into, happy to review a PR if you come up with something that works :+1:
Here is the output of pint without colors :(
I think it would be best to add it to the loki-build-image but I'm not sure if you like to have contribution on that repo.
./pint-linux-amd64 --offline lint --min-severity=information /home/quentin/Documents/code/QuentinBisson/loki-upstream/production/loki-mixin-compiled-ssd/*.yaml
level=INFO msg="Finding all rules to check" paths=["/home/quentin/Documents/code/QuentinBisson/loki-upstream/production/loki-mixin-compiled-ssd/alerts.yaml","/home/quentin/Documents/code/QuentinBisson/loki-upstream/production/loki-mixin-compiled-ssd/rules.yaml"]
level=INFO msg="Offline mode, skipping Prometheus discovery"
level=ERROR msg="Execution completed with error(s)" err="invalid --min-severity value: unknown severity: information"
> ./pint-linux-amd64 --offline lint --min-severity=info /home/quentin/Documents/code/QuentinBisson/loki-upstream/production/loki-mixin-compiled-ssd/*.yaml
level=INFO msg="Finding all rules to check" paths=["/home/quentin/Documents/code/QuentinBisson/loki-upstream/production/loki-mixin-compiled-ssd/alerts.yaml","/home/quentin/Documents/code/QuentinBisson/loki-upstream/production/loki-mixin-compiled-ssd/rules.yaml"]
level=INFO msg="Offline mode, skipping Prometheus discovery"
/home/quentin/Documents/code/QuentinBisson/loki-upstream/production/loki-mixin-compiled-ssd/alerts.yaml:6-7 Bug: Template is using `cluster` label but the query removes it. (alerts/template)
6 | description: |
7 | {{ $labels.cluster }} {{ $labels.job }} {{ $labels.route }} is experiencing {{ printf "%.2f" $value }}% errors.
/home/quentin/Documents/code/QuentinBisson/loki-upstream/production/loki-mixin-compiled-ssd/alerts.yaml:6-7 Information: Using the value of `rate(loki_request_duration_seconds_count{status_code=~"5.."}[2m])` inside this annotation might be hard to read, consider using one of humanize template functions to make it more human friendly. (alerts/template)
6 | description: |
7 | {{ $labels.cluster }} {{ $labels.job }} {{ $labels.route }} is experiencing {{ printf "%.2f" $value }}% errors.
/home/quentin/Documents/code/QuentinBisson/loki-upstream/production/loki-mixin-compiled-ssd/alerts.yaml:19-20 Bug: Template is using `cluster` label but the query removes it. (alerts/template)
19 | description: |
20 | {{ $labels.cluster }} {{ $labels.job }} is experiencing {{ printf "%.2f" $value }}% increase of panics.
level=INFO msg="Problems found" Bug=2 Information=1
level=ERROR msg="Execution completed with error(s)" err="found 1 problem(s) with severity Bug or higher"
@cstyan it should be fine now, pint is happy apart from the humanize thing
I think it would be best to add it to the loki-build-image but I'm not sure if you like to have contribution on that repo.
Agreed, it would be best to have it nicely runnable in a reproduce-able way. For now, can you show me how the invocation of pint works and I can setup a CI job via github actions at least post-merging of your PR.
I meant this section:
/home/quentin/Documents/code/QuentinBisson/loki-upstream/production/loki-mixin-compiled-ssd/alerts.yaml:6-7 Information: Using the value of `rate(loki_request_duration_seconds_count{status_code=~"5.."}[2m])` inside this annotation might be hard to read, consider using one of humanize template functions to make it more human friendly. (alerts/template)
6 | description: |
7 | {{ $labels.cluster }} {{ $labels.job }} {{ $labels.route }} is experiencing {{ printf "%.2f" $value }}% errors.
But I did not really play with humanize functions :)
So I run pint from the cli for this one but I think you could use the pint ci directly: https://cloudflare.github.io/pint/#pull-requests
There are a lot of things that could be added (like enforcing that some annotations or labels are always set) but an initial version would be good.
I will deploy it tomorrow or later today, I did not have time to full test my changes :( i'll ping you when i'm done and thank you for the review
@vlad-diachenko any idea why the helm diff action is failing here? I believe that's a new one you wrote?
From what I see, it's using the wrong remote as the log line talks about origin
but as my PR comes from a fork it's probably incorrect:
/usr/bin/git -c protocol.version=2 fetch --no-tags --prune --no-recurse-submodules --depth=1 origin +refs/heads/loki-mixins-add-cluster-label*:refs/remotes/origin/loki-mixins-add-cluster-label* +refs/tags/loki-mixins-add-cluster-label*:refs/tags/loki-mixins-add-cluster-label*
From what I see, it's using the wrong remote as the log line talks about
origin
but as my PR comes from a fork it's probably incorrect:
/usr/bin/git -c protocol.version=2 fetch --no-tags --prune --no-recurse-submodules --depth=1 origin +refs/heads/loki-mixins-add-cluster-label*:refs/remotes/origin/loki-mixins-add-cluster-label* +refs/tags/loki-mixins-add-cluster-label*:refs/tags/loki-mixins-add-cluster-label*
yep, this seems like the issue, Vlad is looking into it on our end
Awesome thanks :)
Hey folks. I hope this PR fixes the issue https://github.com/grafana/loki/pull/14173
Hey @QuentinBisson , @cstyan . I have fixed the checkout step that was failing at the beginning, but now it started to fail another step
I will check it tomorrow morning and will fix it and merge the PR ;)
Thanks for the work @vlad-diachenko :)
Vlad mentioned to me that he couldn't get everything working correctly for PRs from forks, so he changed the workflow to not run on PRs from forks for now.