loki icon indicating copy to clipboard operation
loki copied to clipboard

feat: add missing cluster label to mixins

Open QuentinBisson opened this issue 9 months ago • 6 comments

What this PR does / why we need it:

This PR enhance the loki alert mixins by adding the cluster label to all summary.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Checklist

  • [ ] Reviewed the CONTRIBUTING.md guide (required)
  • [ ] Documentation added
  • [ ] Tests updated
  • [ ] Title matches the required conventional commits format, see here
    • Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such, feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
  • [ ] Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • [ ] For Helm chart changes bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR
  • [ ] If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

QuentinBisson avatar May 03 '24 08:05 QuentinBisson

Oh you're right :( Btw, i've just discovered a tool call pint that could detect such issues using ci validation. Do you think it would make sense to add it to the Mimir generated rules?

QuentinBisson avatar May 03 '24 19:05 QuentinBisson

Oh you're right :( Btw, i've just discovered a tool call pint that could detect such issues using ci validation. Do you think it would make sense to add it to the Mimir generated rules?

It's worth looking into, happy to review a PR if you come up with something that works :+1:

cstyan avatar May 03 '24 19:05 cstyan

Here is the output of pint without colors :(

I think it would be best to add it to the loki-build-image but I'm not sure if you like to have contribution on that repo.

./pint-linux-amd64 --offline lint --min-severity=information /home/quentin/Documents/code/QuentinBisson/loki-upstream/production/loki-mixin-compiled-ssd/*.yaml
level=INFO msg="Finding all rules to check" paths=["/home/quentin/Documents/code/QuentinBisson/loki-upstream/production/loki-mixin-compiled-ssd/alerts.yaml","/home/quentin/Documents/code/QuentinBisson/loki-upstream/production/loki-mixin-compiled-ssd/rules.yaml"]
level=INFO msg="Offline mode, skipping Prometheus discovery"
level=ERROR msg="Execution completed with error(s)" err="invalid --min-severity value: unknown severity: information"
> ./pint-linux-amd64 --offline lint --min-severity=info /home/quentin/Documents/code/QuentinBisson/loki-upstream/production/loki-mixin-compiled-ssd/*.yaml
level=INFO msg="Finding all rules to check" paths=["/home/quentin/Documents/code/QuentinBisson/loki-upstream/production/loki-mixin-compiled-ssd/alerts.yaml","/home/quentin/Documents/code/QuentinBisson/loki-upstream/production/loki-mixin-compiled-ssd/rules.yaml"]
level=INFO msg="Offline mode, skipping Prometheus discovery"
/home/quentin/Documents/code/QuentinBisson/loki-upstream/production/loki-mixin-compiled-ssd/alerts.yaml:6-7 Bug: Template is using `cluster` label but the query removes it. (alerts/template)
 6 |             description: |
 7 |                 {{ $labels.cluster }} {{ $labels.job }} {{ $labels.route }} is experiencing {{ printf "%.2f" $value }}% errors.

/home/quentin/Documents/code/QuentinBisson/loki-upstream/production/loki-mixin-compiled-ssd/alerts.yaml:6-7 Information: Using the value of `rate(loki_request_duration_seconds_count{status_code=~"5.."}[2m])` inside this annotation might be hard to read, consider using one of humanize template functions to make it more human friendly. (alerts/template)
 6 |             description: |
 7 |                 {{ $labels.cluster }} {{ $labels.job }} {{ $labels.route }} is experiencing {{ printf "%.2f" $value }}% errors.

/home/quentin/Documents/code/QuentinBisson/loki-upstream/production/loki-mixin-compiled-ssd/alerts.yaml:19-20 Bug: Template is using `cluster` label but the query removes it. (alerts/template)
 19 |             description: |
 20 |                 {{ $labels.cluster }} {{ $labels.job }} is experiencing {{ printf "%.2f" $value }}% increase of panics.

level=INFO msg="Problems found" Bug=2 Information=1
level=ERROR msg="Execution completed with error(s)" err="found 1 problem(s) with severity Bug or higher"

QuentinBisson avatar May 06 '24 11:05 QuentinBisson

@cstyan it should be fine now, pint is happy apart from the humanize thing

QuentinBisson avatar May 06 '24 12:05 QuentinBisson

I think it would be best to add it to the loki-build-image but I'm not sure if you like to have contribution on that repo.

Agreed, it would be best to have it nicely runnable in a reproduce-able way. For now, can you show me how the invocation of pint works and I can setup a CI job via github actions at least post-merging of your PR.

cstyan avatar May 06 '24 17:05 cstyan

I meant this section:

/home/quentin/Documents/code/QuentinBisson/loki-upstream/production/loki-mixin-compiled-ssd/alerts.yaml:6-7 Information: Using the value of `rate(loki_request_duration_seconds_count{status_code=~"5.."}[2m])` inside this annotation might be hard to read, consider using one of humanize template functions to make it more human friendly. (alerts/template)
 6 |             description: |
 7 |                 {{ $labels.cluster }} {{ $labels.job }} {{ $labels.route }} is experiencing {{ printf "%.2f" $value }}% errors.

But I did not really play with humanize functions :)

So I run pint from the cli for this one but I think you could use the pint ci directly: https://cloudflare.github.io/pint/#pull-requests

There are a lot of things that could be added (like enforcing that some annotations or labels are always set) but an initial version would be good.

I will deploy it tomorrow or later today, I did not have time to full test my changes :( i'll ping you when i'm done and thank you for the review

QuentinBisson avatar May 06 '24 18:05 QuentinBisson

@vlad-diachenko any idea why the helm diff action is failing here? I believe that's a new one you wrote?

cstyan avatar Sep 17 '24 23:09 cstyan

From what I see, it's using the wrong remote as the log line talks about origin but as my PR comes from a fork it's probably incorrect:

/usr/bin/git -c protocol.version=2 fetch --no-tags --prune --no-recurse-submodules --depth=1 origin +refs/heads/loki-mixins-add-cluster-label*:refs/remotes/origin/loki-mixins-add-cluster-label* +refs/tags/loki-mixins-add-cluster-label*:refs/tags/loki-mixins-add-cluster-label*

QuentinBisson avatar Sep 18 '24 14:09 QuentinBisson

From what I see, it's using the wrong remote as the log line talks about origin but as my PR comes from a fork it's probably incorrect:

/usr/bin/git -c protocol.version=2 fetch --no-tags --prune --no-recurse-submodules --depth=1 origin +refs/heads/loki-mixins-add-cluster-label*:refs/remotes/origin/loki-mixins-add-cluster-label* +refs/tags/loki-mixins-add-cluster-label*:refs/tags/loki-mixins-add-cluster-label*

yep, this seems like the issue, Vlad is looking into it on our end

cstyan avatar Sep 18 '24 16:09 cstyan

Awesome thanks :)

QuentinBisson avatar Sep 18 '24 17:09 QuentinBisson

Hey folks. I hope this PR fixes the issue https://github.com/grafana/loki/pull/14173

vlad-diachenko avatar Sep 18 '24 18:09 vlad-diachenko

Hey @QuentinBisson , @cstyan . I have fixed the checkout step that was failing at the beginning, but now it started to fail another step image

I will check it tomorrow morning and will fix it and merge the PR ;)

vlad-diachenko avatar Sep 18 '24 18:09 vlad-diachenko

Thanks for the work @vlad-diachenko :)

QuentinBisson avatar Sep 18 '24 19:09 QuentinBisson

Vlad mentioned to me that he couldn't get everything working correctly for PRs from forks, so he changed the workflow to not run on PRs from forks for now.

cstyan avatar Sep 19 '24 19:09 cstyan