loki icon indicating copy to clipboard operation
loki copied to clipboard

fix(helm): Avoid scraping metrics from gateway pod

Open Skaronator opened this issue 1 year ago • 2 comments

What this PR does / why we need it:

Rename the nginx port name to avoid beeing scraped by the ServiceMonitor: https://github.com/grafana/loki/blob/91a34868db61f2cf4299d618c2e48885ff0a705e/production/helm/loki/templates/monitoring/servicemonitor.yaml#L32-L33

Which issue(s) this PR fixes: Fixes #13201

Special notes for your reviewer:

Checklist

  • [x] Reviewed the CONTRIBUTING.md guide (required)
  • [x] Documentation added
  • [x] Tests updated
  • [x] Title matches the required conventional commits format, see here
    • Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such, feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
  • [ ] Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • [x] For Helm chart changes bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR
  • [x] If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

Skaronator avatar Jun 28 '24 09:06 Skaronator

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Jun 28 '24 09:06 CLAassistant

I am not sure if we need to do a major version bump. Technically, it's a breaking change, but I don't think it will affect anyone.

Skaronator avatar Jun 28 '24 09:06 Skaronator

Ah opened a similar PR a while back https://github.com/grafana/loki/pull/12746, was alerting. Seems to resolve the same thing.

adinhodovic avatar Jul 02 '24 18:07 adinhodovic

I've rebased and fixed the merge conflicts that have come up since creating this PR.

Skaronator avatar Jul 05 '24 06:07 Skaronator

I've rebased this PR again and fixed the merge conflicts that have come up.

Skaronator avatar Jul 22 '24 07:07 Skaronator

Looking forward for this PR!

lblazewski avatar Aug 01 '24 09:08 lblazewski

Resolved merge conflicts again. Hoping for a timely review.

Skaronator avatar Aug 12 '24 13:08 Skaronator

cc @vlad-diachenko

Skaronator avatar Aug 12 '24 13:08 Skaronator

cc @vlad-diachenko

I believe that we have the issue only when nginx is deployed as a gateway.... so, I would configure nginx to expose the metrics under /metrics path. wdyt?

vlad-diachenko avatar Aug 13 '24 11:08 vlad-diachenko

Hi, is there an update on this? We need this fix as we have the service monitor target for nginx pod down for quite long time now

francesco-kekko-damico avatar Oct 30 '24 08:10 francesco-kekko-damico

any news on this fix ?

nlamirault avatar Dec 26 '24 15:12 nlamirault

@vlad-diachenko @hervenicol anything I can do to help unblock a fix for this issue?

Starefossen avatar Jan 27 '25 09:01 Starefossen

@vlad-diachenko @hervenicol anything I can do to help unblock a fix for this issue?

Hey @Starefossen . We can not merge it because of the thing I mentioned in the comment . This port should be scraped if GEL image is used for the gateway. So, if the issue happens only when NGING is used as a gateway, I would add label to the gateway pod prometheus.io/scrape: "false" , so it won't be scraped. (it needs to be tested because I googled this solution) wdyt?

vlad-diachenko avatar Jan 31 '25 12:01 vlad-diachenko

I can confirm that adding the prometheus.io/service-monitor: "false" label to the gateway pods disables the erroneous scraping:

  gateway:
    service:
      labels:
        prometheus.io/service-monitor: "false"

Starefossen avatar Feb 03 '25 08:02 Starefossen

I can confirm that adding the prometheus.io/service-monitor: "false" label to the gateway pods disables the erroneous scraping:

  gateway:
    service:
      labels:
        prometheus.io/service-monitor: "false"

I believe that it's the way we need to fix the issue. Huge thanks for the confirmation

vlad-diachenko avatar Feb 03 '25 10:02 vlad-diachenko

Can we merge this? https://github.com/grafana/loki/pull/12746 - opened this fix a while back and works locally, just want it upstream.

adinhodovic avatar Feb 03 '25 10:02 adinhodovic

this PR can be closed because the issue is fixed in https://github.com/grafana/loki/pull/12746

vlad-diachenko avatar Feb 05 '25 12:02 vlad-diachenko