loki
loki copied to clipboard
fix(helm): Avoid scraping metrics from gateway pod
What this PR does / why we need it:
Rename the nginx port name to avoid beeing scraped by the ServiceMonitor: https://github.com/grafana/loki/blob/91a34868db61f2cf4299d618c2e48885ff0a705e/production/helm/loki/templates/monitoring/servicemonitor.yaml#L32-L33
Which issue(s) this PR fixes: Fixes #13201
Special notes for your reviewer:
Checklist
- [x] Reviewed the
CONTRIBUTING.mdguide (required) - [x] Documentation added
- [x] Tests updated
- [x] Title matches the required conventional commits format, see here
- Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such,
featPRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
- Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such,
- [ ] Changes that require user attention or interaction to upgrade are documented in
docs/sources/setup/upgrade/_index.md - [x] For Helm chart changes bump the Helm chart version in
production/helm/loki/Chart.yamland updateproduction/helm/loki/CHANGELOG.mdandproduction/helm/loki/README.md. Example PR - [x] If the change is deprecating or removing a configuration option, update the
deprecated-config.yamlanddeleted-config.yamlfiles respectively in thetools/deprecated-config-checkerdirectory. Example PR
I am not sure if we need to do a major version bump. Technically, it's a breaking change, but I don't think it will affect anyone.
Ah opened a similar PR a while back https://github.com/grafana/loki/pull/12746, was alerting. Seems to resolve the same thing.
I've rebased and fixed the merge conflicts that have come up since creating this PR.
I've rebased this PR again and fixed the merge conflicts that have come up.
Looking forward for this PR!
Resolved merge conflicts again. Hoping for a timely review.
cc @vlad-diachenko
cc @vlad-diachenko
I believe that we have the issue only when nginx is deployed as a gateway.... so, I would configure nginx to expose the metrics under /metrics path. wdyt?
Hi, is there an update on this? We need this fix as we have the service monitor target for nginx pod down for quite long time now
any news on this fix ?
@vlad-diachenko @hervenicol anything I can do to help unblock a fix for this issue?
@vlad-diachenko @hervenicol anything I can do to help unblock a fix for this issue?
Hey @Starefossen .
We can not merge it because of the thing I mentioned in the comment .
This port should be scraped if GEL image is used for the gateway.
So, if the issue happens only when NGING is used as a gateway, I would add label to the gateway pod prometheus.io/scrape: "false" , so it won't be scraped. (it needs to be tested because I googled this solution)
wdyt?
I can confirm that adding the prometheus.io/service-monitor: "false" label to the gateway pods disables the erroneous scraping:
gateway:
service:
labels:
prometheus.io/service-monitor: "false"
I can confirm that adding the
prometheus.io/service-monitor: "false"label to the gateway pods disables the erroneous scraping:gateway: service: labels: prometheus.io/service-monitor: "false"
I believe that it's the way we need to fix the issue. Huge thanks for the confirmation
Can we merge this? https://github.com/grafana/loki/pull/12746 - opened this fix a while back and works locally, just want it upstream.
this PR can be closed because the issue is fixed in https://github.com/grafana/loki/pull/12746