Loki gateway metrics (Nginx)
Is your feature request related to a problem? Please describe. I'm not able to tell whether Loki Gateway (Nginx) is fully operational; I only have its logs.
Describe the solution you'd like Enable an nginx exporter + ServiceMonitor, and create a dashboard + alert.
Hey, I enabled monitoring in the Helm chart but I'm getting TargetDown for the loki-gateway scrape target:
monitoring:
  selfMonitoring:
    enabled: false
    grafanaAgent:
      installOperator: false
  dashboards:
    enabled: true
  rules:
    enabled: true
  serviceMonitor:
    enabled: true
  lokiCanary:
    enabled: false
Alerts:
[FIRING:1] :warning: TargetDown • 100% of the monitoring/loki-gateway/loki-gateway targets in monitoring namespace are down.
This is using Alertmanager with Prometheus. Any ideas on which values I need to set to configure nginx-exporter for the loki-gateway pod in Kubernetes?
Cheers
Took a look at the rendered CRDs:
Name:         loki
Namespace:    monitoring
Labels:       app.kubernetes.io/instance=loki
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=loki
              app.kubernetes.io/version=3.0.0
              argocd.argoproj.io/instance=loki
              helm.sh/chart=loki-6.5.0
Annotations:  <none>
API Version:  monitoring.coreos.com/v1
Kind:         ServiceMonitor
Metadata:
  Creation Timestamp:  2024-02-28T13:15:15Z
  Generation:          1
  Resource Version:    40402766
  UID:                 7d63382c-2cf4-45ab-9200-f3239a2dda76
Spec:
  Endpoints:
    Interval:  15s
    Path:      /metrics
    Port:      http-metrics
    Relabelings:
      Action:        replace
      Replacement:   monitoring/$1
      Source Labels:
        job
      Target Label:  job
      Action:        replace
      Replacement:   loki
      Target Label:  cluster
    Scheme:  http
  Selector:
    Match Expressions:
      Key:       prometheus.io/service-monitor
      Operator:  NotIn
      Values:
        false
    Match Labels:
      app.kubernetes.io/instance:  loki
      app.kubernetes.io/name:      loki
Events:  <none>
It's just a ServiceMonitor pointing to a broken service endpoint, so we can safely disable it for the moment:
monitoring:
  selfMonitoring:
    enabled: false
    grafanaAgent:
      installOperator: false
  dashboards:
    enabled: false
  rules:
    enabled: false
  serviceMonitor:
    enabled: false
  lokiCanary:
    enabled: false
Seems like the /metrics path is not defined in the nginx.conf for loki-gateway:
https://github.com/grafana/loki/blob/main/production/helm/loki/templates/_helpers.tpl#L750-L1014
But the corresponding http-metrics port is defined in the loki-gateway deployment template: https://github.com/grafana/loki/blob/main/production/helm/loki/templates/gateway/deployment-gateway-nginx.yaml#L63-L66
The ServiceMonitor is created so Prometheus scrapes all http-metrics endpoints, so it gets a 404 when it tries to scrape /metrics:
10.244.4.42 - - [26/May/2024:10:01:37 +0000] 404 "GET /metrics HTTP/1.1" 153 "-" "Prometheus/2.51.1" "-"
10.244.4.42 - - [26/May/2024:10:01:52 +0000] 404 "GET /metrics HTTP/1.1" 153 "-" "Prometheus/2.51.1" "-"
IMO the dirty way is to set serviceMonitor.enabled: false as @paltaa suggested, but that disables monitoring for the whole Loki deployment.
Looks like previously, in the 2.x Helm charts, the port name was just http:
https://github.com/grafana/loki/blob/v2.9.8/production/helm/loki/templates/gateway/deployment-gateway.yaml#L62
And now it has been changed to http-metrics and is also used by the readinessProbe in the gateway deployment:
https://github.com/grafana/loki/blob/main/production/helm/loki/values.yaml#L1019-L1022
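For reference, the readinessProbe those values define looks roughly like this (paraphrased rather than copied verbatim; the exact probe defaults are an assumption), which shows why the port name cannot simply be reverted without also updating the probe:

gateway:
  readinessProbe:
    httpGet:
      path: /
      port: http-metrics   # the probe references the renamed port
    initialDelaySeconds: 15
    timeoutSeconds: 1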
Suffering from the same issue.
A bit nicer workaround: the ServiceMonitor's selector excludes any service that carries the label prometheus.io/service-monitor: "false". So by adding that label to your gateway Service, it should be excluded until the above is fixed in the Helm chart itself.
values.yaml
gateway:
  service:
    labels:
      prometheus.io/service-monitor: "false"
In our case, before the upgrade to v3 (chart v5.20.0), we didn't have Prometheus scraping of the gateway pods, likely because the port names didn't match:
kind: ServiceMonitor
endpoints:
  - port: http-metrics
    path: /metrics

kind: Deployment
metadata:
  name: loki-gateway
ports:
  - name: http
After upgrading to v3 (chart v6.6.1) we got monitoring of the gateway pods (they now expose an http-metrics port), but since we enabled auth on the gateway (basicAuth.enabled: true), Prometheus scraping gets a 401 response:
server returned HTTP status 401 Unauthorized
http://10.1.5.228:8080/metrics
What is the best practice here? Is it possible to add an option to the Helm chart that disables authentication only for the metrics endpoint in the gateway nginx? Or is adding auth credentials to the Prometheus scrape the preferred option here?
@akorp the issue is not auth; the issue is that /metrics is not handled. Having auth enabled just makes the request fail with a 401 instead of a 404.
This commit introduced the change seemingly as a drive-by: https://github.com/grafana/loki/commit/79b876b65d55c54f4d532e98dc24743dea8bedec#diff-d79225d50b6c12d41bceaed705a35fd5b5fff56f829fbbe5744ce6be632a0038
I think the port rename should be reverted. Until then @Pionerd's workaround is probably the best.
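For what it's worth, if someone does go the credentials route (which only helps once /metrics is actually served, e.g. with the exporter workaround further down), the Prometheus Operator ServiceMonitor endpoint supports basicAuth referencing a Secret. A minimal sketch, assuming a separately managed ServiceMonitor, a Secret named loki-gateway-auth with username/password keys, and a component label on the gateway Service (all of these are assumptions, not chart-provided values):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: loki-gateway-basicauth   # hypothetical, managed outside the chart
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app.kubernetes.io/instance: loki
      app.kubernetes.io/component: gateway   # assumption about the gateway Service labels
  endpoints:
    - port: http-metrics
      path: /metrics
      basicAuth:
        username:
          name: loki-gateway-auth   # hypothetical Secret holding the gateway credentials
          key: username
        password:
          name: loki-gateway-auth
          key: password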
@DanielCastronovo How is this completed?
Still seems to be an issue here as well.
Worked around it using:
gateway:
  service:
    labels:
      prometheus.io/service-monitor: "false"
Not completed, still an issue. Please reopen.
Probably they closed it because they moved their monitoring to the new, even less complete, meta-monitoring chart...
Same issue.
Same. Please reopen.
I recently upgraded to v6.10.0 of the Helm chart and experienced this same issue. I worked around it by deploying nginx-prometheus-exporter alongside nginx in the loki-gateway deployment. This is how I did it:
loki chart values snippet
gateway:
  nginxConfig:
    serverSnippet: |
      location = /stub_status {
        stub_status on;
        allow 127.0.0.1;
        deny all;
      }
      location = /metrics {
        proxy_pass http://127.0.0.1:9113/metrics;
      }
  extraContainers:
    - name: nginx-exporter
      securityContext:
        allowPrivilegeEscalation: false
      image: nginx/nginx-prometheus-exporter:1.3.0
      imagePullPolicy: IfNotPresent
      ports:
        - containerPort: 9113
          name: http-exporter
      resources:
        limits:
          memory: 128Mi
          cpu: 500m
        requests:
          memory: 64Mi
          cpu: 100m
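One caveat with the snippet above, based on nginx's config inheritance rules rather than anything stated in the chart docs: if gateway.basicAuth.enabled is also true, the server-level auth_basic directive is inherited by the /metrics location, so Prometheus would still get a 401 there. Turning auth off just for that location should avoid it (a sketch, not tested against every chart version):

gateway:
  nginxConfig:
    serverSnippet: |
      location = /metrics {
        auth_basic off;   # assumption: skip the server-level basic auth for the scrape endpoint only
        proxy_pass http://127.0.0.1:9113/metrics;
      }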
> I recently upgraded to v6.10.0 of the Helm chart and experienced this same issue. I worked around it by deploying nginx-prometheus-exporter alongside nginx in the loki-gateway deployment. This is how I did it:
> loki chart values snippet
Thanks for this, I too just ran into this with the chart upgrade.