nginx-prometheus-exporter
HTTP server randomly closes, offers vague reason, refuses to elaborate further
Describe the bug
I'm running nginx-prometheus-exporter as a container sitting next to my nginx container. The container with NPE randomly dies for no apparent reason. It just logs:
{"time": actual time,"level":"INFO","source":"exporter.go:217","msg":"shutting down"}
{"time": actual time,"level":"INFO","source":"exporter.go:208","msg":"HTTP server closed","error":"http: server closed"}
Even though NPE was started with --log.level=debug, nothing in the logs elaborates on why the HTTP server shut down.
To reproduce
Steps to reproduce the behavior:
- Deploy NPE with --log.level=debug (a minimal sketch of the invocation is shown after this list).
- Wait 4 minutes, or 4 hours, or 12 hours.
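For reference, a minimal sketch of the exporter invocation (the scrape URI and port here are placeholders, not my exact arguments):

# Hypothetical sidecar command line; only --log.level=debug matters for the
# reproduction, the scrape URI just needs to point at nginx's stub_status.
nginx-prometheus-exporter \
  --nginx.scrape-uri=http://127.0.0.1:8000/stub_status \
  --log.level=debug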
Expected behavior
NPE should explain why it shut down so that I can actually fix it.
Your environment
- Version of the Prometheus exporter - 1.4.1
- Version of Docker/Kubernetes - not relevant
- [if applicable] Kubernetes platform (e.g. Minikube or GCP): Mirantis
- Using NGINX or NGINX Plus: NGINX
Hi @grepwood! Welcome to the project! 🎉
Thanks for opening this issue! Be sure to check out our Contributing Guidelines and the Issue Lifecycle while you wait for someone on the team to take a look at this.
I've been facing the exact same issue for a couple of weeks now. I'm also running it in Kubernetes with 3 pods, in a production environment. Here is the data I managed to collect.
This is what I observed by logging into the Nginx container's shell.
Even though the Nginx /stub_status endpoint responds normally with the metrics,
I'm not able to get a reply from the Nginx Prometheus Exporter container. The request just hangs indefinitely.
(I interrupted the execution after more than 1 min.)
/ $ date && time curl -v 127.0.0.1:8000/stub_status && echo && echo && echo && date && time curl -v 127.0.0.1:9113/metrics; date
Thu May 8 21:03:24 UTC 2025
* Trying 127.0.0.1:8000...
* Connected to 127.0.0.1 (127.0.0.1) port 8000
* using HTTP/1.x
> GET /stub_status HTTP/1.1
> Host: 127.0.0.1:8000
> User-Agent: curl/8.12.1
> Accept: */*
>
* Request completely sent off
< HTTP/1.1 200 OK
< Server: nginx/1.26.3
< Date: Thu, 08 May 2025 21:03:24 GMT
< Content-Type: text/plain
< Content-Length: 107
< Connection: keep-alive
<
Active connections: 36
server accepts handled requests
313 313 14525
Reading: 0 Writing: 6 Waiting: 30
* Connection #0 to host 127.0.0.1 left intact
real 0m 0.00s
user 0m 0.00s
sys 0m 0.00s
Thu May 8 21:03:24 UTC 2025
* Trying 127.0.0.1:9113...
* Connected to 127.0.0.1 (127.0.0.1) port 9113
* using HTTP/1.x
> GET /metrics HTTP/1.1
> Host: 127.0.0.1:9113
> User-Agent: curl/8.12.1
> Accept: */*
>
* Request completely sent off
^C Command terminated by signal 2
real 1m 24.84s
user 0m 0.00s
sys 0m 0.00s
Thu May 8 21:04:49 UTC 2025
Logs from the containers
1746735622753 time=2025-05-08T20:20:22.753Z level=INFO source=exporter.go:217 msg="shutting down"
1746735622753 time=2025-05-08T20:20:22.753Z level=INFO source=exporter.go:208 msg="HTTP server closed" error="http: Server closed"
1746735622872 time=2025-05-08T20:20:22.872Z level=INFO source=exporter.go:123 msg=nginx-prometheus-exporter version="(version=1.4.2, branch=HEAD, revision=ced6fda825f88077debfacab8d82536ce502bb17)"
1746735622872 time=2025-05-08T20:20:22.872Z level=INFO source=exporter.go:124 msg="build context" build_context="(go=go1.24.2, platform=linux/amd64, user=goreleaser, date=2025-04-28T15:24:56Z, tags=unknown)"
1746735622875 time=2025-05-08T20:20:22.875Z level=INFO source=tls_config.go:347 msg="Listening on" address=[::]:9113
1746735622875 time=2025-05-08T20:20:22.875Z level=INFO source=tls_config.go:350 msg="TLS is disabled." http2=false address=[::]:9113
1746735828041 time=2025-05-08T20:23:48.041Z level=INFO source=exporter.go:217 msg="shutting down"
1746735828041 time=2025-05-08T20:23:48.041Z level=INFO source=exporter.go:208 msg="HTTP server closed" error="http: Server closed"
1746735828155 time=2025-05-08T20:23:48.155Z level=INFO source=exporter.go:123 msg=nginx-prometheus-exporter version="(version=1.4.2, branch=HEAD, revision=ced6fda825f88077debfacab8d82536ce502bb17)"
1746735828155 time=2025-05-08T20:23:48.155Z level=INFO source=exporter.go:124 msg="build context" build_context="(go=go1.24.2, platform=linux/amd64, user=goreleaser, date=2025-04-28T15:24:56Z, tags=unknown)"
1746735828158 time=2025-05-08T20:23:48.158Z level=INFO source=tls_config.go:347 msg="Listening on" address=[::]:9113
1746735828158 time=2025-05-08T20:23:48.158Z level=INFO source=tls_config.go:350 msg="TLS is disabled." http2=false address=[::]:9113
1746735848029 time=2025-05-08T20:24:08.029Z level=INFO source=exporter.go:217 msg="shutting down"
1746735848029 time=2025-05-08T20:24:08.029Z level=INFO source=exporter.go:208 msg="HTTP server closed" error="http: Server closed"
1746735848194 time=2025-05-08T20:24:08.194Z level=INFO source=exporter.go:123 msg=nginx-prometheus-exporter version="(version=1.4.2, branch=HEAD, revision=ced6fda825f88077debfacab8d82536ce502bb17)"
1746735848194 time=2025-05-08T20:24:08.194Z level=INFO source=exporter.go:124 msg="build context" build_context="(go=go1.24.2, platform=linux/amd64, user=goreleaser, date=2025-04-28T15:24:56Z, tags=unknown)"
1746735848198 time=2025-05-08T20:24:08.197Z level=INFO source=tls_config.go:347 msg="Listening on" address=[::]:9113
1746735848198 time=2025-05-08T20:24:08.197Z level=INFO source=tls_config.go:350 msg="TLS is disabled." http2=false address=[::]:9113
Events
Events:
Warning Unhealthy 20m kubelet Readiness probe failed: Get "http://240.48.0.181:9113/metrics": EOF
Normal Pulled 20m (x2 over 22m) kubelet Container image "nginx/nginx-prometheus-exporter:1.4.2" already present on machine
Normal Started 20m (x2 over 22m) kubelet Started container nginx-prometheus-exporter
Normal Killing 20m kubelet Container nginx-prometheus-exporter failed liveness probe, will be restarted
Normal Created 20m (x2 over 22m) kubelet Created container: nginx-prometheus-exporter
Warning Unhealthy 77s (x7 over 20m) kubelet Liveness probe failed: Get "http://240.48.0.181:9113/metrics": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Warning Unhealthy 67s (x8 over 20m) kubelet Readiness probe failed: Get "http://240.48.0.181:9113/metrics": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Events:
Normal Started 19m (x2 over 22m) kubelet Started container nginx-prometheus-exporter
Normal Created 19m (x2 over 22m) kubelet Created container: nginx-prometheus-exporter
Normal Pulled 19m (x2 over 22m) kubelet Container image "nginx/nginx-prometheus-exporter:1.4.2" already present on machine
Normal Killing 19m kubelet Container nginx-prometheus-exporter failed liveness probe, will be restarted
Warning Unhealthy 19m kubelet Readiness probe failed: Get "http://240.48.0.73:9113/metrics": EOF
Warning Unhealthy 8s (x10 over 19m) kubelet Readiness probe failed: Get "http://240.48.0.73:9113/metrics": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Warning Unhealthy 8s (x10 over 19m) kubelet Liveness probe failed: Get "http://240.48.0.73:9113/metrics": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Events:
Normal Started 20m (x2 over 22m) kubelet Started container nginx-prometheus-exporter
Normal Created 20m (x2 over 22m) kubelet Created container: nginx-prometheus-exporter
Normal Pulled 20m (x2 over 22m) kubelet Container image "nginx/nginx-prometheus-exporter:1.4.2" already present on machine
Normal Killing 20m kubelet Container nginx-prometheus-exporter failed liveness probe, will be restarted
Warning Unhealthy 20m kubelet Readiness probe failed: Get "http://240.48.0.60:9113/metrics": EOF
Warning Unhealthy 83s (x7 over 20m) kubelet Readiness probe failed: Get "http://240.48.0.60:9113/metrics": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Warning Unhealthy 78s (x8 over 20m) kubelet Liveness probe failed: Get "http://240.48.0.60:9113/metrics": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Configuration
args:
- --nginx.scrape-uri=http://127.0.0.1:8000/stub_status
- --log.level=debug
image: nginx/nginx-prometheus-exporter:1.4.2
imagePullPolicy: IfNotPresent
livenessProbe:
  failureThreshold: 3
  httpGet:
    path: /metrics
    port: 9113
    scheme: HTTP
  initialDelaySeconds: 10
  periodSeconds: 5
  successThreshold: 1
  timeoutSeconds: 3
name: nginx-prometheus-exporter
ports:
- containerPort: 9113
  name: nginx-metrics
  protocol: TCP
readinessProbe:
  failureThreshold: 3
  httpGet:
    path: /metrics
    port: 9113
    scheme: HTTP
  initialDelaySeconds: 10
  periodSeconds: 5
  successThreshold: 1
  timeoutSeconds: 3
resources:
  limits:
    memory: 64Mi
  requests:
    cpu: 100m
    memory: 64Mi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
  name: kube-api-access-4pnp9
  readOnly: true
In the meantime, we lose the metrics in Grafana.
Hey! Has anyone managed to take a look at this? Or found a solution? 👀
Hi @diogokiss!
I apologise for not getting back to you earlier. Have you found a solution to this since?
If not, it looks like the liveness and readiness probes fail when Kubernetes tries to check on the Prometheus exporter, and the pods get recycled. It also looks like the nginx binary is in the same pod (scrape url is 127.0.0.1).
Would you be able to give us more information about the pod and the ports? Are nginx and the Prometheus exporter in the same pod, and is the exporter used as a sidecar?
Can you manually ping the /metrics endpoint on port 9113?
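For example, something along these lines from inside the pod (the pod and container names below are placeholders, not taken from your setup):

# Hypothetical check from inside the pod: the nginx container shares the
# pod's network namespace, so it can reach the exporter on 127.0.0.1:9113.
kubectl exec <nginx-pod-name> -c nginx -- \
  curl -sv --max-time 5 http://127.0.0.1:9113/metrics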
You also wrote
Even though the Nginx /stub_status endpoint responds normally with the metrics,
Would you be able to tell us how you checked? Was it from outside the pod / kubernetes, or from within the pod / kubernetes?
If you have any other info to add, let us know! Thank you for your patience!
Hi @javorszky,
While I'm not the person you asked, I'm hitting this under exactly the conditions you describe:
It also looks like the nginx binary is in the same pod (scrape url is 127.0.0.1).
Yes. Same pod, different container.
Can you manually ping the /metrics endpoint on port 9113? Would you be able to tell us how you checked? Was it from outside the pod / kubernetes, or from within the pod / kubernetes?
Within the same pod, from any container in that pod. Outside of that, not at all, because I deliberately configured nginx not to let any client other than 127.0.0.1 access /stub_status.