ingress-nginx
nginx_ingress_controller_ssl_expire_time_seconds metrics went missing
NGINX Ingress controller version:
Kubernetes version: 1.23.0
- Cloud provider or hardware configuration: GCP
- OS (e.g. from /etc/os-release): COS
- Install tools: Helm
- How was the ingress-nginx-controller installed: Helm
What happened:
nginx_ingress_controller_ssl_expire_time_seconds is missing from the metrics output.
What you expected to happen:
nginx_ingress_controller_ssl_expire_time_seconds should be present in the metrics output.
@nishant-ketu: This issue is currently awaiting triage.
If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@nishant-ketu can you share your installation steps and how you confirmed nginx_ingress_controller_ssl_expire_time_seconds is missing?
/triage needs-information
@nishant-ketu friendly reminder
Hey @kundan2707, we are installing ingress-nginx using Helm on a GKE cluster, with the chart from https://github.com/kubernetes/ingress-nginx/tree/main/charts/ingress-nginx.
I confirmed the metric was missing by checking http://localhost/metrics inside the ingress-nginx pods: nginx_ingress_controller_ssl_expire_time_seconds was not there. The strange part is that it was only missing for a few hours and then came back again. As a result, the data is also missing in Prometheus and Grafana for those periods.
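For reference, a minimal sketch of the install and check (assuming the default release name, the ingress-nginx namespace, and the default metrics port 10254, none of which are stated in the report above):
$ helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
$ helm upgrade --install ingress-nginx ingress-nginx/ingress-nginx \
    --namespace ingress-nginx --create-namespace \
    --set controller.metrics.enabled=true
# port-forward the controller's metrics port and look for the metric
$ kubectl -n ingress-nginx port-forward deploy/ingress-nginx-controller 10254:10254 &
$ curl -s http://localhost:10254/metrics | grep nginx_ingress_controller_ssl_expire_time_seconds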
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
We hit the issue as well.
+1
Same here. @kundan2707, any update on this?
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
Same issue.
Additional observations:
- Other metrics are still being emitted (for example nginx_ingress_controller_build_info). Maybe the controller lost leader election? I think only the leader emits the SSL expiration info (a way to check the current leader is sketched after this comment).
- Rolling the nginx controller deployment helps, but that is not an ideal solution ...
This is really important for our monitoring; please reinstate the missing TLS lifetime metric.
/remove-lifecycle rotten
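If the leader-election theory above is right, one way to see which replica currently holds the election might be the following sketch (assuming a recent controller version that records the election in a Lease object and the ingress-nginx namespace; older versions used a ConfigMap instead, and the lease name depends on the configured election ID):
$ kubectl -n ingress-nginx get lease
# read the current holder from the election lease found above
$ kubectl -n ingress-nginx get lease <election-lease-name> -o jsonpath='{.spec.holderIdentity}'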
It looks like nginx_ingress_controller_ssl_expire_time_seconds is missing when curling metrics from the ingress-nginx-controller-metrics service, but only on some calls; other calls do return the metric:
$ curl http://172.16.6.62:10254/metrics | grep nginx_ingress_controller_ssl_expire_time_seconds
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 3316k 0 3316k 0 0 18.8M 0 --:--:-- --:--:-- --:--:-- 18.9M
bash-5.1$ curl http://172.16.6.62:10254/metrics | grep nginx_ingress_controller_ssl_expire_time_seconds
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 3076k 0 3076k 0 0 23.0M 0 --:--:-- --:--:-- --:--:-- 23.1M
bash-5.1$ curl http://172.16.6.62:10254/metrics | grep nginx_ingress_controller_ssl_expire_time_seconds
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 3383k 0 3383k 0 0 22.8M 0 --:--:-- --:--:-- --:--:-- 22.9M
bash-5.1$ curl http://172.16.6.62:10254/metrics | grep nginx_ingress_controller_ssl_expire_time_seconds
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 3081k 0 3081k 0 0 23.7M 0 --:--:-- --:--:-- --:--:-- 23.6M
# HELP nginx_ingress_controller_ssl_expire_time_seconds Number of seconds since 1970 to the SSL Certificate expire.\n An example to check if this certificate will expire in 10 days is: "nginx_ingress_controller_ssl_expire_time_seconds < (time() + (10 * 24 * 3600))"
# TYPE nginx_ingress_controller_ssl_expire_time_seconds gauge
nginx_ingress_controller_ssl_expire_time_seconds{class="k8s.io/ingress-nginx",host="_",namespace="default"} 1.677700211e+09
nginx_ingress_controller_ssl_expire_time_seconds{class="k8s.io/ingress-nginx",host="earnestmercury.example.com",namespace="default"} 1.667357007e+09
nginx_ingress_controller_ssl_expire_time_seconds{class="k8s.io/ingress-nginx",host="novaaileron.example.com",namespace="default"} 1.669118467e+09
...
So it seems the nginx_ingress_controller_ssl_expire_time_seconds metric is coming from a single pod in the deployment rather than from all of the pods.
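A quick way to confirm this per replica might be to grep each pod's metrics endpoint directly (a sketch; the ingress-nginx namespace, the app.kubernetes.io/name=ingress-nginx label, and port 10254 are assumptions based on the default chart values):
$ for ip in $(kubectl -n ingress-nginx get pods -l app.kubernetes.io/name=ingress-nginx \
      -o jsonpath='{.items[*].status.podIP}'); do
    # count how many ssl_expire_time series each replica exposes (0 = missing)
    echo -n "$ip: "
    curl -s "http://$ip:10254/metrics" | grep -c nginx_ingress_controller_ssl_expire_time_seconds
  done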
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale