nginx_ingress_controller_ssl_expire_time_seconds metrics went missing

Open nishant-ketu opened this issue 3 years ago • 13 comments

NGINX Ingress controller version:

Kubernetes version: 1.23.0

  • Cloud provider or hardware configuration: GCP

  • OS (e.g. from /etc/os-release): COS

  • Install tools: HELM

  • How was the ingress-nginx-controller installed: HELM

What happened:

nginx_ingress_controller_ssl_expire_time_seconds is missing from metrics

What you expected to happen: nginx_ingress_controller_ssl_expire_time_seconds should be there

nishant-ketu avatar Feb 16 '22 10:02 nishant-ketu

@nishant-ketu: This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Feb 16 '22 10:02 k8s-ci-robot

@nishant-ketu can you share your installation steps and how you confirmed that nginx_ingress_controller_ssl_expire_time_seconds is missing?

kundan2707 avatar Feb 16 '22 16:02 kundan2707

/triage needs-information

kundan2707 avatar Feb 24 '22 10:02 kundan2707

@nishant-ketu friendly reminder

kundan2707 avatar Mar 03 '22 10:03 kundan2707

Hey @kundan2707, we are installing nginx-ingress using Helm on a GKE cluster, with the chart below. https://github.com/kubernetes/ingress-nginx/tree/main/charts/ingress-nginx

I confirmed the metric was missing by checking http:localhost/metrics in the nginx-ingress pods: nginx_ingress_controller_ssl_expire_time_seconds was not there. The strange part is that it was missing only for a few hours and then came back. As a result, the data for that period is also missing in Prometheus and Grafana.

Screenshot from 2022-03-06 08-58-46

nishant-ketu avatar Mar 06 '22 03:03 nishant-ketu

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jun 04 '22 03:06 k8s-triage-robot

We hit the issue as well.

panxia6679 avatar Jun 06 '22 01:06 panxia6679

+1

mirkoszy avatar Jun 28 '22 15:06 mirkoszy

Same here too. @kundan2707, any update on this?

Ratan044 avatar Jun 30 '22 09:06 Ratan044

/lifecycle rotten

k8s-triage-robot avatar Jul 30 '22 09:07 k8s-triage-robot

Same issue.

Additional observations:

  • Other metrics are still being emitted (like nginx_ingress_controller_build_info). Maybe the pod lost leadership? I think only the leader emits SSL expiration info.
  • A rolling deployment of the nginx controller helps, but this is not an ideal solution.
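
For anyone who wants to try the restart workaround, here is a minimal sketch. The function name `restart_controller`, the default deployment name `ingress-nginx-controller`, and the namespace `ingress-nginx` are assumptions, not something confirmed in this thread; adjust them to your install. Setting `DRY_RUN=1` only prints the command instead of running it.

```shell
# Restart the controller Deployment with a rolling restart, then wait for it.
# Deployment name and namespace defaults are assumptions, not from this thread.
restart_controller() {
  local deployment="${1:-ingress-nginx-controller}"
  local namespace="${2:-ingress-nginx}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print what would run without touching the cluster.
    echo "kubectl rollout restart deployment/${deployment} -n ${namespace}"
  else
    kubectl rollout restart "deployment/${deployment}" -n "${namespace}" &&
      kubectl rollout status "deployment/${deployment}" -n "${namespace}"
  fi
}

# Example:
#   DRY_RUN=1 restart_controller
#   restart_controller ingress-nginx-controller ingress-nginx
```

This only works around the symptom; it does not explain why the metric disappears.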

wimi avatar Aug 01 '22 12:08 wimi

This is really important for our monitoring; please reinstate the missing TLS lifetime metric.

UnrealCraig avatar Aug 08 '22 19:08 UnrealCraig

/remove-lifecycle rotten

UnrealCraig avatar Aug 08 '22 19:08 UnrealCraig

It looks like nginx_ingress_controller_ssl_expire_time_seconds is missing when curling metrics from the ingress-nginx-controller-metrics service, but only on some calls; other calls return the metric:

$ curl http://172.16.6.62:10254/metrics | grep nginx_ingress_controller_ssl_expire_time_seconds
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 3316k    0 3316k    0     0  18.8M      0 --:--:-- --:--:-- --:--:-- 18.9M
bash-5.1$ curl http://172.16.6.62:10254/metrics | grep nginx_ingress_controller_ssl_expire_time_seconds
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 3076k    0 3076k    0     0  23.0M      0 --:--:-- --:--:-- --:--:-- 23.1M
bash-5.1$ curl http://172.16.6.62:10254/metrics | grep nginx_ingress_controller_ssl_expire_time_seconds
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 3383k    0 3383k    0     0  22.8M      0 --:--:-- --:--:-- --:--:-- 22.9M
bash-5.1$ curl http://172.16.6.62:10254/metrics | grep nginx_ingress_controller_ssl_expire_time_seconds
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 3081k    0 3081k    0     0  23.7M      0 --:--:-- --:--:-- --:--:-- 23.6M
# HELP nginx_ingress_controller_ssl_expire_time_seconds Number of seconds since 1970 to the SSL Certificate expire.\n			An example to check if this certificate will expire in 10 days is: "nginx_ingress_controller_ssl_expire_time_seconds < (time() + (10 * 24 * 3600))"
# TYPE nginx_ingress_controller_ssl_expire_time_seconds gauge
nginx_ingress_controller_ssl_expire_time_seconds{class="k8s.io/ingress-nginx",host="_",namespace="default"} 1.677700211e+09
nginx_ingress_controller_ssl_expire_time_seconds{class="k8s.io/ingress-nginx",host="earnestmercury.example.com",namespace="default"} 1.667357007e+09
nginx_ingress_controller_ssl_expire_time_seconds{class="k8s.io/ingress-nginx",host="novaaileron.example.com",namespace="default"} 1.669118467e+09
...

So it seems the nginx_ingress_controller_ssl_expire_time_seconds metric is coming from a single pod in the deployment rather than from all the pods.
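
A minimal sketch for checking this per pod instead of through the Service, so each replica's answer can be seen separately. Port 10254 and the metric name come from the output above; the function names `count_series` and `check_pods`, and the namespace and label selector in the usage comment, are assumptions rather than anything confirmed in this thread.

```shell
# Count sample lines for one metric in Prometheus text-format output on stdin.
# "# HELP" and "# TYPE" comment lines are not counted.
count_series() {
  local metric="$1"
  # grep -c prints the count but exits 1 when the count is 0, so ignore that.
  grep -c "^${metric}[{ ]" || true
}

# Query each pod IP directly (bypassing the Service) and print "<ip> <count>".
check_pods() {
  local metric="$1"
  shift
  local ip n
  for ip in "$@"; do
    n=$(curl -s --max-time 5 "http://${ip}:10254/metrics" | count_series "$metric")
    echo "${ip} ${n}"
  done
}

# Example (namespace and label selector are assumptions):
#   check_pods nginx_ingress_controller_ssl_expire_time_seconds \
#     $(kubectl get pods -n ingress-nginx \
#         -l app.kubernetes.io/component=controller \
#         -o jsonpath='{.items[*].status.podIP}')
```

If only one pod reports a non-zero count, that would be consistent with the metric being emitted by a single replica. An unreachable pod also shows a count of 0, so check connectivity before drawing conclusions.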

acjohnson avatar Aug 30 '22 14:08 acjohnson

/lifecycle stale

k8s-triage-robot avatar Nov 28 '22 14:11 k8s-triage-robot

/remove-lifecycle stale

Rohlik avatar May 26 '23 11:05 Rohlik