Certificate Validation Error in Prometheus Adapter Despite Valid Certificate

Open yashwanth-mannem opened this issue 2 years ago • 2 comments

What happened?:

During communication between the API server and the Prometheus adapter, a certificate validation error was observed. The error indicated that the certificate was expired. However, upon checking, the certificate in use was found to be valid and not expired.

What did you expect to happen?:

The Prometheus adapter should have successfully authenticated with the Kubernetes API server using the valid certificate, and metrics retrieval should have been successful.

Please provide the prometheus-adapter config:

prometheus-adapter config

apiVersion: v1
data:
  config.yaml: |
    resourceRules:
      cpu:
        containerLabel: container
        containerQuery: |
          sum by (<<.GroupBy>>) (
            irate (
              container_cpu_usage_seconds_total{<<.LabelMatchers>>,container!="",pod!=""}[120s]
            )
          )
        nodeQuery: |
          sum by (<<.GroupBy>>) (
            1 - irate(
              node_cpu_seconds_total{mode="idle"}[60s]
            )
          )
          or
          sum by (<<.GroupBy>>) (
            node:windows_node_cpu_utilisation:avg5m{mode="idle",job="wmi-exporter",<<.LabelMatchers>>}
          )
        resources:
          overrides:
            instance:
              resource: node
            namespace:
              resource: namespace
            pod:
              resource: pod
      memory:
        containerLabel: container
        containerQuery: |
          sum by (<<.GroupBy>>) (
            container_memory_working_set_bytes{<<.LabelMatchers>>,container!="",pod!=""}
          )
        nodeQuery: |
          sum by (<<.GroupBy>>) (
            node_memory_MemTotal_bytes{job="node-exporter",<<.LabelMatchers>>}
            -
            node_memory_MemAvailable_bytes{job="node-exporter",<<.LabelMatchers>>}
          )
          or
          sum by (<<.GroupBy>>) (
            node:windows_node_memory_utilization{job="wmi-exporter",<<.LabelMatchers>>}
          )
        resources:
          overrides:
            instance:
              resource: node
            namespace:
              resource: namespace
            pod:
              resource: pod
      window: 5m
kind: ConfigMap
metadata:
  annotations:
    helm.fluxcd.io/antecedent: prometheus:helmrelease/prometheus-adapter
    meta.helm.sh/release-name: prometheus-adapter
    meta.helm.sh/release-namespace: prometheus
  creationTimestamp: "2023-06-21T23:04:00Z"
  labels:
    app: prometheus-adapter
    app.kubernetes.io/managed-by: Helm
    chart: prometheus-adapter-2.6.2
    heritage: Helm
    release: prometheus-adapter
  name: prometheus-adapter
  namespace: prometheus

Please provide the HPA resource used for autoscaling:

HPA yaml

Not set up. We notice the issue when running k top nodes:

Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)

Please provide the HPA status:

NA

Please provide the prometheus-adapter logs with -v=6 around the time the issue happened:

prometheus-adapter logs

E0614 20:26:50.160879 1 authentication.go:53] Unable to authenticate the request due to an error: [x509: certificate has expired or is not yet valid: current time 2023-06-14T20:26:50Z is after 2022-09-20T21:38:27Z, verifying certificate SN=591688138426063623, SKID=, AKID= failed: x509: certificate has expired or is not yet valid: current time 2023-06-14T20:26:50Z is after 2022-09-20T21:38:27Z]
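The log line above carries both the time of the check and the expiry time of the rejected certificate, so the gap between them can be computed directly. The following is a minimal sketch in Python, assuming the message keeps the "current time ... is after ..." wording shown in this log; the names LOG_LINE and expired_by are hypothetical helpers for illustration and are not part of prometheus-adapter.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Excerpt of the x509 error copied from the adapter log above.
LOG_LINE = (
    "x509: certificate has expired or is not yet valid: "
    "current time 2023-06-14T20:26:50Z is after 2022-09-20T21:38:27Z"
)

# Matches the two RFC 3339 UTC timestamps in the error message.
_TS = r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z"
_PATTERN = re.compile(rf"current time ({_TS}) is after ({_TS})")


def _parse_utc(value: str) -> datetime:
    """Parse a timestamp such as 2023-06-14T20:26:50Z as an aware UTC datetime."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def expired_by(message: str) -> Optional[timedelta]:
    """Return how long the certificate had been expired at the time of the check.

    Returns None if the message does not contain the expected wording.
    """
    match = _PATTERN.search(message)
    if match is None:
        return None
    current_time, not_after = (_parse_utc(group) for group in match.groups())
    return current_time - not_after


if __name__ == "__main__":
    print(expired_by(LOG_LINE))
```

For this log line the gap is roughly 266 days, which suggests the rejected certificate is not the one that was inspected and found valid. Because the message is emitted while the adapter authenticates an incoming request, one possibility (an assumption, not confirmed anywhere in this thread) is that it refers to the client certificate presented by the caller rather than to the adapter's own serving certificate.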

Anything else we need to know?:

We have ensured that the certificate is not expired and is located at the correct path as configured in the adapter. We have also verified the synchronization of the Secret/ConfigMap, certificate rotation process, time synchronization across nodes, and the validity of the certificate chain. The problem persists.

Environment:

  • prometheus-adapter version: 2.6.2
  • prometheus version: v0.38.1
  • Kubernetes version (use kubectl version): v1.20.5
  • Cloud provider or hardware configuration: vSphere
  • Other info: Verified all the dependent manifests and resources; everything looks fine, but we do not see the metrics API service in the apiregistrations

yashwanth-mannem • Jun 26 '23 13:06

/kind support
/remove-kind bug
/triage accepted
/assign

dgrisonnet • Jun 29 '23 16:06

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

  • Confirm that this issue is still relevant with /triage accepted (org members only)
  • Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

k8s-triage-robot • Jun 28 '24 17:06

@yashwanth-mannem sorry no one was able to get to this issue. If you are still experiencing the problem, feel free to re-open. /close

dashpole • Sep 05 '24 17:09

@dashpole: Closing this issue.

In response to this:

@yashwanth-mannem sorry no one was able to get to this issue. If you are still experiencing the problem, feel free to re-open. /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot • Sep 05 '24 17:09