
Prometheus: TLS certificate does not contain IP SANs

stephan2012 opened this issue 2 years ago • 6 comments

What version of descheduler are you using?

descheduler version: v0.26.0

Does this issue reproduce with the latest release? yes!

Which descheduler CLI options are you using?

      - args:
        - --policy-config-file
        - /policy-dir/policy.yaml
        - --descheduling-interval
        - 5m
        - --v
        - "3"
        - --leader-elect=true
        - --leader-elect-lease-duration=15s
        - --leader-elect-renew-deadline=10s
        - --leader-elect-retry-period=2s
        - --leader-elect-resource-lock=leases
        - --leader-elect-resource-name=descheduler
        - --leader-elect-resource-namespace=kube-system
        command:
        - /bin/descheduler

Please provide a copy of your descheduler policy config file

N/A

What k8s version are you using (kubectl version)? v1.24.12

What did you do?

Let Prometheus scrape metrics from the descheduler.

What did you expect to see?

A TLS certificate that allows Prometheus to scrape metrics would be nice.

What did you see instead?

The descheduler creates a TLS certificate that does not include an IP address, causing Prometheus to refuse to scrape even when instructed to skip TLS verification:

Get "https://172.30.95.137:10258/metrics": x509: cannot validate certificate for 172.30.95.137 because it doesn't contain any IP SANs

Looks like adding the Pod IP address would be sufficient. However, I am aware that IP address auto-detection might be tricky if the Pod has multiple interfaces attached (e.g., with Multus). Possible solutions:

  • Resolve the Pod's own IP address through kube-dns
  • Use the Kubernetes Downward API and a command-line argument to pass the IP address (sketched below)
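
For illustration, a minimal sketch of the Downward API variant, in the same container spec as above. The env wiring via status.podIP is standard Kubernetes; the --metrics-cert-extra-sans flag is purely hypothetical and does not exist in the descheduler today:

      - command:
        - /bin/descheduler
        env:
        # Downward API: expose the Pod's own IP to the container.
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        args:
        # (existing args from above omitted)
        # Hypothetical flag: the descheduler would have to add this
        # address as an IP SAN when generating its serving certificate.
        - --metrics-cert-extra-sans=$(POD_IP)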

stephan2012 commented on Mar 29 '23

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented on Jun 27 '23

Still an open issue …

stephan2012 commented on Jun 27 '23

/remove-lifecycle stale

stephan2012 commented on Jun 27 '23

Just ran into this as well. One thing I was wondering: why is the descheduler serving over HTTPS in the first place? There is some justification in that the Kubernetes project as a whole (kube-apiserver, etc.) has deprecated insecure connections, but as best I can tell, only the metrics endpoint is being served, and it is unlikely to be exposed publicly. Even if it were, all connections to it would fail because the provided TLS cert is not valid anyway...?

Is there a way to actually have Prometheus scrape these metrics? How are people collecting them?

Based on https://github.com/kubernetes-sigs/descheduler/issues/842 and https://github.com/kubernetes-sigs/descheduler/issues/1095, this has been raised before, but there is still no documented way to scrape the descheduler with Prometheus. I don't necessarily disagree with HTTPS being required, but if there is some way to have Prometheus scrape the metrics, could that at least be documented?
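
A possible workaround sketch, until the certificate question is settled: scrape over HTTPS but skip certificate validation. This assumes the descheduler runs in kube-system with the common app.kubernetes.io/name=descheduler label and serves metrics on port 10258 (as in the error above); per the original report, skipping verification may still not have worked in that environment, so treat this as untested:

    scrape_configs:
      - job_name: descheduler
        scheme: https
        tls_config:
          # Skip SAN/hostname validation of the self-signed serving cert.
          insecure_skip_verify: true
        kubernetes_sd_configs:
          - role: pod
            namespaces:
              names: [kube-system]
        relabel_configs:
          # Keep only descheduler pods (the label name is an assumption).
          - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
            regex: descheduler
            action: keep
          # Point the scrape at the secure metrics port seen in the error.
          - source_labels: [__meta_kubernetes_pod_ip]
            regex: (.+)
            replacement: "$1:10258"
            target_label: __address__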

logyball commented on Oct 12 '23

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented on Jan 30 '24

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented on Feb 29 '24

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot commented on Mar 30 '24

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot commented on Mar 30 '24