
Prometheus: TLS certificate does not contain IP SANs

stephan2012 opened this issue 2 years ago • 6 comments

What version of descheduler are you using?

descheduler version: v0.26.0

Does this issue reproduce with the latest release? yes!

Which descheduler CLI options are you using?

      - args:
        - --policy-config-file
        - /policy-dir/policy.yaml
        - --descheduling-interval
        - 5m
        - --v
        - "3"
        - --leader-elect=true
        - --leader-elect-lease-duration=15s
        - --leader-elect-renew-deadline=10s
        - --leader-elect-retry-period=2s
        - --leader-elect-resource-lock=leases
        - --leader-elect-resource-name=descheduler
        - --leader-elect-resource-namespace=kube-system
        command:
        - /bin/descheduler

Please provide a copy of your descheduler policy config file

N/A

What k8s version are you using (kubectl version)? v1.24.12

What did you do?

Let Prometheus scrape metrics from the descheduler.

What did you expect to see?

A TLS certificate that allows Prometheus to scrape metrics would be nice.

What did you see instead?

The descheduler creates a TLS certificate that does not include an IP address, causing Prometheus to refuse to scrape even when instructed to skip TLS verification:

Get "https://172.30.95.137:10258/metrics": x509: cannot validate certificate for 172.30.95.137 because it doesn't contain any IP SANs

Looks like adding the Pod IP address would be sufficient. However, I am aware that IP address auto-detection might be tricky if the Pod has multiple interfaces attached (e.g., with Multus). Possible solutions:

  • Resolve the Pod's own IP address through kube-dns
  • Use the Kubernetes Downward API and a command-line argument to pass the IP address (sketched below)
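
For illustration, a minimal sketch of the Downward API variant, in the same container spec as above. The env wiring via status.podIP is standard Kubernetes; the --metrics-cert-extra-sans flag is purely hypothetical and does not exist in the descheduler today:

      - command:
        - /bin/descheduler
        env:
        # Downward API: expose the Pod's own IP to the container.
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        args:
        # (existing args from above omitted)
        # Hypothetical flag: the descheduler would have to add this
        # address as an IP SAN when generating its serving certificate.
        - --metrics-cert-extra-sans=$(POD_IP)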

stephan2012 commented on Mar 29 '23

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented on Jun 27 '23

Still an open issue …

stephan2012 commented on Jun 27 '23

/remove-lifecycle stale

stephan2012 commented on Jun 27 '23

Just ran into this as well. One thing I was wondering: why is the descheduler serving over HTTPS in the first place? There is some justification in that the Kubernetes project as a whole (kube-apiserver, etc.) has deprecated insecure connections, but as best I can tell, only the metrics endpoint is being served, and it is unlikely to be exposed publicly. Even if it were, all connections to it would fail because the provided TLS cert is not valid anyway...?

Is there a way to actually have Prometheus scrape these metrics? How are people collecting them?

Based on https://github.com/kubernetes-sigs/descheduler/issues/842 and https://github.com/kubernetes-sigs/descheduler/issues/1095, this has been raised before, but there is still no documented way to scrape the descheduler with Prometheus. I don't necessarily disagree with HTTPS being required, but if there is some way to have Prometheus scrape the metrics, could that at least be documented?
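
A possible workaround sketch, until the certificate question is settled: scrape over HTTPS but skip certificate validation. This assumes the descheduler runs in kube-system with the common app.kubernetes.io/name=descheduler label and serves metrics on port 10258 (as in the error above); per the original report, skipping verification may still not have worked in that environment, so treat this as untested:

    scrape_configs:
      - job_name: descheduler
        scheme: https
        tls_config:
          # Skip SAN/hostname validation of the self-signed serving cert.
          insecure_skip_verify: true
        kubernetes_sd_configs:
          - role: pod
            namespaces:
              names: [kube-system]
        relabel_configs:
          # Keep only descheduler pods (the label name is an assumption).
          - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
            regex: descheduler
            action: keep
          # Point the scrape at the secure metrics port seen in the error.
          - source_labels: [__meta_kubernetes_pod_ip]
            regex: (.+)
            replacement: "$1:10258"
            target_label: __address__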

logyball commented on Oct 12 '23

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented on Jan 30 '24

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented on Feb 29 '24

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot commented on Mar 30 '24

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot commented on Mar 30 '24