
Resource-scope metric endpoints

Open mrueg opened this issue 3 years ago • 9 comments

What would you like to be added: Currently kube-state-metrics offers a single endpoint to gather all metrics. Ideally, there would be a way to offer multiple endpoints or a filter on the endpoint to limit metrics.

Why is this needed: As a user, I want to be able to scrape specific metrics at different intervals to reduce resource usage and the number of metrics generated per scrape.

Describe the solution you'd like: Either multiple endpoints could be introduced, e.g. host:port/ingress/metrics, or a filter such as host:port/metrics?filter=ingress. The first option might be a bit easier to configure in Prometheus; the second is more flexible if we ever want to allow advanced filtering (e.g. "only resources with this label").

This way, users can define different scraping intervals and possibly add further user-specific changes per resource.
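To make the two proposed shapes concrete, here is a minimal sketch of how a server could map either URL form to a resource scope. It is only an illustration of the proposal: the function name `resource_scope` and the host `ksm:8080` are made up, and nothing here reflects how kube-state-metrics actually routes requests.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def resource_scope(url: str) -> Optional[str]:
    """Return the resource a scrape URL is scoped to, or None for all metrics.

    Hypothetical helper: it only illustrates the two URL shapes proposed
    in this issue.
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]

    # Option 1: one endpoint per resource, e.g. /ingress/metrics
    if len(segments) == 2 and segments[1] == "metrics":
        return segments[0]

    # Option 2: a filter on the single endpoint, e.g. /metrics?filter=ingress
    if segments == ["metrics"]:
        values = parse_qs(parts.query).get("filter")
        return values[0] if values else None

    raise ValueError(f"not a metrics URL: {url}")


if __name__ == "__main__":
    for url in (
        "http://ksm:8080/ingress/metrics",
        "http://ksm:8080/metrics?filter=ingress",
        "http://ksm:8080/metrics",
    ):
        print(url, "->", resource_scope(url))
```

Both forms resolve to the same scope, so the choice is mostly about how easy each is to express in a scrape configuration and how much room it leaves for richer filters later.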

Additional context: This opens up more ways to reduce the cost of running kube-state-metrics, since scraping only a subset of metrics is more efficient than ingesting all of them. https://github.com/kubernetes/kube-state-metrics#a-note-on-costing

mrueg avatar Feb 21 '22 09:02 mrueg

/remove-label api-review /kind api-change

liggitt avatar Mar 03 '22 14:03 liggitt

(relabeling, api-review indicates a design or PR is ready for API review)

liggitt avatar Mar 03 '22 14:03 liggitt

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jun 01 '22 14:06 k8s-triage-robot

/remove-lifecycle stale

fpetkovski avatar Jun 01 '22 15:06 fpetkovski

I dove into this and found that KSM uses reflectors to populate and re-sync the metric store from the server's contents. That was an interesting discovery, by the way, and it helps me understand KSM a lot better!

"reduce the resource usage": Adding per-request filtering wouldn't significantly affect KSM's resource usage, since the metric store is kept up to date either way. I think it would only really affect the response size?

"amount of metrics generated per scrape": Prometheus can drop metrics at scrape time, so the end user can set up several scrape configurations, each with its own scrape interval and each keeping different metrics. This would most likely benefit high-cardinality metrics, which you might want to scrape less often than others. This can be achieved without per-request filtering.

//cc @mrueg @fpetkovski; what do you think? I might be missing something so please tell me if that's the case!

Serializator avatar Jul 05 '22 20:07 Serializator

The way I understand it, KSM can generate very large responses which have to be fetched and parsed by Prometheus (or some other compatible scrape client). In large clusters, response sizes can be in the hundreds of megabytes.

Having client-side filtering of metrics would help increase scraping performance when a subset of metrics needs to be scraped.
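As a rough illustration of the response-size argument, the sketch below filters a small text-format exposition down to one resource's metric families before it is sent, so the scraper has less to fetch and parse. The helper `filter_exposition`, the sample payload, and its label values are assumptions made up for this example. This is not KSM code.

```python
def filter_exposition(text: str, prefix: str) -> str:
    """Keep only lines of metric families whose name starts with `prefix`.

    Hypothetical helper operating on the Prometheus text exposition format.
    """
    kept = []
    for line in text.splitlines():
        if line.startswith("#"):
            # Comment lines look like "# HELP <name> ..." or "# TYPE <name> ..."
            fields = line.split(maxsplit=3)
            name = fields[2] if len(fields) >= 3 else ""
        else:
            # Sample lines look like "<name>{labels} value" or "<name> value"
            name = line.split("{", 1)[0].split(" ", 1)[0]
        if name.startswith(prefix):
            kept.append(line)
    return "\n".join(kept) + "\n" if kept else ""


# Illustrative payload; real responses in large clusters are far bigger.
SAMPLE = """\
# HELP kube_ingress_info Information about ingress.
# TYPE kube_ingress_info gauge
kube_ingress_info{namespace="default",ingress="web"} 1
# HELP kube_pod_info Information about pod.
# TYPE kube_pod_info gauge
kube_pod_info{namespace="default",pod="web-0"} 1
kube_pod_info{namespace="default",pod="web-1"} 1
"""

if __name__ == "__main__":
    filtered = filter_exposition(SAMPLE, "kube_ingress_")
    print(filtered, end="")
    print(f"{len(SAMPLE)} bytes before, {len(filtered)} bytes after")
```

Dropping the same series with relabeling on the Prometheus side happens only after the full response has been transferred and parsed, which is the cost this kind of server-side filter would avoid.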

fpetkovski avatar Jul 06 '22 06:07 fpetkovski

Got it. I approached it from KSM's perspective and not from Prometheus's.

Serializator avatar Jul 06 '22 15:07 Serializator

/assign

rexagod avatar Sep 15 '22 05:09 rexagod

(@Serializator feel free to assign this to yourself if you're currently working on this)

rexagod avatar Sep 15 '22 05:09 rexagod

/lifecycle stale

k8s-triage-robot avatar Feb 08 '23 03:02 k8s-triage-robot

/remove-lifecycle stale /label lifecycle-frozen

rexagod avatar Feb 08 '23 04:02 rexagod

@rexagod: The label(s) /label lifecycle-frozen cannot be applied. These labels are supported: api-review, tide/merge-method-merge, tide/merge-method-rebase, tide/merge-method-squash, team/katacoda, refactor. Is this label configured under labels -> additional_labels or labels -> restricted_labels in plugin.yaml?

In response to this:

/remove-lifecycle stale /label lifecycle-frozen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Feb 08 '23 04:02 k8s-ci-robot

/lifecycle frozen

rexagod avatar Feb 08 '23 04:02 rexagod

@mrueg I'm trying to establish the format the parameters will be passed in. By host:port/metrics?filter=ingress, do you mean host:port/metrics?filter=kind?

Would host:port/metrics?group=foo&kind=baz&filter=[metric_name] be better? (The version is intentionally left blank so it'd cover all of them; metric_name is there to filter the same metrics for the same GVK, in case of G** resolution.)
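For discussion purposes, here is a small sketch of how the suggested query string could be parsed. The `ScrapeFilter` type and `parse_scrape_filter` function are hypothetical names, and accepting several metric names (comma-separated or as repeated `filter` parameters) is an assumption that this thread does not settle.

```python
from dataclasses import dataclass
from typing import Optional, Tuple
from urllib.parse import parse_qs


@dataclass(frozen=True)
class ScrapeFilter:
    """Hypothetical parsed form of ?group=...&kind=...&filter=..."""
    group: Optional[str]            # None means "any group"
    kind: Optional[str]             # None means "any kind"
    metric_names: Tuple[str, ...]   # empty means "all metrics"


def parse_scrape_filter(query: str) -> ScrapeFilter:
    params = parse_qs(query)

    def first(key: str) -> Optional[str]:
        values = params.get(key)
        return values[0] if values else None

    # Assumption: metric names may be comma-separated or repeated.
    names = tuple(
        name
        for value in params.get("filter", [])
        for name in value.split(",")
        if name
    )
    return ScrapeFilter(first("group"), first("kind"), names)


if __name__ == "__main__":
    print(parse_scrape_filter("group=foo&kind=baz&filter=kube_foo_info"))
    print(parse_scrape_filter("kind=baz"))
```

Leaving out a parameter simply widens the match, which mirrors the idea of omitting the version so that every version is covered.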

rexagod avatar Apr 25 '23 04:04 rexagod