kube-state-metrics Resource-scope metric endpoints

What would you like to be added: Currently kube-state-metrics offers a single endpoint to gather all metrics. Ideally, there would be a way to offer multiple endpoints or a filter on the endpoint to limit metrics.

Why is this needed: As a user I want to be able to scrape specific metrics at different intervals to reduce the resource usage and amount of metrics generated per scrape.
Describe the solution you'd like: Either multiple endpoints could be introduced, e.g. host:port/ingress/metrics, or a filter such as host:port/metrics?filter=ingress could be introduced. The first option might be a bit easier to configure in Prometheus, while the second option is more flexible if we ever want to allow advanced filtering (e.g. "only resources with this label").

This way, users can define different scrape intervals per resource and possibly apply other user-specific settings to a resource.
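To make the two URL shapes above concrete, here is a minimal Go sketch that resolves either form to a resource scope. Both shapes are only the proposals from this issue, not existing kube-state-metrics routes, and the helper name resourceFromRequest is hypothetical. Letting the query parameter win when both are present is also just an assumption.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// resourceFromRequest returns the resource scope requested by either
// proposed URL style, or "" when the request asks for all metrics.
// Hypothetical helper: neither URL shape exists in kube-state-metrics today.
func resourceFromRequest(path, rawQuery string) string {
	// Option 2: /metrics?filter=<resource>
	if q, err := url.ParseQuery(rawQuery); err == nil {
		if f := q.Get("filter"); f != "" {
			return f
		}
	}
	// Option 1: /<resource>/metrics
	parts := strings.Split(strings.Trim(path, "/"), "/")
	if len(parts) == 2 && parts[1] == "metrics" {
		return parts[0]
	}
	return ""
}

func main() {
	fmt.Println(resourceFromRequest("/ingress/metrics", ""))       // ingress
	fmt.Println(resourceFromRequest("/metrics", "filter=ingress")) // ingress
	fmt.Println(resourceFromRequest("/metrics", "") == "")         // true
}
```

With either shape, a Prometheus job could target one resource and use its own scrape interval.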

Additional context: This opens up more opportunities to reduce the cost of running kube-state-metrics, since scraping only a subset of metrics is more efficient than ingesting all of them. See https://github.com/kubernetes/kube-state-metrics#a-note-on-costing

Feb 21 '22 09:02 mrueg

/remove-label api-review /kind api-change

Mar 03 '22 14:03 liggitt

(relabeling, api-review indicates a design or PR is ready for API review)

Mar 03 '22 14:03 liggitt

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

Jun 01 '22 14:06 k8s-triage-robot

/remove-lifecycle stale

Jun 01 '22 15:06 fpetkovski

I dug into this and found that KSM uses reflectors to populate and re-sync the metric store from the API server's contents, which is an interesting discovery, by the way, and personally helps me understand KSM a lot better!

"reduce the resource usage": Adding per-request filtering wouldn't significantly affect KSM's resource usage, since the metric store is kept up to date either way. I think this would only really affect the response size?

"amount of metrics generated per scrape": Prometheus can drop metrics at scrape time, so the end user can define several scrape configurations, each with its own scrape interval and each keeping different metrics. This would most likely benefit high-cardinality metrics, which you might want to scrape less often than others. This can be achieved without per-request filtering.

//cc @mrueg @fpetkovski; what do you think? I might be missing something so please tell me if that's the case!

Jul 05 '22 20:07 Serializator

The way I understand it, KSM can generate very large responses which have to be fetched and parsed by Prometheus (or some other compatible scrape client). In large clusters, response sizes can be in the hundreds of megabytes.

Having client-side filtering of metrics would help increase scraping performance when a subset of metrics needs to be scraped.

Jul 06 '22 06:07 fpetkovski

Got it. I approached it from KSM's perspective and not from Prometheus's.

Jul 06 '22 15:07 Serializator

/assign

Sep 15 '22 05:09 rexagod

(@Serializator feel free to assign this to yourself if you're currently working on this)

Sep 15 '22 05:09 rexagod

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

Mark this PR as fresh with /remove-lifecycle stale
Close this PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

Feb 08 '23 03:02 k8s-triage-robot

/remove-lifecycle stale /label lifecycle-frozen

Feb 08 '23 04:02 rexagod

@rexagod: The label(s) /label lifecycle-frozen cannot be applied. These labels are supported: api-review, tide/merge-method-merge, tide/merge-method-rebase, tide/merge-method-squash, team/katacoda, refactor. Is this label configured under labels -> additional_labels or labels -> restricted_labels in plugin.yaml?

In response to this:

/remove-lifecycle stale /label lifecycle-frozen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Feb 08 '23 04:02 k8s-ci-robot

/lifecycle frozen

Feb 08 '23 04:02 rexagod

@mrueg I'm trying to establish the format the parameters will be passed in. By host:port/metrics?filter=ingress, do you mean host:port/metrics?filter=kind?

Would host:port/metrics?group=foo&kind=baz&filter=[metric_name] be better? (The version is intentionally left blank so it'd cover all of them; metric_name is there to filter the same metrics for the same GVK, in case of G** resolution.)
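A minimal Go sketch of how such parameters could be parsed, with an empty version (or group, or kind) acting as a wildcard. The Selector type, the optional version parameter, and accepting filter more than once are all assumptions for illustration, not an agreed format.

```go
package main

import (
	"fmt"
	"net/url"
)

// Selector is a hypothetical representation of the proposed parameters.
type Selector struct {
	Group   string
	Version string   // empty means "all versions"
	Kind    string
	Metrics []string // empty means "all metrics of the matched resources"
}

// parseSelector reads group, version, kind and filter from a raw query.
func parseSelector(rawQuery string) (Selector, error) {
	q, err := url.ParseQuery(rawQuery)
	if err != nil {
		return Selector{}, err
	}
	return Selector{
		Group:   q.Get("group"),
		Version: q.Get("version"),
		Kind:    q.Get("kind"),
		Metrics: q["filter"], // assumption: filter may be repeated
	}, nil
}

// Matches reports whether a resource's group/version/kind is selected.
// Any empty field on the selector matches everything.
func (s Selector) Matches(group, version, kind string) bool {
	return (s.Group == "" || s.Group == group) &&
		(s.Version == "" || s.Version == version) &&
		(s.Kind == "" || s.Kind == kind)
}

func main() {
	s, err := parseSelector("group=foo&kind=baz&filter=metric_a&filter=metric_b")
	if err != nil {
		panic(err)
	}
	fmt.Println(s.Matches("foo", "v1", "baz"))      // true
	fmt.Println(s.Matches("foo", "v1beta1", "baz")) // true
	fmt.Println(s.Matches("bar", "v1", "baz"))      // false
	fmt.Println(s.Metrics)                          // [metric_a metric_b]
}
```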

Apr 25 '23 04:04 rexagod
