kube-state-metrics icon indicating copy to clipboard operation
kube-state-metrics copied to clipboard

feat: Support list expansions

Open rexagod opened this issue 2 years ago • 25 comments

What this PR does / why we need it: Allow CRS configuration to take in path values for lists, denoted by "*". This means it's possible to specify [..., <list>, "*", foo, ...] in the configuration to dynamically generate multiple metrics that reflect different states of foo in various list elements.

How does this change affect the cardinality of KSM: No change.

rexagod avatar May 15 '23 23:05 rexagod

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rexagod

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot avatar May 15 '23 23:05 k8s-ci-robot

A related https://github.com/kubernetes/kube-state-metrics/issues/1978

Current Custom State Resource API is a little complicated. Adding more features might bring more maintaining work and make it harder to migrate into cel.

I think cel can support this.

CatherineF-dev avatar Jun 14 '23 12:06 CatherineF-dev

I agree with Catherine, we shouldn't add more complexity to the existing config but rather focus on the new model.

dgrisonnet avatar Jun 14 '23 13:06 dgrisonnet

So, are we sunsetting the older model?

rexagod avatar Jun 14 '23 14:06 rexagod

Also, there was a thread regarding separating the CRS featureset into its own repository (the "non-turing" solution here being CEL-like languages, that do not offer an exhaustive way of defining the configuration, unlike go-based templating solutions).

rexagod avatar Jun 14 '23 14:06 rexagod

Reuse the wheel (cel, a query model) can reduce our maintaining work on how to extract fields. Then we can spend efforts on other things, instead of putting many efforts on implementing a query language.

CatherineF-dev avatar Jun 14 '23 14:06 CatherineF-dev

Will this mean that KSM's config capabilities will be restricted to what the underlying solution (#1978) has to offer?

rexagod avatar Jun 14 '23 14:06 rexagod

cel should be a complete language, which means it can extract every field.

cel is more powerful than current APIs and is capable of extracting more things.

CatherineF-dev avatar Jun 14 '23 14:06 CatherineF-dev

@dgrisonnet I'm assuming going forward with the newer model will mean we'll deprecate the older one. So, with that in mind, are PRs such as this and others like: #2067, #2052 obsolete and should be closed?

rexagod avatar Jun 14 '23 15:06 rexagod

I need to catch up on the discussions, a lot happened when I was away, but my general feeling is that we are starting to make the configuration way too complex. My fear is that this is going to become too confusing for the users and I am also worries about the current state of the parser. Without a proper lexer/parser combination this is slowly becoming unmaintainable.

IMO with the state of things, we should rather focus on building new foundations for the config rather than making the existing one more complex since it will just make the migration process harder. wdyt?

dgrisonnet avatar Jun 14 '23 15:06 dgrisonnet

I couldn't agree more.

The current CRS featureset, in addition to being complex and hardly maintainable for developers, doesn't have a clear, semantical, or easy-to-understand UX at the moment. Off the top of my mind, I can think of a few examples that go on to show a couple of such instances. I personally would not at all expect the user to be familiar with the following intricacies.

  • Relative paths can only be defined once, and the fact that technically there's two labelsFromPath possible but the other one does not support it.

          each:
            type: Info
            info:
              path: [spec, selector, labels]
              labelsFromPath:
                "*": [matchLabels]
          labelsFromPath:
            name: [metadata, name]
    
  • The outer lablesFromPath operates on the subset that satified the inner labelsFromPath. In the configuration snippet below, the outer labelsFromPath will only generate name labels for all objects that satisfied the inner labelsFromPath, i.e., a non-nil foo label.

    	  each:
            type: Info
            info:
              path: [metadata]
              labelsFromPath:
                "foo_label": [labels, foo]
          labelsFromPath:
            name: [metadata, name]
    
  • The difference in each of the configurations below, which looks more like a side-effect of another change.

      each:
            type: Info
            info:
              path: [users]
              labelsFromPath:
                user: []
    

    outputs: kube_customresource_metric_name{..., user="user1"} 1 and kube_customresource_metric_name{..., user="user2"} 1 while,

      each:
            type: Info
            info:
              labelsFromPath:
                user: [users]
    

    outputs: kube_customresource_metric_name{..., user="[user1 user2]"} 1.

TBH only reason I raised the aforementioned PRs was because I wasn't sure about the direction in which we were going to move forward over the last month regarding this, and assumed that doing so would entail discussions spanning a good while, so things would basically be done the same way as they were till now.

But, I'm enthusiastic and happy to see that there's a consensus on improving the current approach, and the discussions are yeilding actionable results. It'd be nice to see the proposed solution replacing as many configuration constructs as possible, in order to achieve a more accessible and deterministic UX for the user, thanks to @CatherineF-dev. 🎉

rexagod avatar Jun 14 '23 16:06 rexagod

While it's great to see we've got the ball rolling on the CEL implementation, I think in the meantime, it makes sense to continue supporting and improving the current configuration constructs until that is in place and we start with the deprecation cycle for the current constructs that are covered by it.

rexagod avatar Jun 20 '23 12:06 rexagod

While it's great to see we've got the ball rolling on the CEL implementation, I think in the meantime, it makes sense to continue supporting and improving the current configuration constructs until that is in place and we start with the deprecation cycle for the current constructs that are covered by it.

I disagree, depending on what we introduce it may become increasingly difficult to deprecate or replace functionality. Let's figure out the plan and then move forward, in the same direction, together.

logicalhan avatar Jun 20 '23 20:06 logicalhan

/triage accepted /assign

dgrisonnet avatar Jun 29 '23 16:06 dgrisonnet

(pulled in https://github.com/kubernetes/kube-state-metrics/pull/2052 to combine all patches for the dynamic valueFrom)

rexagod avatar Aug 29 '23 20:08 rexagod

/hold Let me improve on this a bit (we should be able to replace the x_*: [y], valueFrom: [y_*] with x_*: [y], valueFrom: [x_*]).

rexagod avatar Aug 30 '23 08:08 rexagod

/hold cancel Up for reviews.

rexagod avatar Aug 30 '23 10:08 rexagod

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jan 22 '24 12:01 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle rotten
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Feb 21 '24 12:02 k8s-triage-robot

/remove-lifecycle rotten

rexagod avatar Feb 21 '24 18:02 rexagod

FYI This is up for reviews (or merging, if this looks good).

rexagod avatar Mar 03 '24 16:03 rexagod

We're interested in this feature and hope to see it in the next release soon!

eugenepaniot avatar Mar 20 '24 10:03 eugenepaniot

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Apr 04 '24 23:04 k8s-ci-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jul 04 '24 00:07 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle rotten
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Aug 03 '24 00:08 k8s-triage-robot