
Automate generation of Custom Resource configuration

chrischdi opened this issue 3 years ago • 21 comments

What would you like to be added:

As a developer, I would like the kube-state-metrics Custom Resource configuration to be automatically generated from the API code.

Thanks to @sbueringer and @fabriziopandini for helping drive this effort 👍

Why is this needed:

kube-state-metrics requires a configuration file for Custom Resources. This file can grow very large and may become hard to maintain over the long term as the Custom Resource definitions change.

Instead of writing the configuration file manually, it would be awesome if the configuration could be generated from the code where the custom resource is defined.

Describe the solution you'd like

Prior art:

Kubebuilder makes use of a tool called controller-gen to generate the YAML for, e.g., Custom Resource Definitions from code. To do that, it makes heavy use of markers inside the Go code, which are comments of the form // +foo:bar:key1=value1,key2=value2.

Similar to controller-gen, kube-state-metrics could provide an additional tool that generates the custom resource metrics configuration file from pre-defined markers at the Go code level.

Additional context

We already thought about adding a tool like this at the Cluster API project, but we think it would be a better fit for the kube-state-metrics project, as it could help authors of any custom resource and is not specific to Cluster API.

We also have an idea of what the markers / design could look like, which builds on top of the currently manually written configuration in Cluster API: https://github.com/kubernetes-sigs/cluster-api/issues/7158#issuecomment-1317701277 . I could also migrate the first implementation design idea over to this issue, or proceed however it is decided.

If the maintainers of kube-state-metrics also think that the described idea is worth implementing, I'd of course be happy to volunteer to help or contribute to this effort.

Small example:

// +Metrics:namePrefix=capi_cluster
// +Metrics:labelFromPath:name=name,JSONPath=".metadata.name"
// +Metrics:labelFromPath:name=namespace,JSONPath=".metadata.namespace"
// +Metrics:labelFromPath:name=uid,JSONPath=".metadata.uid"
type Cluster struct {
	metav1.ObjectMeta `json:"metadata,omitempty"`
	...
	Spec ClusterSpec `json:"spec"` 
}

type ClusterSpec struct {
	// +Metrics:gauge:name="spec_paused",nilIsZero=true,help="Whether the cluster is paused and any of its resources will not be processed by the controllers."
	Paused bool `json:"paused,omitempty"`
}

This could result in the following configuration file:

kind: CustomResourceStateMetrics
spec:
  resources:
  - groupVersionKind:
      group: cluster.x-k8s.io
      kind: Cluster
      version: v1beta1
    labelsFromPath:
      name:
      - metadata
      - name
      namespace:
      - metadata
      - namespace
      uid:
      - metadata
      - uid
    metricNamePrefix: capi_cluster
    metrics:
    - name: spec_paused
      help: Whether the cluster is paused and any of its resources will not be processed by the controllers.
      each:
        gauge:
          nilIsZero: true
          path:
          - spec
          - paused
        type: Gauge

chrischdi avatar Nov 17 '22 16:11 chrischdi

This is a good idea to generate configurations.

However, does it require changing the code to add these annotations? Another idea might be generating configurations using a k8s client.

CatherineF-dev avatar Nov 18 '22 13:11 CatherineF-dev

However, does it require changing the code to add these annotations?

Yes. This would follow the pattern that controller-runtime/controller-tools uses to generate CRDs (see: https://book.kubebuilder.io/reference/markers/crd.html). Essentially we would add markers for metrics in addition to the ones we already have for the CRD generation.

Another idea might be generating configurations using a k8s client.

How would this work? (which k8s client do you mean?)

sbueringer avatar Nov 21 '22 04:11 sbueringer

However, does it require changing the code to add these annotations?

This will be an opt-in alternative to writing this file manually; and as described above, it follows a well-established pattern in Kubernetes API development (some of those annotations originate from the Kubernetes API itself, and kubebuilder added more on top).

fabriziopandini avatar Nov 21 '22 09:11 fabriziopandini

This will be an opt-in alternative to writing this file manually.

Got it. Maybe we can keep the kube-state-metrics API (crd-config.yaml) the same as before, and this tool helps with generating/merging CRD configurations?

The tricky cases I considered before are:

  1. You want annotation-based CRD metrics, but it's hard to change the code
  2. You don't want annotation-based CRD metrics, but it's hard to change the code

For the kubernetes/kubernetes code, the binary needs to be rebuilt after changing the code. For other OSS components, sometimes we just use them and want to monitor them without changing the code.

CatherineF-dev avatar Nov 21 '22 13:11 CatherineF-dev

Maybe we can keep the kube-state-metrics API (crd-config.yaml) the same as before, and this tool helps with generating/merging CRD configurations?

Yup absolutely, that is the idea :)

I think for the cases where the code cannot be adjusted, it's probably easiest to write the metrics configuration manually.

What we are trying to solve is essentially: when you are in control of the CRDs and the corresponding Go types, you can mark the metrics directly on the fields and then you don't have to write the config. But if you can't modify the CRD's Go types, I think it's hard to find an easier way to get the config than just writing it manually.

sbueringer avatar Nov 21 '22 13:11 sbueringer

essentially: when you are in control of the CRDs and the corresponding Go types, you can mark the metrics directly on the fields and then you don't have to write the config.

Agree.

Once this tool becomes popular in the future, we need to consider the case where users don't want annotation-based CRD metrics but it's hard to change the code. For example, a lot of OSS CRDs may have these annotations. kube-state-metrics should have the ability to control which metrics are collected, instead of basing this on CRD annotations.

CatherineF-dev avatar Nov 21 '22 13:11 CatherineF-dev

Agreed, this is a tool and it just increases the number of available options for kube-state-metrics users:

  1. Manually write crd-config.yaml (as of today) and pass it to kube-state-metrics
  2. Annotate the CRD, use the tool to generate crd-config.yaml, and pass it to kube-state-metrics
  3. Annotate the CRD, use the tool to generate crd-config.yaml + manually make some adjustments, and pass it to kube-state-metrics
  4. Probably more

But from a kube-state-metrics perspective nothing will change; everything will start from one (or potentially more) crd-config.yaml.

fabriziopandini avatar Nov 21 '22 13:11 fabriziopandini

We would like to use kube-state-metrics with Crossplane, where we create dozens of CRDs.

It would be great if kube-state-metrics could provide a CustomResourceStateMetrics CRD. Whenever an object of that kind is created, it would generate a new crd-config.yaml and reload the metrics.

With such a CRD we could define a CustomResourceStateMetrics configuration for every one of our Crossplane components, saying how to produce metrics.
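
For illustration only, a minimal sketch of what such an object could look like; the apiVersion below is hypothetical (no such API group exists today), and the spec simply reuses the configuration format shown earlier in this issue:

apiVersion: kube-state-metrics.io/v1alpha1  # hypothetical group/version, for illustration only
kind: CustomResourceStateMetrics
metadata:
  name: capi-cluster-metrics
spec:
  resources:
  - groupVersionKind:
      group: cluster.x-k8s.io
      kind: Cluster
      version: v1beta1
    metricNamePrefix: capi_cluster
    metrics:
    - name: spec_paused
      help: Whether the cluster is paused and any of its resources will not be processed by the controllers.
      each:
        gauge:
          path:
          - spec
          - paused
        type: Gauge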

mateusz-lubanski-sinch avatar Jan 09 '23 09:01 mateusz-lubanski-sinch

I would also love to have a CRD for defining the configuration. However, IMHO this should be seen as a separate topic and I'd prefer to track it in a separate issue. This kind of feature has already been mentioned multiple times in Slack too 🙂

We should keep the issue here scoped to "generating the input" for kube-state-metrics.

If the "read configuration from CRD" feature exists, the generator proposed here should be adjusted/improved to (also?) allow creating the CRs.

Edit: I will go ahead and create a separate issue for this :-) Edit 2: link to the separate issue:

  • https://github.com/kubernetes/kube-state-metrics/issues/1948

chrischdi avatar Jan 09 '23 10:01 chrischdi

I'd like to propose the following UX for the generator (kudos to @sbueringer and @fabriziopandini, who helped brainstorm and compile this).

We took the metrics at Cluster API and went through the example metrics to try to cover all use cases.

Note:

Metrics:namePrefix

// +Metrics:namePrefix=<string> on API type struct

Defines the metricNamePrefix for all metrics derived from the struct the markers apply to.

e.g.

// +Metrics:namePrefix=capi_cluster
type Cluster struct { ... }
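
This marker corresponds to the metricNamePrefix field of the generated configuration. A sketch of the fragment it would produce for the Cluster API example above (the groupVersionKind is taken from the full example and would presumably be derived from the API type):

kind: CustomResourceStateMetrics
spec:
  resources:
  - groupVersionKind:
      group: cluster.x-k8s.io
      kind: Cluster
      version: v1beta1
    metricNamePrefix: capi_cluster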

Metrics:labelFromPath

// +Metrics:labelFromPath:name=<string>,JSONPath=<string> on API type struct

Defines a label that applies to all metrics derived from the struct the markers apply to.

e.g.

// +Metrics:labelFromPath:name=name,JSONPath=".metadata.name"
// +Metrics:labelFromPath:name=namespace,JSONPath=".metadata.namespace"
// +Metrics:labelFromPath:name=uid,JSONPath=".metadata.uid"
type Cluster struct { ... }
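
As in the full example above, these struct-level markers would map to the resource's labelsFromPath section; a sketch of the generated fragment:

    labelsFromPath:
      name:
      - metadata
      - name
      namespace:
      - metadata
      - namespace
      uid:
      - metadata
      - uid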

Metrics:gauge

// +Metrics:gauge:name=<string>,help=<string>,nilIsZero=<bool>,JSONPath:<string>,labelsFromPath={map[<string>]<string>} on field

When applied to an API field it creates a metric of type gauge for the field.

  • name=<string> the name of the metric
  • help=<string> the help string of the metric
  • nilIsZero=<bool> optional; force the metric to count nil values as zero
  • JSONPath:<string> optional; in case the field is a complex type, this allows creating metrics for nested fields given their path
  • labelsFromPath={map[<string>]<string>} optional; allows adding labels whose values are read from the given path (. can be used as current path)

e.g.

// +Metrics:gauge:name="spec_paused",help="Whether the cluster is paused and any of its resources will not be processed by the controllers.",nilIsZero=true
Paused bool `json:"paused,omitempty"`

// +Metrics:gauge:name="created",JSONPath=".creationTimestamp",help="Unix creation timestamp."
metav1.ObjectMeta `json:"metadata,omitempty"`
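
A sketch of the configuration these two markers could generate; the first entry mirrors the full example above, while the second (a metric on metav1.ObjectMeta via JSONPath) is an assumption about how the nested path would be resolved:

    metrics:
    - name: spec_paused
      help: Whether the cluster is paused and any of its resources will not be processed by the controllers.
      each:
        gauge:
          nilIsZero: true
          path:
          - spec
          - paused
        type: Gauge
    - name: created
      help: Unix creation timestamp.
      each:
        gauge:
          path:
          - metadata
          - creationTimestamp
        type: Gauge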

Metrics:stateset

// +Metrics:stateset:name=<string>, help=<string>, labelName=<string>, list=[]<string>, JSONPath:<string>, labelsFromPath={map[<string>]<string>} on field

When applied to an API field, it creates a metric of type stateset for the field.

  • name=<string> the name of the metric
  • help=<string> the help string of the metric
  • labelName=<string> the name of the label for the stateset
  • list=[]<string> the list of values for the stateset
  • JSONPath:<string> optional; in case the field is a complex type, this allows creating metrics for nested fields given their path
  • labelsFromPath={map[<string>]<string>} optional; allows adding labels whose values are read from the given path (. can be used as current path)

e.g.

// +Metrics:stateset:name="status_phase", help="The cluster's current phase.", labelName=phase, list={"Pending", "Provisioning", "Provisioned", "Deleting", "Failed", "Unknown"}
Phase string `json:"phase,omitempty"`

// +Metrics:stateset:name="status_condition", help="The condition of a cluster.", labelName="status", JSONPath: ".status", list={"True", "False", "Unknown"}, labelsFromPath={type: ".type"}
Conditions Conditions `json:"conditions,omitempty"`
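
A sketch of what the first marker could generate, assuming the existing stateSet fields of the CustomResourceStateMetrics configuration (labelName, list, path) and assuming the Phase field lives under .status:

    metrics:
    - name: status_phase
      help: The cluster's current phase.
      each:
        stateSet:
          labelName: phase
          list:
          - Pending
          - Provisioning
          - Provisioned
          - Deleting
          - Failed
          - Unknown
          path:
          - status
          - phase
        type: StateSet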

Metrics:info

// +Metrics:info:name=<string>,help=<string>,JSONPath:<string>, labelsFromPath={map[<string>]<string>} on field or struct

When applied to an API field or struct, it creates a metric of type info.

  • name=<string> the name of the metric
  • help=<string> the help string of the metric
  • JSONPath:<string> optional when the marker is applied to a field, required when the marker applies to a struct; this allows creating metrics for nested fields given their path
  • labelsFromPath={map[<string>]<string>} optional; allows adding labels whose values are read from the given path (. can be used as current path)

e.g.

// +Metrics:info:name="info",help="Information about a cluster.",labelsFromPath={topology_version: ".spec.topology.version", topology_class: ".spec.topology.class"}
// Cluster is the Schema for the clusters API.
type Cluster struct {

	// +Metrics:info:name="annotation_paused", JSONPath=".annotations.['cluster\\.x-k8s.io/paused']", help="Whether the cluster is paused and any of its resources will not be processed by the controllers.", labelsFromPath={paused_value: "."}
	metav1.ObjectMeta `json:"metadata,omitempty"`

}
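
A sketch of what the struct-level info marker could generate, using the info fields of the existing CustomResourceStateMetrics configuration; the label names and paths are taken from the example above:

    metrics:
    - name: info
      help: Information about a cluster.
      each:
        info:
          labelsFromPath:
            topology_class:
            - spec
            - topology
            - class
            topology_version:
            - spec
            - topology
            - version
        type: Info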

chrischdi avatar Jan 09 '23 11:01 chrischdi

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Apr 09 '23 11:04 k8s-triage-robot

/remove-lifecycle stale

mrueg avatar Apr 09 '23 12:04 mrueg

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jul 08 '23 13:07 k8s-triage-robot

/remove-lifecycle stale

chrischdi avatar Jul 08 '23 21:07 chrischdi

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jan 24 '24 02:01 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Feb 23 '24 03:02 k8s-triage-robot

/remove-lifecycle rotten
AFAIK there is a PR for this

fabriziopandini avatar Feb 23 '24 16:02 fabriziopandini

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar May 23 '24 17:05 k8s-triage-robot

/remove-lifecycle stale

chrischdi avatar May 27 '24 07:05 chrischdi