kube-state-metrics

Repeatedly adding and deleting CustomResourceDefinitions causes duplicate metric entries

Open k15r opened this issue 1 year ago • 11 comments

What happened:

This is part of our kube-state-metrics custom resource state configuration:

       - errorLogV: 0
         groupVersionKind:
           group: operator.kyma-project.io
           kind: Keda
           version: '*'
         metrics:
         - each:
             stateSet:
               labelName: state
               list:
               - Ready
               - Processing
               - Error
               - Deleting
               - Warning
               path:
               - status
               - state
             type: StateSet
           errorLogV: 0
           help: status of Keda CR
           labelsFromPath:
             name:
             - metadata
             - name
             namespace:
             - metadata
             - namespace
           name: keda_status

After repeatedly adding and deleting the corresponding CRD and a CR of its kind, this is part of the /metrics response from kube-state-metrics:

# HELP kube_customresource_keda_status status of Keda CR
# TYPE kube_customresource_keda_status stateset
kube_customresource_keda_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Deleting"} 0
kube_customresource_keda_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Error"} 0
kube_customresource_keda_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Processing"} 1
kube_customresource_keda_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Ready"} 0
kube_customresource_keda_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Warning"} 0
# HELP kube_customresource_keda_status status of Keda CR
# TYPE kube_customresource_keda_status stateset
kube_customresource_keda_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Deleting"} 0
kube_customresource_keda_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Error"} 0
kube_customresource_keda_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Processing"} 1
kube_customresource_keda_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Ready"} 0
kube_customresource_keda_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Warning"} 0
kube_customresource_keda_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Deleting"} 0
kube_customresource_keda_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Error"} 0
kube_customresource_keda_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Processing"} 1
kube_customresource_keda_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Ready"} 0
kube_customresource_keda_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Warning"} 0
# HELP kube_customresource_keda_status status of Keda CR
# TYPE kube_customresource_keda_status stateset
kube_customresource_keda_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Deleting"} 0
kube_customresource_keda_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Error"} 0
kube_customresource_keda_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Processing"} 1
kube_customresource_keda_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Ready"} 0
kube_customresource_keda_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Warning"} 0

As you can see, there are multiple entries for the same metric (# HELP and # TYPE each appear 3 times). Within a single metric block, lines are also duplicated.
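A quick way to confirm both symptoms on a saved /metrics response is to count repeated TYPE lines and repeated series. The following is a minimal sketch, not part of kube-state-metrics; the function name `find_duplicates` and the `demo_status` sample are made up for illustration, and it assumes sample lines carry no trailing timestamp (as in the output above).

```python
from collections import Counter


def find_duplicates(exposition: str):
    """Return (metric names with repeated TYPE lines, series that appear more than once)."""
    type_counts = Counter()
    series_counts = Counter()
    for line in exposition.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("# TYPE "):
            # Format: "# TYPE <metric name> <type>"
            type_counts[line.split()[2]] += 1
        elif not line.startswith("#"):
            # Assumption: no timestamp, so the value follows the last space.
            series, _, _value = line.rpartition(" ")
            series_counts[series] += 1
    dup_types = {name: n for name, n in type_counts.items() if n > 1}
    dup_series = {series: n for series, n in series_counts.items() if n > 1}
    return dup_types, dup_series


# Hypothetical, shortened sample in the same shape as the output above.
sample = """# HELP demo_status status of demo CR
# TYPE demo_status stateset
demo_status{name="a",state="Ready"} 1
# HELP demo_status status of demo CR
# TYPE demo_status stateset
demo_status{name="a",state="Ready"} 1
"""

dup_types, dup_series = find_duplicates(sample)
print(dup_types)
print(dup_series)
```

Any non-empty result means the response contains a repeated metric family header or a repeated series.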

What you expected to happen:

  • The metric appears only once in the output.
  • Sample lines are unique.

How to reproduce it (as minimally and precisely as possible):

  • apply a scrape configuration for a GVK
  • create the CRD in the cluster
  • create a CR of its kind
  • wait until the CR has been scraped
  • delete the CRD from the cluster
  • create the CRD in the cluster again
  • create a CR of its kind
  • wait until the CR has been scraped
  • repeat those steps a few times

Anything else we need to know?:

For a more advanced version of this bug, create the following configuration in the cluster:

       - groupVersionKind:
           group: "operator.kyma-project.io"
           kind: "Sample"
           version: "*"
         errorLogV: 0
         metrics:
           - name: module_status
             errorLogV: 10
             help: "status of Module CR"
             each:
               type: StateSet
               stateSet:
                 labelName: state
                 path: [status, state]
                 list: [Ready, Processing, Error, Deleting, Warning]
             labelsFromPath:
               name: [metadata, name]
               namespace: [metadata, namespace]
       - errorLogV: 0
         groupVersionKind:
           group: operator.kyma-project.io
           kind: Keda
           version: '*'
         metrics:
         - each:
             stateSet:
               labelName: state
               list:
               - Ready
               - Processing
               - Error
               - Deleting
               - Warning
               path:
               - status
               - state
             type: StateSet
           errorLogV: 0
           help: status of Module CR
           labelsFromPath:
             name:
             - metadata
             - name
             namespace:
             - metadata
             - namespace
           name: module_status

This configuration puts the metrics of two different CRs into the same metric (kube_customresource_module_status).

Now, if you create both CRDs with a matching CR each and then repeatedly create and remove one of the CRDs, you will get output similar to this (here the Sample CRD was deleted):

# HELP kube_customresource_module_status status of Module CR
# TYPE kube_customresource_module_status stateset
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Sample",customresource_version="v1alpha1",name="sample-yaml",namespace="default",state="Deleting"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Sample",customresource_version="v1alpha1",name="sample-yaml",namespace="default",state="Error"} 1
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Sample",customresource_version="v1alpha1",name="sample-yaml",namespace="default",state="Processing"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Sample",customresource_version="v1alpha1",name="sample-yaml",namespace="default",state="Ready"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Sample",customresource_version="v1alpha1",name="sample-yaml",namespace="default",state="Warning"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Deleting"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Error"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Processing"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Ready"} 1
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Warning"} 0
# HELP kube_customresource_module_status status of Module CR
# TYPE kube_customresource_module_status stateset
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Deleting"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Error"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Processing"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Ready"} 1
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Warning"} 0
# HELP kube_customresource_module_status status of Module CR
# TYPE kube_customresource_module_status stateset
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Sample",customresource_version="v1alpha1",name="sample-yaml",namespace="default",state="Deleting"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Sample",customresource_version="v1alpha1",name="sample-yaml",namespace="default",state="Error"} 1
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Sample",customresource_version="v1alpha1",name="sample-yaml",namespace="default",state="Processing"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Sample",customresource_version="v1alpha1",name="sample-yaml",namespace="default",state="Ready"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Sample",customresource_version="v1alpha1",name="sample-yaml",namespace="default",state="Warning"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Deleting"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Error"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Processing"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Ready"} 1
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Warning"} 0
# HELP kube_customresource_module_status status of Module CR
# TYPE kube_customresource_module_status stateset
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Deleting"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Error"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Processing"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Ready"} 1
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Warning"} 0
# HELP kube_customresource_module_status status of Module CR
# TYPE kube_customresource_module_status stateset
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Sample",customresource_version="v1alpha1",name="sample-yaml",namespace="default",state="Deleting"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Sample",customresource_version="v1alpha1",name="sample-yaml",namespace="default",state="Error"} 1
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Sample",customresource_version="v1alpha1",name="sample-yaml",namespace="default",state="Processing"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Sample",customresource_version="v1alpha1",name="sample-yaml",namespace="default",state="Ready"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Sample",customresource_version="v1alpha1",name="sample-yaml",namespace="default",state="Warning"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Deleting"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Error"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Processing"} 0
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Ready"} 1
kube_customresource_module_status{customresource_group="operator.kyma-project.io",customresource_kind="Keda",customresource_version="v1alpha1",name="default",namespace="kyma-system",state="Warning"} 0

Environment:

  • kube-state-metrics version: 2.10.0
  • Kubernetes version (use kubectl version): v1.26.7
  • Cloud provider or hardware configuration:
  • Other info:

Oct 20 '23 16:10 k15r

/triage accepted
/assign @CatherineF-dev @rexagod

Nov 02 '23 16:11 dashpole

I came here to open the same issue just to find it's already here.

This issue simply kills the ability to use kind: "*" or version: "*" when there are multiple items under the metrics of that resource.

Here you can find some example manifests and steps to reproduce the issue: https://gist.github.com/bergerx/adad24dcd7cc360e1f36fbb98407b27b

git clone git@gist.github.com:adad24dcd7cc360e1f36fbb98407b27b.git ksm-2223
minikube start
kubectl apply \
  -f ksm-2223/crd-bar.example.com.yaml \
  -f ksm-2223/crd-foo.example.com.yaml
kubectl apply \
  -f ksm-2223/cr-bar.yaml \
  -f ksm-2223/cr-foo.yaml
go run main.go --custom-resource-state-only --custom-resource-state-config-file ksm-2223/custom-resource-config-file.yaml --kubeconfig ~/.kube/config

And here is the output:

$ curl localhost:8080/metrics
# HELP cr_creationtimestamp 
# TYPE cr_creationtimestamp gauge
cr_creationtimestamp{customresource_group="example.com",customresource_kind="Bar",customresource_version="v1",name="mybar"} 1.699031755e+09
# HELP cr_resourceversion 
# TYPE cr_resourceversion gauge
cr_resourceversion{customresource_group="example.com",customresource_kind="Bar",customresource_version="v1",name="mybar"} 508820
# HELP cr_creationtimestamp 
# TYPE cr_creationtimestamp gauge
cr_creationtimestamp{customresource_group="example.com",customresource_kind="Foo",customresource_version="v1",name="myfoo"} 1.699031755e+09
# HELP cr_resourceversion 
# TYPE cr_resourceversion gauge
cr_resourceversion{customresource_group="example.com",customresource_kind="Foo",customresource_version="v1",name="myfoo"} 508819

Prometheus-compatible parsers will reject this output with an error like the following at line 8 (the second TYPE line for cr_creationtimestamp):

second TYPE line for metric name ... or TYPE reported after samples
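To show why line 8 is the one that fails, here is a rough sketch of the kind of check that produces such an error. It is not the actual Prometheus parser; `check_type_lines` is a hypothetical helper, the labels are shortened, and it ignores suffixed sample names (such as `_bucket`) that real parsers handle.

```python
def check_type_lines(exposition: str) -> None:
    """Raise ValueError on a repeated TYPE line or a TYPE line that follows samples."""
    typed = set()    # metric names that already have a TYPE line
    sampled = set()  # metric names that already have at least one sample
    for lineno, line in enumerate(exposition.splitlines(), start=1):
        line = line.strip()
        if line.startswith("# TYPE "):
            name = line.split()[2]
            if name in typed or name in sampled:
                raise ValueError(
                    f"line {lineno}: second TYPE line for metric name {name!r}, "
                    "or TYPE reported after samples"
                )
            typed.add(name)
        elif line and not line.startswith("#"):
            # The metric name ends at "{" (labels) or at the first space (no labels).
            sampled.add(line.split("{", 1)[0].split(" ", 1)[0])


# First nine lines of the output above, with labels shortened for readability.
output = "\n".join([
    "# HELP cr_creationtimestamp",
    "# TYPE cr_creationtimestamp gauge",
    'cr_creationtimestamp{customresource_kind="Bar",name="mybar"} 1.699031755e+09',
    "# HELP cr_resourceversion",
    "# TYPE cr_resourceversion gauge",
    'cr_resourceversion{customresource_kind="Bar",name="mybar"} 508820',
    "# HELP cr_creationtimestamp",
    "# TYPE cr_creationtimestamp gauge",
    'cr_creationtimestamp{customresource_kind="Foo",name="myfoo"} 1.699031755e+09',
])

message = ""
try:
    check_type_lines(output)
except ValueError as err:
    message = str(err)
print(message)
```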

In the example above there is a single resource definition in the custom-resource-state config file, but the same issue also occurs when the same metric name is used for different GVKs, which I believe is also a valid scenario. For example, we used to have this item under .spec.resources, repeated for multiple CRDs:

  - groupVersionKind:
      group: our.internal.group    # we have a copy of this whole thing for each internal group
      kind: "*"
      version: "*"
    labelsFromPath:
      name: [metadata, name]
      namespace: [metadata, namespace]
    metricNamePrefix: "cr"
    metrics:
    - name: status
      each:
        type: Gauge
        gauge:
          path: [status, conditions]
          labelsFromPath:
            type: [type]
          valueFrom: [status]

Nov 03 '23 17:11 bergerx

https://github.com/kubernetes/kube-state-metrics/pull/1810 seems to be a related issue.

Nov 17 '23 21:11 bergerx

I can reproduce this issue (samples of the same metric are not grouped together) using https://github.com/kubernetes/kube-state-metrics/issues/2223#issuecomment-1792850276.

$ curl localhost:8089/metrics
# HELP cr_creationtimestamp
# TYPE cr_creationtimestamp gauge
cr_creationtimestamp{customresource_group="example.com",customresource_kind="Bar",customresource_version="v1",name="mybar"} 1.700534671e+09
# HELP cr_resourceversion
# TYPE cr_resourceversion gauge
cr_resourceversion{customresource_group="example.com",customresource_kind="Bar",customresource_version="v1",name="mybar"} 391
# HELP cr_creationtimestamp
# TYPE cr_creationtimestamp gauge
cr_creationtimestamp{customresource_group="example.com",customresource_kind="Foo",customresource_version="v1",name="myfoo"} 1.700534671e+09
# HELP cr_resourceversion
# TYPE cr_resourceversion gauge
cr_resourceversion{customresource_group="example.com",customresource_kind="Foo",customresource_version="v1",name="myfoo"} 392

Quick question: I think the issue is that KSM doesn't group samples of the same metric together. Is that correct? cc @bergerx @k15r

Nov 21 '23 02:11 CatherineF-dev

I think the issue is here https://github.com/kubernetes/kube-state-metrics/blob/main/internal/store/builder.go#L210

		availableStores[gvrString] = func(b *Builder) []cache.Store {
			return b.buildCustomResourceStoresFunc(
				f.Name(),
				f.MetricFamilyGenerators(),
				f.ExpectedType(),
				f.ListWatch,
				b.useAPIServerCache,
			)
		}
  1. It always sets new values in availableStores[gvrString] and never cleans up old ones, so it keeps collecting obsolete metrics.
  2. It uses the GVR as the key, so it generates two separate metrics for Foo and Bar.
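The first point can be illustrated with a toy model. This is only a sketch of the general failure mode (registering a store on every CRD add without removing it on delete), written in Python rather than Go; it is not kube-state-metrics code, and how the real builder accumulates stores is not shown in this thread. The `simulate` function, the event tuples and the GVR string format are all hypothetical.

```python
def simulate(events, cleanup_on_delete):
    """Return the stores that would be written out on the next scrape."""
    stores = []
    for action, gvr in events:
        if action == "add":
            # A new store is registered every time the CRD shows up.
            stores.append(gvr)
        elif action == "delete" and cleanup_on_delete:
            # Only the fixed variant drops the stores of a deleted CRD.
            stores = [s for s in stores if s != gvr]
    return stores


# Hypothetical GVR string; the real key format may differ.
GVR = "operator.kyma-project.io/v1alpha1/kedas"

# add, delete, add, delete, add: the CRD is re-created twice.
events = [("add", GVR), ("delete", GVR)] * 2 + [("add", GVR)]

leaky = simulate(events, cleanup_on_delete=False)
fixed = simulate(events, cleanup_on_delete=True)
print(len(leaky), len(fixed))
```

In the leaky variant every stale store would still emit its own HELP/TYPE header and samples, which matches the shape of the duplicated output reported above.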

Nov 21 '23 03:11 CatherineF-dev

@CatherineF-dev Thanks for taking care of this issue.

> I can reproduce this issue (metric values are not put together for one same metric) using #2223 (comment).
>
> $ curl localhost:8089/metrics
> # HELP cr_creationtimestamp
> # TYPE cr_creationtimestamp gauge
> cr_creationtimestamp{customresource_group="example.com",customresource_kind="Bar",customresource_version="v1",name="mybar"} 1.700534671e+09
> # HELP cr_resourceversion
> # TYPE cr_resourceversion gauge
> cr_resourceversion{customresource_group="example.com",customresource_kind="Bar",customresource_version="v1",name="mybar"} 391
> # HELP cr_creationtimestamp
> # TYPE cr_creationtimestamp gauge
> cr_creationtimestamp{customresource_group="example.com",customresource_kind="Foo",customresource_version="v1",name="myfoo"} 1.700534671e+09
> # HELP cr_resourceversion
> # TYPE cr_resourceversion gauge
> cr_resourceversion{customresource_group="example.com",customresource_kind="Foo",customresource_version="v1",name="myfoo"} 392
>
> QQ: I think the issue is that KSM doesn't put same metric value together. Is it correct? cc @bergerx @k15r

In my opinion there are multiple issues shown in your output:

  1. it creates duplicate entries for the same metric:
# HELP cr_creationtimestamp
# TYPE cr_creationtimestamp gauge
# HELP cr_resourceversion
# TYPE cr_resourceversion gauge
# HELP cr_creationtimestamp
# TYPE cr_creationtimestamp gauge
# HELP cr_resourceversion
# TYPE cr_resourceversion gauge

It must look like this, because "Only one TYPE line may exist for a given metric name":

# HELP cr_creationtimestamp
# TYPE cr_creationtimestamp gauge
# HELP cr_resourceversion
# TYPE cr_resourceversion gauge
  2. the metric values differ:
# HELP cr_resourceversion
# TYPE cr_resourceversion gauge
cr_resourceversion{customresource_group="example.com",customresource_kind="Bar",customresource_version="v1",name="mybar"} 391
# HELP cr_resourceversion
# TYPE cr_resourceversion gauge
cr_resourceversion{customresource_group="example.com",customresource_kind="Foo",customresource_version="v1",name="myfoo"} 392

Here it displays both 392 and 391 for the same metric with exactly the same values. It is not clear which one to use. For clients trying to parse this text exposition format (TEF), there is no way to identify the correct value.
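The expected shape of the output (one HELP and one TYPE line per metric name, with all samples grouped underneath) can be sketched as a post-processing step. This only illustrates the target format; it is not how kube-state-metrics would fix the issue internally. `merge_families` is a hypothetical helper and the labels are shortened. It drops verbatim duplicate lines but keeps samples whose values differ, since it has no way to decide which one is right.

```python
def merge_families(exposition: str) -> str:
    """Group samples by metric name under a single HELP/TYPE header each."""
    help_lines, type_lines, samples, order = {}, {}, {}, []
    for line in exposition.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(("# HELP ", "# TYPE ")):
            _, kind, name = line.split()[:3]
            # Keep only the first HELP/TYPE line seen for each metric name.
            (help_lines if kind == "HELP" else type_lines).setdefault(name, line)
        elif line.startswith("#"):
            continue  # other comments are ignored in this sketch
        else:
            name = line.split("{", 1)[0].split(" ", 1)[0]
            bucket = samples.setdefault(name, [])
            if line not in bucket:  # drop verbatim duplicate lines only
                bucket.append(line)
        if name not in order:
            order.append(name)
    out = []
    for name in order:
        out.extend(l for l in (help_lines.get(name), type_lines.get(name)) if l)
        out.extend(samples.get(name, []))
    return "\n".join(out)


raw = """# HELP cr_creationtimestamp
# TYPE cr_creationtimestamp gauge
cr_creationtimestamp{customresource_kind="Bar",name="mybar"} 1.700534671e+09
# HELP cr_resourceversion
# TYPE cr_resourceversion gauge
cr_resourceversion{customresource_kind="Bar",name="mybar"} 391
# HELP cr_creationtimestamp
# TYPE cr_creationtimestamp gauge
cr_creationtimestamp{customresource_kind="Foo",name="myfoo"} 1.700534671e+09
# HELP cr_resourceversion
# TYPE cr_resourceversion gauge
cr_resourceversion{customresource_kind="Foo",name="myfoo"} 392
"""

merged = merge_families(raw)
print(merged)
```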

Nov 23 '23 14:11 k15r

Could you please share an ETA (if any) for this bug? We are affected by it for Vertical Pod Autoscaler metrics when multiple containers run in the same pod. (The kube-state-metrics custom resource state config is set up according to the doc in this PR.)

Nov 28 '23 16:11 korjek

Hi @k15r, could you provide detailed steps to reproduce this issue?

The first issue I want to fix is this:

$ curl localhost:8089/metrics
# HELP cr_creationtimestamp
# TYPE cr_creationtimestamp gauge
cr_creationtimestamp{customresource_group="example.com",customresource_kind="Bar",customresource_version="v1",name="mybar"} 1.701828773e+09
# HELP cr_resourceversion
# TYPE cr_resourceversion gauge
cr_resourceversion{customresource_group="example.com",customresource_kind="Bar",customresource_version="v1",name="mybar"} 909919
# HELP cr_creationtimestamp
# TYPE cr_creationtimestamp gauge
cr_creationtimestamp{customresource_group="example.com",customresource_kind="Bar",customresource_version="v1",name="mybar"} 1.701828773e+09
# HELP cr_resourceversion
# TYPE cr_resourceversion gauge
cr_resourceversion{customresource_group="example.com",customresource_kind="Bar",customresource_version="v1",name="mybar"} 909919

Dec 06 '23 02:12 CatherineF-dev

Could you try https://github.com/kubernetes/kube-state-metrics/pull/2257 to see whether this issue ("repeated adding and deleting CustomResourceDefinitions causes duplicate metric entries") is fixed?

Dec 06 '23 02:12 CatherineF-dev

I was just trying this feature on v2.10.1 with type: StateSet, and I think I am seeing this issue, something very similar, or possibly a different one.

With a config such as:

      containers:
      - args:
        - --port=8080
        - --resources=certificatesigningrequests,configmaps,cronjobs,daemonsets,deployments,endpoints,horizontalpodautoscalers,ingresses,jobs,limitranges,mutatingwebhookconfigurations,namespaces,networkpolicies,nodes,persistentvolumeclaims,persistentvolumes,poddisruptionbudgets,pods,replicasets,replicationcontrollers,resourcequotas,secrets,services,statefulsets,storageclasses,validatingwebhookconfigurations,volumeattachments
        - --telemetry-port=8081
        - --custom-resource-state-config
        - |
          spec:
            resources:
              - groupVersionKind:
                  group: "cluster.x-k8s.io"
                  version: "v1beta1"
                  kind: "Machine"
                metrics:
                  - name: "cunningr"
                    help: "Phase of Machines"
                    each:
                      type: StateSet
                      stateSet:
                        labelName: phase
                        path: ["status","phase"]
                        list: ['Provisioned', 'Pending', 'Running', 'Deleting', 'Failed']

Each of my Machine instances seems to get a new metrics instance:

kube_customresource_cunningr{customresource_group="cluster.x-k8s.io",customresource_kind="Machine",customresource_version="v1beta1",phase="Deleting"} 0
kube_customresource_cunningr{customresource_group="cluster.x-k8s.io",customresource_kind="Machine",customresource_version="v1beta1",phase="Failed"} 0
kube_customresource_cunningr{customresource_group="cluster.x-k8s.io",customresource_kind="Machine",customresource_version="v1beta1",phase="Pending"} 0
kube_customresource_cunningr{customresource_group="cluster.x-k8s.io",customresource_kind="Machine",customresource_version="v1beta1",phase="Provisioned"} 1
kube_customresource_cunningr{customresource_group="cluster.x-k8s.io",customresource_kind="Machine",customresource_version="v1beta1",phase="Running"} 0
kube_customresource_cunningr{customresource_group="cluster.x-k8s.io",customresource_kind="Machine",customresource_version="v1beta1",phase="Deleting"} 0
kube_customresource_cunningr{customresource_group="cluster.x-k8s.io",customresource_kind="Machine",customresource_version="v1beta1",phase="Failed"} 0
kube_customresource_cunningr{customresource_group="cluster.x-k8s.io",customresource_kind="Machine",customresource_version="v1beta1",phase="Pending"} 0
kube_customresource_cunningr{customresource_group="cluster.x-k8s.io",customresource_kind="Machine",customresource_version="v1beta1",phase="Provisioned"} 0
kube_customresource_cunningr{customresource_group="cluster.x-k8s.io",customresource_kind="Machine",customresource_version="v1beta1",phase="Running"} 1
kube_customresource_cunningr{customresource_group="cluster.x-k8s.io",customresource_kind="Machine",customresource_version="v1beta1",phase="Deleting"} 0
kube_customresource_cunningr{customresource_group="cluster.x-k8s.io",customresource_kind="Machine",customresource_version="v1beta1",phase="Failed"} 0
kube_customresource_cunningr{customresource_group="cluster.x-k8s.io",customresource_kind="Machine",customresource_version="v1beta1",phase="Pending"} 0
kube_customresource_cunningr{customresource_group="cluster.x-k8s.io",customresource_kind="Machine",customresource_version="v1beta1",phase="Provisioned"} 1
kube_customresource_cunningr{customresource_group="cluster.x-k8s.io",customresource_kind="Machine",customresource_version="v1beta1",phase="Running"} 0
kube_customresource_cunningr{customresource_group="cluster.x-k8s.io",customresource_kind="Machine",customresource_version="v1beta1",phase="Deleting"} 0
kube_customresource_cunningr{customresource_group="cluster.x-k8s.io",customresource_kind="Machine",customresource_version="v1beta1",phase="Failed"} 0
kube_customresource_cunningr{customresource_group="cluster.x-k8s.io",customresource_kind="Machine",customresource_version="v1beta1",phase="Pending"} 0
kube_customresource_cunningr{customresource_group="cluster.x-k8s.io",customresource_kind="Machine",customresource_version="v1beta1",phase="Provisioned"} 0
kube_customresource_cunningr{customresource_group="cluster.x-k8s.io",customresource_kind="Machine",customresource_version="v1beta1",phase="Running"} 1
kube_customresource_cunningr{customresource_group="cluster.x-k8s.io",customresource_kind="Machine",customresource_version="v1beta1",phase="Deleting"} 0
kube_customresource_cunningr{customresource_group="cluster.x-k8s.io",customresource_kind="Machine",customresource_version="v1beta1",phase="Failed"} 0
kube_customresource_cunningr{customresource_group="cluster.x-k8s.io",customresource_kind="Machine",customresource_version="v1beta1",phase="Pending"} 0
kube_customresource_cunningr{customresource_group="cluster.x-k8s.io",customresource_kind="Machine",customresource_version="v1beta1",phase="Provisioned"} 1
kube_customresource_cunningr{customresource_group="cluster.x-k8s.io",customresource_kind="Machine",customresource_version="v1beta1",phase="Running"} 0

I would have expected those to be aggregated into a single gauge metric for each state?
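One possible reading of this output (an assumption, not confirmed by the maintainers in this thread): the config above has no labelsFromPath entries, so every Machine produces series with an identical label set, and the per-object series become indistinguishable. The sketch below only demonstrates that collision; the machine names are made up, and `series_key` is a hypothetical helper. Earlier configs in this thread add name and namespace labels via labelsFromPath, which would keep the series apart.

```python
def series_key(metric: str, labels: dict) -> str:
    """Render a series identity as metric{sorted labels}."""
    rendered = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{metric}{{{rendered}}}"


# Hypothetical Machine objects; only the name differs between them.
machines = [{"name": "machine-a"}, {"name": "machine-b"}]

base_labels = {
    "customresource_group": "cluster.x-k8s.io",
    "customresource_kind": "Machine",
    "customresource_version": "v1beta1",
    "phase": "Running",
}

# Without a per-object label, both machines map to the same series.
without_name = {series_key("kube_customresource_cunningr", base_labels) for _ in machines}

# With a name label, each machine gets its own series.
with_name = {
    series_key("kube_customresource_cunningr", {**base_labels, "name": m["name"]})
    for m in machines
}

print(len(without_name), len(with_name))
```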

Jul 15 '24 07:07 cunningr

Hi, I have this problem when configuring VPA with Goldilocks on kube-prometheus-stack. For some reason, after a regular upgrade we started getting this warning from Alertmanager:

[[FIRING:1] PrometheusDuplicateTimestamps](...)
Severity: Warning
Summary: Prometheus is dropping samples with duplicate timestamps.

I used grep over a port-forward to kube-state-metrics to find the duplicated metric. After some research, I found that the problem occurs when upgrades are applied to CRDs: this makes kube-state-metrics refresh, and it just adds the new custom resource metrics at the bottom of the /metrics endpoint.

I also see in the kube-state-metrics logs that "Custom resource state added metrics" was logged five times, all with the same familyNames:

... ...  1 custom_resource_metrics.go:79] "Custom resource state added metrics" familyNames=...
... ...  1 custom_resource_metrics.go:79] "Custom resource state added metrics" familyNames=...
... ...  1 custom_resource_metrics.go:79] "Custom resource state added metrics" familyNames=...
... ...  1 custom_resource_metrics.go:79] "Custom resource state added metrics" familyNames=...
... ...  1 custom_resource_metrics.go:79] "Custom resource state added metrics" familyNames=...

After that I manually restarted the kube-state-metrics deployment, and once it started again everything worked as expected (without duplicates). I don't know if this experience will help anyone, but I think it may be related to this problem.

Aug 12 '24 18:08 jmtt89