kube-state-metrics
Duplicated custom resource metrics when exposed for builtin type
What happened:
I wanted to expose additional information for StorageClasses, so I expanded the config:
kind: CustomResourceStateMetrics
spec:
  resources:
    - groupVersionKind:
        group: storage.k8s.io
        kind: StorageClass
        version: v1
      metricNamePrefix: kube_storageclass
      metrics:
        - name: "parameters"
          help: "StorageClass parameters"
          each:
            type: Info
            info:
              labelsFromPath:
                skuName: [parameters, skuName]
                storageclass: [metadata, name]
I noticed that the metrics are exposed twice, which makes it impossible for Prometheus to scrape them (the metrics are duplicated):
# HELP kube_storageclass_parameters StorageClass parameters
# TYPE kube_storageclass_parameters info
kube_storageclass_parameters{customresource_group="storage.k8s.io",customresource_kind="StorageClass",customresource_version="v1",storageclass="standard"} 1
# HELP kube_storageclass_parameters StorageClass parameters
# TYPE kube_storageclass_parameters info
kube_storageclass_parameters{customresource_group="storage.k8s.io",customresource_kind="StorageClass",customresource_version="v1",storageclass="standard"} 1
What you expected to happen:
Metrics are exposed once
How to reproduce it (as minimally and precisely as possible):
Use the config provided above with kube-state-metrics on a kind cluster.
Anything else we need to know?: Full logs
I0412 11:54:47.701990 1 wrapper.go:98] "Starting kube-state-metrics"
I0412 11:54:47.702368 1 builder.go:192] "The internal resource store already exists and is overridden by a custom resource store with the same name, please make sure it meets your expectation" registryName="storageclasses"
I0412 11:54:47.702518 1 server.go:186] "Used default resources"
I0412 11:54:47.702577 1 types.go:184] "Using all namespaces"
I0412 11:54:47.702660 1 server.go:219] "Metric allow-denylisting" allowDenyStatus="Including the following lists that were on allowlist: kube_deployment_labels, kube_pod_labels, kube_storageclass_labels, kube_storageclass_parameters"
W0412 11:54:47.702744 1 client_config.go:618] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0412 11:54:47.703382 1 server.go:364] "Tested communication with server"
I0412 11:54:47.710477 1 server.go:369] "Run with Kubernetes cluster version" major="1" minor="26" gitVersion="v1.26.3" gitTreeState="clean" gitCommit="9e644106593f3f4aa98f8a84b23db5fa378900bd" platform="linux/amd64"
I0412 11:54:47.710870 1 server.go:370] "Communication with server successful"
I0412 11:54:47.711399 1 server.go:316] "Started metrics server" metricsServerAddress="[::]:8080"
I0412 11:54:47.711684 1 server.go:74] level=info msg="Listening on" address=[::]:8080
I0412 11:54:47.711779 1 server.go:74] level=info msg="TLS is disabled." http2=false address=[::]:8080
I0412 11:54:47.711911 1 metrics_handler.go:99] "Autosharding disabled"
I0412 11:54:47.713339 1 server.go:305] "Started kube-state-metrics self metrics server" telemetryAddress="[::]:8081"
I0412 11:54:47.713536 1 server.go:74] level=info msg="Listening on" address=[::]:8081
I0412 11:54:47.713564 1 server.go:74] level=info msg="TLS is disabled." http2=false address=[::]:8081
I0412 11:54:47.714317 1 custom_resource_metrics.go:79] "Custom resource state added metrics" familyNames=[kube_storageclass_parameters]
I0412 11:54:47.714778 1 custom_resource_metrics.go:79] "Custom resource state added metrics" familyNames=[kube_storageclass_parameters]
Environment:
- kube-state-metrics version: 2.8.0
- Kubernetes version (use kubectl version): 1.25 and 1.26
- Cloud provider or hardware configuration: AKS and local (kind)
- Other info:
Can you please also add the Kubernetes object you have created in your cluster (the StorageClass)?
Yes, could you list all CRs for this CRD (storage.k8s.io)?
My guess is that there are two CRs.
@chrischdi @CatherineF-dev
❯ k get storageclass
NAME                                     PROVISIONER                RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
azurefile                                file.csi.azure.com         Delete          Immediate              true                   426d
azurefile-csi                            file.csi.azure.com         Delete          Immediate              true                   426d
azurefile-csi-premium                    file.csi.azure.com         Delete          Immediate              true                   426d
azurefile-premium                        file.csi.azure.com         Delete          Immediate              true                   426d
blob-storage-cockroach-azure-centralus   disk.csi.azure.com         Delete          WaitForFirstConsumer   true                   220d
default (default)                        disk.csi.azure.com         Delete          WaitForFirstConsumer   true                   426d
kafka-test-ssd-centralus                 kubernetes.io/azure-disk   Delete          WaitForFirstConsumer   true                   17d
managed                                  disk.csi.azure.com         Delete          WaitForFirstConsumer   true                   426d
managed-csi                              disk.csi.azure.com         Delete          WaitForFirstConsumer   true                   426d
managed-csi-premium                      disk.csi.azure.com         Delete          WaitForFirstConsumer   true                   426d
managed-kafka-example-ssd-centralus      kubernetes.io/azure-disk   Delete          WaitForFirstConsumer   true                   20d
managed-premium                          disk.csi.azure.com         Delete          WaitForFirstConsumer   true                   426d
mongodb                                  kubernetes.io/azure-disk   Delete          WaitForFirstConsumer   true                   271d
postgresql-hdd                           disk.csi.azure.com         Delete          WaitForFirstConsumer   true                   39d
postgresql-premium-ssd                   disk.csi.azure.com         Delete          WaitForFirstConsumer   true                   39d
postgresql-ssd                           disk.csi.azure.com         Delete          WaitForFirstConsumer   true                   46d
prometheus-premium                       kubernetes.io/azure-disk   Delete          Immediate              true                   39d
prometheus-ssd                           kubernetes.io/azure-disk   Delete          Immediate              true                   78d
test-cockroach-azure-centralus           disk.csi.azure.com         Delete          Immediate              true                   31d
test-volume-populator                    disk.csi.azure.com         Delete          WaitForFirstConsumer   true                   70d
test-volume-populator-no-wait            disk.csi.azure.com         Delete          Immediate              false                  69d
and the endpoint result is
# HELP kube_storageclass_parameters StorageClass parameters
# TYPE kube_storageclass_parameters info
kube_storageclass_parameters{customresource_group="storage.k8s.io",customresource_kind="StorageClass",customresource_version="v1",skuName="Standard_LRS",storageclass="azurefile"} 1
kube_storageclass_parameters{customresource_group="storage.k8s.io",customresource_kind="StorageClass",customresource_version="v1",skuName="Standard_LRS",storageclass="azurefile-csi"} 1
kube_storageclass_parameters{customresource_group="storage.k8s.io",customresource_kind="StorageClass",customresource_version="v1",skuName="Premium_LRS",storageclass="azurefile-csi-premium"} 1
kube_storageclass_parameters{customresource_group="storage.k8s.io",customresource_kind="StorageClass",customresource_version="v1",skuName="Premium_LRS",storageclass="azurefile-premium"} 1
kube_storageclass_parameters{customresource_group="storage.k8s.io",customresource_kind="StorageClass",customresource_version="v1",skuName="Premium_LRS",storageclass="blob-storage-cockroach-azure-centralus"} 1
kube_storageclass_parameters{customresource_group="storage.k8s.io",customresource_kind="StorageClass",customresource_version="v1",skuName="Premium_LRS",storageclass="prometheus-premium"} 1
kube_storageclass_parameters{customresource_group="storage.k8s.io",customresource_kind="StorageClass",customresource_version="v1",skuName="Premium_LRS",storageclass="test-cockroach-azure-centralus"} 1
kube_storageclass_parameters{customresource_group="storage.k8s.io",customresource_kind="StorageClass",customresource_version="v1",skuName="StandardSSD_LRS",storageclass="test-volume-populator"} 1
kube_storageclass_parameters{customresource_group="storage.k8s.io",customresource_kind="StorageClass",customresource_version="v1",skuName="StandardSSD_LRS",storageclass="test-volume-populator-no-wait"} 1
# HELP kube_storageclass_parameters StorageClass parameters
# TYPE kube_storageclass_parameters info
kube_storageclass_parameters{customresource_group="storage.k8s.io",customresource_kind="StorageClass",customresource_version="v1",skuName="Standard_LRS",storageclass="azurefile"} 1
kube_storageclass_parameters{customresource_group="storage.k8s.io",customresource_kind="StorageClass",customresource_version="v1",skuName="Standard_LRS",storageclass="azurefile-csi"} 1
kube_storageclass_parameters{customresource_group="storage.k8s.io",customresource_kind="StorageClass",customresource_version="v1",skuName="Premium_LRS",storageclass="azurefile-csi-premium"} 1
kube_storageclass_parameters{customresource_group="storage.k8s.io",customresource_kind="StorageClass",customresource_version="v1",skuName="Premium_LRS",storageclass="azurefile-premium"} 1
kube_storageclass_parameters{customresource_group="storage.k8s.io",customresource_kind="StorageClass",customresource_version="v1",skuName="Premium_LRS",storageclass="blob-storage-cockroach-azure-centralus"} 1
kube_storageclass_parameters{customresource_group="storage.k8s.io",customresource_kind="StorageClass",customresource_version="v1",skuName="Premium_LRS",storageclass="prometheus-premium"} 1
kube_storageclass_parameters{customresource_group="storage.k8s.io",customresource_kind="StorageClass",customresource_version="v1",skuName="Premium_LRS",storageclass="test-cockroach-azure-centralus"} 1
kube_storageclass_parameters{customresource_group="storage.k8s.io",customresource_kind="StorageClass",customresource_version="v1",skuName="StandardSSD_LRS",storageclass="test-volume-populator"} 1
kube_storageclass_parameters{customresource_group="storage.k8s.io",customresource_kind="StorageClass",customresource_version="v1",skuName="StandardSSD_LRS",storageclass="test-volume-populator-no-wait"} 1
Not all StorageClasses are exposed, as some do not have the property I want to expose; IIUC, they are skipped.
~~Off the top of my head, I'd say this happens because we buildCustomStore for anything defined in the CRS config.~~ If the GVK is something that KSM supports natively, this results in two different stores, but only the latest registered one serves metrics. With #1851, we will support this feature only for CRs. For anything you think is worth adding to KSM that folks can benefit from on a larger scale, it's recommended to send a PR, similar to the ones that were merged previously, so this shouldn't really be a problem if the addition makes sense.
Additionally, I'd like to mention that this is not quite how things were meant to be defined (native resources in the CRS config). In a nutshell, while it was technically possible pre-#1851, it was never officially supported (it would always conflict with the native store for the builtin type), even if users somehow got the desired metrics.
EDIT.
A bit more context:
- Native stores.
- CR store builder (while buildStores builds for native resources) returns stores for custom resources.
2nd EDIT.
- I mistook the issue statement for something slightly different; I've cut that part out.
I have also encountered this issue. It was not present in v2.7.0.
For me, it occurs when I specify a --custom-resource-state-config and also include the custom resource in --resources:
$ go run main.go --port=8080 --telemetry-port=8081 --kubeconfig=$KUBECONFIG \
--custom-resource-state-config='{"spec":{"resources":[{"groupVersionKind":{"group":"operators.coreos.com","version":"v1alpha1","kind":"ClusterServiceVersion"},"metrics":[{"name":"csv_info","help":"Cluster Service Version install status","each":{"type":"Info","info":{"labelsFromPath":{"name":["metadata","name"],"status":["status","phase"]}}}}]}]}}' \
--resources clusterserviceversions
I0514 01:30:02.929882 7804 wrapper.go:98] "Starting kube-state-metrics"
I0514 01:30:02.930730 7804 server.go:201] "Used resources" resources=[clusterserviceversions clusterserviceversions]
I0514 01:30:02.930792 7804 types.go:184] "Using all namespaces"
I0514 01:30:02.930801 7804 server.go:228] "Metric allow-denylisting" allowDenyStatus="Excluding the following lists that were on denylist: "
I0514 01:30:02.932450 7804 server.go:367] "Tested communication with server"
I0514 01:30:03.546785 7804 server.go:372] "Run with Kubernetes cluster version" major="1" minor="26" gitVersion="v1.26.3+k3s1" gitTreeState="clean" gitCommit="01ea3ff27be0b04f945179171cec5a8e11a14f7b" platform="linux/amd64"
I0514 01:30:03.546909 7804 server.go:373] "Communication with server successful"
I0514 01:30:03.551387 7804 server.go:324] "Started metrics server" metricsServerAddress="[::]:8080"
I0514 01:30:03.551408 7804 server.go:313] "Started kube-state-metrics self metrics server" telemetryAddress="[::]:8081"
I0514 01:30:03.551408 7804 metrics_handler.go:99] "Autosharding disabled"
I0514 01:30:03.551692 7804 custom_resource_metrics.go:79] "Custom resource state added metrics" familyNames=[kube_customresource_csv_info]
I0514 01:30:03.552009 7804 server.go:73] level=info msg="Listening on" address=[::]:8081
I0514 01:30:03.552033 7804 server.go:73] level=info msg="TLS is disabled." http2=false address=[::]:8081
I0514 01:30:03.552036 7804 server.go:73] level=info msg="Listening on" address=[::]:8080
I0514 01:30:03.552063 7804 server.go:73] level=info msg="TLS is disabled." http2=false address=[::]:8080
I0514 01:30:03.552103 7804 custom_resource_metrics.go:79] "Custom resource state added metrics" familyNames=[kube_customresource_csv_info]
I0514 01:30:03.552186 7804 builder.go:246] "Active resources" activeStoreNames="clusterserviceversions,clusterserviceversions"
I believe it was introduced by https://github.com/kubernetes/kube-state-metrics/pull/1928
https://github.com/kubernetes/kube-state-metrics/blob/3b95dd1cf0822342d09408c444e6b1954352084b/pkg/app/server.go#L176-L180
https://github.com/kubernetes/kube-state-metrics/blob/3b95dd1cf0822342d09408c444e6b1954352084b/pkg/app/server.go#L189-L192
Note that, as a result, custom resources are added twice: once from factories and once from opts.Resources.
I think the fix should be either:
1. Revert to the v2.7.0 implementation:
   https://github.com/kubernetes/kube-state-metrics/blob/abe3fd3184e16893b5a47196f90a94ed13e1b04d/pkg/app/server.go#L137-L140
   This means that when using both --custom-resource-state-config and --resources, the custom resource names must be included in the resources list in order to be enabled.
   Personally, I think this is the best option. When using --resources, I think only the values in the supplied resources list should be included, regardless of any custom resource configs.
OR
2. Add logic to remove duplicates from the resources list.
I can open a PR to fix this if that helps
For the record, this is still the case: the later-built (custom) stores are the only ones in effect.
Native Store (registered first)
CRS Store (overrides the native store)
CRS configuration that was used (to build custom Deployment stores that replaced the native Deployment stores):
kind: CustomResourceStateMetrics
spec:
  resources:
    - groupVersionKind:
        group: "apps"
        version: "v1"
        kind: "Deployment"
      metrics:
        - name: "test_metric"
          help: "foo baz"
          each:
            type: Info
            info:
              path: [metadata]
              labelsFromPath:
                name: [name]
Final Deployment metrics generated (from the overriding (custom) stores)
Also, as I mentioned, native objects won't be supported in custom resource configurations as we will depend entirely on CRDs going forward, and this supersedes this issue.
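Since native objects are to be dropped from CRS configurations, a user could lint a config ahead of time. This hypothetical Go sketch flags groupVersionKind entries whose group is a builtin API group. The gvk type, rejectBuiltins, and the deliberately partial group list are assumptions, not kube-state-metrics code or behaviour.

```go
package main

import "fmt"

// gvk mirrors the groupVersionKind fields of a CRS resource entry.
type gvk struct {
	Group, Version, Kind string
}

// builtinGroups is a partial, hand-written list of API groups served by
// kube-apiserver itself; a real check would need the complete set.
var builtinGroups = map[string]bool{
	"": true, "apps": true, "batch": true, "policy": true, "autoscaling": true,
	"storage.k8s.io": true, "networking.k8s.io": true,
	"rbac.authorization.k8s.io": true, "discovery.k8s.io": true,
}

// rejectBuiltins returns one error per CRS entry that targets a builtin group.
func rejectBuiltins(resources []gvk) []error {
	var errs []error
	for _, r := range resources {
		if !builtinGroups[r.Group] {
			continue
		}
		gv := r.Version // the core group has an empty name
		if r.Group != "" {
			gv = r.Group + "/" + r.Version
		}
		errs = append(errs, fmt.Errorf("%s %s is a builtin type, not a custom resource", gv, r.Kind))
	}
	return errs
}

func main() {
	errs := rejectBuiltins([]gvk{
		{Group: "storage.k8s.io", Version: "v1", Kind: "StorageClass"},
		{Group: "apps", Version: "v1", Kind: "Deployment"},
		{Group: "operators.coreos.com", Version: "v1alpha1", Kind: "ClusterServiceVersion"},
	})
	for _, err := range errs {
		fmt.Println(err)
	}
}
```

Run against the configurations in this thread, it flags the StorageClass and Deployment entries and accepts the ClusterServiceVersion entry, which is backed by a CRD.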
@grzesuav I'm not sure this is possible (based on my understanding, I lean towards no), but can KSM expose metrics for the same object both natively and through CRS (for instance, Deployments, or StorageClass in your case) simultaneously?
> Wanted to expose additional information for storageclass
I'm trying to understand how this would be facilitated from one KSM process, because AFAIR this was never the supported behavior (same internal object under both flags), and it shouldn't be possible without the CR store overriding the native one. Can you provide the complete command (flags and args) you're using to invoke KSM that allows you to "add" custom metrics for native objects on top of the original metrics exposed natively by KSM for the same object?
I found out that --custom-resource-state-only can be used to output only CRS metrics. When the same object appears in both the CRS configuration and the --resources argument, this flag fixes the issue by suppressing the latter's output. Technically, that output should be the original (native) metrics rather than the generated ones, hence the duplication. As I mentioned earlier, though, in that case the native metrics are replaced by the CRS ones (a case we will sunset, which removes this conflict between stores).
/triage accepted
This issue has not been updated in over 1 year, and should be re-triaged.
You can:
- Confirm that this issue is still relevant with /triage accepted (org members only)
- Close this issue with /close
For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/
/remove-triage accepted