kube-state-metrics icon indicating copy to clipboard operation
kube-state-metrics copied to clipboard

[Bug] Inconsistency between `customresource` interface and `customresource` causes panic

Open AliDatadog opened this issue 2 years ago • 7 comments

Issue Description: After the alteration of the custom resources handling behavior in https://github.com/kubernetes/kube-state-metrics/pull/1851, there has been a breaking change.

  1. GVRFromType is invariably invoked: https://github.com/kubernetes/kube-state-metrics/blob/76f42c19af50ff7c0390a1c57c680b07b6eb3e8a/internal/store/builder.go#L200
  2. The subsequent function https://github.com/rexagod/kube-state-metrics/blob/25a1d8da057cf761d614c59a52785335d34082d1/pkg/util/utils.go#L143 puts forward a presumption that ExpectedType() persistently returns unstructured.Unstructured{}. This triggers panic if a different argument type is provided.

The change leads to a breaking compatibility with the previous interface that we use internally to enhance KSM with other metrics.

Moreover, the availableStore index now uses gvr.String() over ResourceName as per https://github.com/kubernetes/kube-state-metrics/blob/76f42c19af50ff7c0390a1c57c680b07b6eb3e8a/internal/store/builder.go#L203. Consequently, the output of gvrString is structured as Resource=verticalpodautoscalers,apiregistration.k8s.io/v1

Expected Outcome: I anticipated ExpectedType() to have the capacity to return any resource type and f.Name() to serve as the resource index in availableStore. Another solution would be to introduce a new function StoreIndex() or StoreName() in this interface.

Reproducibility Steps: To reproduce this issue, simply add VPA as a custom resource and observe that the code triggers panic.

Additional Information: None applicable. Environment Details: I'm using the kind cluster 1.27.1.

  • Kube-state-metrics version: 2.10.0
  • Kubernetes version (retrievable using kubectl version): 1.27.1
  • Cloud provider or hardware setup: macOS ARM
  • Any Other Relevant Information:

AliDatadog avatar Sep 19 '23 15:09 AliDatadog

/assign @rexagod
/triage accepted

dashpole avatar Sep 21 '23 16:09 dashpole

Thanks for reporting this issue, found you have already used v2.8.2 as a mitigation.

CatherineF-dev avatar Oct 11 '23 01:10 CatherineF-dev

To reproduce this issue, simply add VPA as a custom resource and observe that the code triggers panic.

QQ: how did you notice this issue, is it possible to move some tests into this repo? If so, it won't break your components in the future.

CatherineF-dev avatar Oct 11 '23 01:10 CatherineF-dev

Unfortunately we did not implement a test able to catch this issue in particular.

Perhaps we could have a test checking that a simple extended store can be generated.

Here is an example of one: https://github.com/DataDog/datadog-agent/blob/0ee3251831d2a9da15f7c47e843ebaacc8b85aab/pkg/collector/corechecks/cluster/ksm/customresources/crd.go#L50

AliDatadog avatar Oct 11 '23 08:10 AliDatadog

Any update on this issue ? @rexagod @CatherineF-dev

AliDatadog avatar Jan 09 '24 10:01 AliDatadog

Could you try fixing this issue? cc @AliDatadog I haven't looked into details, but I can help review.

@rexagod might know details.

CatherineF-dev avatar Jan 10 '24 20:01 CatherineF-dev

@AliDatadog Are you using KSM as a library? We do not provide any guarantees on the exported APIs as of now. cc @dgrisonnet

rexagod avatar Jan 11 '24 17:01 rexagod