kube-state-metrics

Crash when deleting CRD

Open cinder-fish opened this issue 2 years ago • 6 comments

What happened: kube-state-metrics crashed after a CRD was deleted

What you expected to happen: kube-state-metrics handles the deletion without crashing

How to reproduce it (as minimally and precisely as possible): Delete a CRD such that a cache.DeletedFinalStateUnknown is returned

possibly between intervals?

Anything else we need to know?: Here's a link to the client doc for this type https://pkg.go.dev/k8s.io/client-go/tools/cache#DeletedFinalStateUnknown
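
For readers unfamiliar with the tombstone type: the panic below comes from a delete handler asserting its argument directly to *unstructured.Unstructured. A minimal sketch of the usual defensive pattern follows. It is not the kube-state-metrics code; Unstructured and DeletedFinalStateUnknown are local stand-ins for the apimachinery and client-go types (only the Key and Obj fields are mirrored), and deletedObject is a hypothetical helper name.

```go
package main

import "fmt"

// Unstructured is a local stand-in for *unstructured.Unstructured
// (assumption: only a name is modelled here).
type Unstructured struct {
	Name string
}

// DeletedFinalStateUnknown mirrors the Key and Obj fields of the client-go
// tombstone type; it is redeclared so the sketch has no dependencies.
type DeletedFinalStateUnknown struct {
	Key string
	Obj interface{}
}

// deletedObject (hypothetical helper) returns the deleted object whether the
// handler received it directly or wrapped in a tombstone. It returns an error
// instead of panicking on any other type.
func deletedObject(obj interface{}) (*Unstructured, error) {
	switch v := obj.(type) {
	case *Unstructured:
		return v, nil
	case DeletedFinalStateUnknown:
		u, ok := v.Obj.(*Unstructured)
		if !ok {
			return nil, fmt.Errorf("tombstone %q holds unexpected type %T", v.Key, v.Obj)
		}
		return u, nil
	default:
		return nil, fmt.Errorf("unexpected type %T in delete handler", obj)
	}
}

func main() {
	events := []interface{}{
		&Unstructured{Name: "foos.example.com"},
		DeletedFinalStateUnknown{Key: "bars.example.com", Obj: &Unstructured{Name: "bars.example.com"}},
		"not an object",
	}
	for _, e := range events {
		u, err := deletedObject(e)
		if err != nil {
			fmt.Println("skipping:", err)
			continue
		}
		fmt.Println("deleted:", u.Name)
	}
}
```

With the real libraries, the same type switch would sit at the top of the DeleteFunc passed to the informer, so a tombstone is unwrapped or logged rather than crashing the process.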

Environment:

  • kube-state-metrics version: 2.10.0
  • Kubernetes version (use kubectl version): v1.27.4
  • Cloud provider or hardware configuration: GKE
  • Other info:

Error log:

E0922 08:07:20.392009       1 runtime.go:79] Observed a panic: &runtime.TypeAssertionError{_interface:(*runtime._type)(0x188e6c0), concrete:(*runtime._type)(0x19602a0), asserted:(*runtime._type)(0x1af2fe0), missingMethod:""} (interface conversion: interface {} is cache.DeletedFinalStateUnknown, not *unstructured.Unstructured)
goroutine 25 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x18ef280?, 0xc0018dc000})
	/bitnami/blacksmith-sandox/kube-state-metrics-2.10.0/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:75 +0x99
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x0?})
	/bitnami/blacksmith-sandox/kube-state-metrics-2.10.0/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:49 +0x75
panic({0x18ef280, 0xc0018dc000})
	/usr/local/go-1.20.7/src/runtime/panic.go:884 +0x213
k8s.io/kube-state-metrics/v2/internal/discovery.(*CRDiscoverer).StartDiscovery.func2({0x19602a0?, 0xc001cc0700?})
	/bitnami/blacksmith-sandox/kube-state-metrics-2.10.0/src/k8s.io/kube-state-metrics/internal/discovery/discovery.go:78 +0x495
k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnDelete(...)
	/bitnami/blacksmith-sandox/kube-state-metrics-2.10.0/pkg/mod/k8s.io/[email protected]/tools/cache/controller.go:257
k8s.io/client-go/tools/cache.(*processorListener).run.func1()
	/bitnami/blacksmith-sandox/kube-state-metrics-2.10.0/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:978 +0xaf
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
	/bitnami/blacksmith-sandox/kube-state-metrics-2.10.0/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:226 +0x3e
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc000230738?, {0x1d7cbc0, 0xc000732000}, 0x1, 0xc000730000)
	/bitnami/blacksmith-sandox/kube-state-metrics-2.10.0/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:227 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0x0?)
	/bitnami/blacksmith-sandox/kube-state-metrics-2.10.0/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:204 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(...)
	/bitnami/blacksmith-sandox/kube-state-metrics-2.10.0/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:161
k8s.io/client-go/tools/cache.(*processorListener).run(0xc0000df4d0)
	/bitnami/blacksmith-sandox/kube-state-metrics-2.10.0/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:967 +0x6b
k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()
	/bitnami/blacksmith-sandox/kube-state-metrics-2.10.0/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:72 +0x5a
created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start
	/bitnami/blacksmith-sandox/kube-state-metrics-2.10.0/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:70 +0x85
panic: interface conversion: interface {} is cache.DeletedFinalStateUnknown, not *unstructured.Unstructured [recovered]
	panic: interface conversion: interface {} is cache.DeletedFinalStateUnknown, not *unstructured.Unstructured

goroutine 25 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x0?})
	/bitnami/blacksmith-sandox/kube-state-metrics-2.10.0/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:56 +0xd7
panic({0x18ef280, 0xc0018dc000})
	/usr/local/go-1.20.7/src/runtime/panic.go:884 +0x213
k8s.io/kube-state-metrics/v2/internal/discovery.(*CRDiscoverer).StartDiscovery.func2({0x19602a0?, 0xc001cc0700?})
	/bitnami/blacksmith-sandox/kube-state-metrics-2.10.0/src/k8s.io/kube-state-metrics/internal/discovery/discovery.go:78 +0x495
k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnDelete(...)
	/bitnami/blacksmith-sandox/kube-state-metrics-2.10.0/pkg/mod/k8s.io/[email protected]/tools/cache/controller.go:257
k8s.io/client-go/tools/cache.(*processorListener).run.func1()
	/bitnami/blacksmith-sandox/kube-state-metrics-2.10.0/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:978 +0xaf
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
	/bitnami/blacksmith-sandox/kube-state-metrics-2.10.0/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:226 +0x3e
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc000230738?, {0x1d7cbc0, 0xc000732000}, 0x1, 0xc000730000)
	/bitnami/blacksmith-sandox/kube-state-metrics-2.10.0/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:227 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0x0?)
	/bitnami/blacksmith-sandox/kube-state-metrics-2.10.0/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:204 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(...)
	/bitnami/blacksmith-sandox/kube-state-metrics-2.10.0/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:161
k8s.io/client-go/tools/cache.(*processorListener).run(0xc0000df4d0)
	/bitnami/blacksmith-sandox/kube-state-metrics-2.10.0/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:967 +0x6b
k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()
	/bitnami/blacksmith-sandox/kube-state-metrics-2.10.0/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:72 +0x5a
created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start
	/bitnami/blacksmith-sandox/kube-state-metrics-2.10.0/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:70 +0x85


cinder-fish, Sep 22 '23 08:09

Quick question: have you enabled the custom resource state metrics feature?

CatherineF-dev, Sep 27 '23 13:09

Ah, we have; however, it's with the following settings:

kind: CustomResourceStateMetrics
spec: {}

As nothing's been templated for it here

Also, apologies: I didn't check the logs far enough back before. With this config, it also regularly logs the following:

E0927 13:47:06.709384       1 reflector.go:148] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:231: Failed to watch apiextensions.k8s.io/v1, Resource=customresourcedefinitions: unknown

I'm assuming this is related.

So I guess either this is an invalid config, or an empty spec could be treated as a no-op?

cinder-fish, Sep 27 '23 14:09

Since you are not using the CustomResourceStateMetrics feature (spec: {}), could you disable it and try again?

CatherineF-dev, Sep 27 '23 14:09

Yes, I have updated our templating to omit the --custom-resource-state-config flag when the spec is empty.

I can confirm this resolves the log messages, and KSM is once again unaffected by CRD changes.

cinder-fish, Sep 27 '23 14:09

Okay. I guess the crash might be related to https://github.com/kubernetes/kube-state-metrics/issues/2202

I discussed moving the CustomResourceStateMetrics feature out of the KSM repo with @logicalhan yesterday. This issue is one data point in support of that idea.

CatherineF-dev, Sep 27 '23 14:09

/triage accepted

CatherineF-dev, Sep 27 '23 14:09