kube-state-metrics
panic in 2.3.0 related to bad ingress in GKE
What happened: kube-state-metrics panics and fails to start when deployed to a custom namespace using a config based on the examples/standard directory.
I0103 12:34:28.368172 1 main.go:108] Using default resources
I0103 12:34:28.368270 1 types.go:136] Using all namespace
I0103 12:34:28.368278 1 main.go:133] metric allow-denylisting: Excluding the following lists that were on denylist:
W0103 12:34:28.368317 1 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0103 12:34:28.369031 1 main.go:247] Testing communication with server
I0103 12:34:28.375030 1 main.go:252] Running with Kubernetes cluster version: v1.21. git version: v1.21.5-gke.1302. git tree state: clean. commit: 639f3a74abf258418493e9b75f2f98a08da29733. platform: linux/amd64
I0103 12:34:28.375060 1 main.go:254] Communication with server successful
I0103 12:34:28.375263 1 main.go:210] Starting metrics server: [::]:8080
I0103 12:34:28.375364 1 metrics_handler.go:96] Autosharding disabled
I0103 12:34:28.375393 1 main.go:199] Starting kube-state-metrics self metrics server: [::]:8081
I0103 12:34:28.375535 1 main.go:66] levelinfomsgTLS is disabled.http2false
I0103 12:34:28.375611 1 main.go:66] levelinfomsgTLS is disabled.http2false
I0103 12:34:28.376853 1 builder.go:192] Active resources: certificatesigningrequests,configmaps,cronjobs,daemonsets,deployments,endpoints,horizontalpodautoscalers,ingresses,jobs,leases,limitranges,mutatingwebhookconfigurations,namespaces,networkpolicies,nodes,persistentvolumeclaims,persistentvolumes,poddisruptionbudgets,pods,replicasets,replicationcontrollers,resourcequotas,secrets,services,statefulsets,storageclasses,validatingwebhookconfigurations,volumeattachments
W0103 12:34:28.379101 1 warnings.go:70] batch/v1beta1 CronJob is deprecated in v1.21+, unavailable in v1.25+; use batch/v1 CronJob
W0103 12:34:28.395480 1 warnings.go:70] batch/v1beta1 CronJob is deprecated in v1.21+, unavailable in v1.25+; use batch/v1 CronJob
W0103 12:34:28.410311 1 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W0103 12:34:28.414499 1 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
E0103 12:34:28.421128 1 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 76 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x1631020, 0x2685d50})
/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:74 +0x7d
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x2540be400})
/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:48 +0x75
panic({0x1631020, 0x2685d50})
/usr/local/go/src/runtime/panic.go:1038 +0x215
k8s.io/kube-state-metrics/v2/internal/store.ingressMetricFamilies.func6(0x70)
/go/src/k8s.io/kube-state-metrics/internal/store/ingress.go:136 +0x189
k8s.io/kube-state-metrics/v2/internal/store.wrapIngressFunc.func1({0x17f9880, 0xc000422da0})
/go/src/k8s.io/kube-state-metrics/internal/store/ingress.go:175 +0x49
k8s.io/kube-state-metrics/v2/pkg/metric_generator.(*FamilyGenerator).Generate(...)
/go/src/k8s.io/kube-state-metrics/pkg/metric_generator/generator.go:67
k8s.io/kube-state-metrics/v2/pkg/metric_generator.ComposeMetricGenFuncs.func1({0x17f9880, 0xc000422da0})
/go/src/k8s.io/kube-state-metrics/pkg/metric_generator/generator.go:107 +0xd8
k8s.io/kube-state-metrics/v2/pkg/metrics_store.(*MetricsStore).Add(0xc00014ce40, {0x17f9880, 0xc000422da0})
/go/src/k8s.io/kube-state-metrics/pkg/metrics_store/metrics_store.go:72 +0xd4
k8s.io/kube-state-metrics/v2/pkg/metrics_store.(*MetricsStore).Replace(0xc00014ce40, {0xc00056bc00, 0x3f, 0x1443e01}, {0xc000607be0, 0xc06cd935190aabc6})
/go/src/k8s.io/kube-state-metrics/pkg/metrics_store/metrics_store.go:134 +0xa5
k8s.io/client-go/tools/cache.(*Reflector).syncWith(0xc0004be8c0, {0xc00056b800, 0x3f, 0x0}, {0xc0007de000, 0x9})
/go/pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:456 +0x98
k8s.io/client-go/tools/cache.(*Reflector).ListAndWatch.func1(0xc0004be8c0, 0xc0000fca20, 0xc0004a0480, 0xc000607d60)
/go/pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:354 +0x7ab
k8s.io/client-go/tools/cache.(*Reflector).ListAndWatch(0xc0004be8c0, 0xc0004a0480)
/go/pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:361 +0x265
k8s.io/client-go/tools/cache.(*Reflector).Run.func1()
/go/pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:221 +0x26
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x7f6244e784b0)
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:155 +0x67
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc000166280, {0x1a0dd20, 0xc0004d26e0}, 0x1, 0xc0004a0480)
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:156 +0xb6
k8s.io/client-go/tools/cache.(*Reflector).Run(0xc0004be8c0, 0xc0004a0480)
/go/pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:220 +0x1f8
created by k8s.io/kube-state-metrics/v2/internal/store.(*Builder).startReflector
/go/src/k8s.io/kube-state-metrics/internal/store/builder.go:427 +0x2c8
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x14844a9]
goroutine 76 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x2540be400})
/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:55 +0xd8
panic({0x1631020, 0x2685d50})
/usr/local/go/src/runtime/panic.go:1038 +0x215
k8s.io/kube-state-metrics/v2/internal/store.ingressMetricFamilies.func6(0x70)
/go/src/k8s.io/kube-state-metrics/internal/store/ingress.go:136 +0x189
k8s.io/kube-state-metrics/v2/internal/store.wrapIngressFunc.func1({0x17f9880, 0xc000422da0})
/go/src/k8s.io/kube-state-metrics/internal/store/ingress.go:175 +0x49
k8s.io/kube-state-metrics/v2/pkg/metric_generator.(*FamilyGenerator).Generate(...)
/go/src/k8s.io/kube-state-metrics/pkg/metric_generator/generator.go:67
k8s.io/kube-state-metrics/v2/pkg/metric_generator.ComposeMetricGenFuncs.func1({0x17f9880, 0xc000422da0})
/go/src/k8s.io/kube-state-metrics/pkg/metric_generator/generator.go:107 +0xd8
k8s.io/kube-state-metrics/v2/pkg/metrics_store.(*MetricsStore).Add(0xc00014ce40, {0x17f9880, 0xc000422da0})
/go/src/k8s.io/kube-state-metrics/pkg/metrics_store/metrics_store.go:72 +0xd4
k8s.io/kube-state-metrics/v2/pkg/metrics_store.(*MetricsStore).Replace(0xc00014ce40, {0xc00056bc00, 0x3f, 0x1443e01}, {0xc000607be0, 0xc06cd935190aabc6})
/go/src/k8s.io/kube-state-metrics/pkg/metrics_store/metrics_store.go:134 +0xa5
k8s.io/client-go/tools/cache.(*Reflector).syncWith(0xc0004be8c0, {0xc00056b800, 0x3f, 0x0}, {0xc0007de000, 0x9})
/go/pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:456 +0x98
k8s.io/client-go/tools/cache.(*Reflector).ListAndWatch.func1(0xc0004be8c0, 0xc0000fca20, 0xc0004a0480, 0xc000607d60)
/go/pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:354 +0x7ab
k8s.io/client-go/tools/cache.(*Reflector).ListAndWatch(0xc0004be8c0, 0xc0004a0480)
/go/pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:361 +0x265
k8s.io/client-go/tools/cache.(*Reflector).Run.func1()
/go/pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:221 +0x26
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x7f6244e784b0)
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:155 +0x67
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc000166280, {0x1a0dd20, 0xc0004d26e0}, 0x1, 0xc0004a0480)
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:156 +0xb6
k8s.io/client-go/tools/cache.(*Reflector).Run(0xc0004be8c0, 0xc0004a0480)
/go/pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:220 +0x1f8
created by k8s.io/kube-state-metrics/v2/internal/store.(*Builder).startReflector
/go/src/k8s.io/kube-state-metrics/internal/store/builder.go:427 +0x2c8
What you expected to happen: kube-state-metrics to start without error
How to reproduce it (as minimally and precisely as possible):
Deploy kube-state-metrics to GKE 1.21.5 in a custom namespace:
kubectl create namespace observability
Full Kubernetes YAML to apply:
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
  name: kube-state-metrics
  namespace: observability
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
  name: kube-state-metrics
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  - secrets
  - nodes
  - pods
  - services
  - resourcequotas
  - replicationcontrollers
  - limitranges
  - persistentvolumeclaims
  - persistentvolumes
  - namespaces
  - endpoints
  verbs:
  - list
  - watch
- apiGroups:
  - apps
  resources:
  - statefulsets
  - daemonsets
  - deployments
  - replicasets
  verbs:
  - list
  - watch
- apiGroups:
  - batch
  resources:
  - cronjobs
  - jobs
  verbs:
  - list
  - watch
- apiGroups:
  - autoscaling
  resources:
  - horizontalpodautoscalers
  verbs:
  - list
  - watch
- apiGroups:
  - authentication.k8s.io
  resources:
  - tokenreviews
  verbs:
  - create
- apiGroups:
  - authorization.k8s.io
  resources:
  - subjectaccessreviews
  verbs:
  - create
- apiGroups:
  - policy
  resources:
  - poddisruptionbudgets
  verbs:
  - list
  - watch
- apiGroups:
  - certificates.k8s.io
  resources:
  - certificatesigningrequests
  verbs:
  - list
  - watch
- apiGroups:
  - storage.k8s.io
  resources:
  - storageclasses
  - volumeattachments
  verbs:
  - list
  - watch
- apiGroups:
  - admissionregistration.k8s.io
  resources:
  - mutatingwebhookconfigurations
  - validatingwebhookconfigurations
  verbs:
  - list
  - watch
- apiGroups:
  - networking.k8s.io
  resources:
  - networkpolicies
  - ingresses
  verbs:
  - list
  - watch
- apiGroups:
  - coordination.k8s.io
  resources:
  - leases
  verbs:
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: observability
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
  name: kube-state-metrics
  namespace: observability
spec:
  clusterIP: None
  ports:
  - name: http-metrics
    port: 8080
    targetPort: http-metrics
  - name: telemetry
    port: 8081
    targetPort: telemetry
  selector:
    app.kubernetes.io/name: kube-state-metrics
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
  name: kube-state-metrics
  namespace: observability
spec:
  progressDeadlineSeconds: 120
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-state-metrics
  template:
    metadata:
      annotations:
        prometheus.io/path: /metrics
        prometheus.io/port: "8080"
        prometheus.io/scrape: "true"
      labels:
        app.kubernetes.io/name: kube-state-metrics
    spec:
      containers:
      - image: k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.3.0
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          timeoutSeconds: 5
        name: kube-state-metrics
        ports:
        - containerPort: 8080
          name: http-metrics
        - containerPort: 8081
          name: telemetry
        readinessProbe:
          httpGet:
            path: /
            port: 8081
          initialDelaySeconds: 5
          timeoutSeconds: 5
        securityContext:
          runAsUser: 65534
      nodeSelector:
        kubernetes.io/os: linux
      serviceAccountName: kube-state-metrics
Anything else we need to know?:
Environment:
- kube-state-metrics version:
  k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.3.0
- Kubernetes version (use kubectl version):
  Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.4", GitCommit:"3cce4a82b44f032d0cd1a1790e6d2f5a55d20aae", GitTreeState:"clean", BuildDate:"2021-08-11T18:16:05Z", GoVersion:"go1.16.7", Compiler:"gc", Platform:"darwin/amd64"}
  Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.5-gke.1302", GitCommit:"639f3a74abf258418493e9b75f2f98a08da29733", GitTreeState:"clean", BuildDate:"2021-10-21T21:35:48Z", GoVersion:"go1.16.7b7", Compiler:"gc", Platform:"linux/amd64"}
- Cloud provider or hardware configuration: GCP Managed GKE
- Other info:
I also see a very similar failure if I downgrade to 2.2.4 or 2.2.3
E0103 12:40:13.394862 1 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 71 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic(0x18101a0, 0x26c12f0)
/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:74 +0x95
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:48 +0x86
panic(0x18101a0, 0x26c12f0)
/usr/local/go/src/runtime/panic.go:965 +0x1b9
k8s.io/kube-state-metrics/v2/internal/store.ingressMetricFamilies.func6(0xc000806110, 0xc00055cc40)
/go/src/k8s.io/kube-state-metrics/internal/store/ingress.go:136 +0x192
k8s.io/kube-state-metrics/v2/internal/store.wrapIngressFunc.func1(0x19edb60, 0xc000806110, 0xc000ae9d00)
/go/src/k8s.io/kube-state-metrics/internal/store/ingress.go:175 +0x5c
k8s.io/kube-state-metrics/v2/pkg/metric_generator.(*FamilyGenerator).Generate(...)
/go/src/k8s.io/kube-state-metrics/pkg/metric_generator/generator.go:58
Hey :wave: thanks for reporting this. Do you know for which ingress this happens? Do you have any ingresses without a backend or a service backend? Looking at the code, that's the only thing which seems to be the root cause: https://github.com/kubernetes/kube-state-metrics/blob/e080c3ce73ad514254e38dccb37c93bec6b257ae/internal/store/ingress.go#L136
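To illustrate the kind of ingress being asked about here: under networking.k8s.io/v1, a path backend may reference a resource instead of a service, which leaves the backend's Service field unset. The sketch below shows one plausible shape of such an object; the name, namespace, apiGroup, kind, and resource name are made up for illustration and are not taken from the reporter's cluster.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: broken-ingress          # hypothetical name
  namespace: default            # hypothetical namespace
spec:
  rules:
  - host: example.com
    http:
      paths:
      - path: /static
        pathType: ImplementationSpecific
        backend:
          resource:                   # resource backend, no service backend
            apiGroup: example.com     # hypothetical apiGroup
            kind: StorageBucket       # hypothetical kind
            name: static-assets       # hypothetical resource name

An object shaped like this would presumably leave path.Backend.Service nil while kube-state-metrics builds its ingress metrics, which is consistent with the nil pointer dereference at ingress.go:136 in the stack trace above.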
@fpetkovski thanks for pointing that out. We had an ingress with a bad backend and removing it resolved the issue. However, I would still say having a single broken ingress should not be cause for crashing all of kube-state-metrics and there is still a bug here.
Error from the bad ingress:
Translation failed: invalid ingress spec: Ingress Backend is not a service
Thanks for checking that. I agree that this is still a bug and we should keep the issue open until it is resolved :+1:
/assign
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/unassign since I didn't get enough time to work on this
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle rotten