opencost Log error when node label not found

Modified the GetInstanceType function to log an error when the node.kubernetes.io/instance-type label is not found.
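For readers unfamiliar with the change, here is a minimal sketch of what logging on a missing label could look like. It is not the actual GetInstanceType implementation: the function name `instanceTypeFromLabels`, its signature, and the use of the standard `log` package (rather than the project's own logger) are assumptions for illustration. Only the label name and the error message text come from this PR.

```go
package main

import (
	"fmt"
	"log"
)

// Label name taken from the PR description.
const instanceTypeLabel = "node.kubernetes.io/instance-type"

// instanceTypeFromLabels is a hypothetical stand-in for GetInstanceType.
// It returns the instance type and true when the label is present and
// non-empty; otherwise it logs an error and returns "", false.
func instanceTypeFromLabels(labels map[string]string) (string, bool) {
	t, ok := labels[instanceTypeLabel]
	if !ok || t == "" {
		// Message text mirrors the log line shown in this PR.
		log.Printf("[Error] Failed to read '%s' node label", instanceTypeLabel)
		return "", false
	}
	return t, true
}

func main() {
	withLabel := map[string]string{instanceTypeLabel: "n1-standard-4"}
	withoutLabel := map[string]string{"kubernetes.io/hostname": "node-a"}

	t, ok := instanceTypeFromLabels(withLabel)
	fmt.Println(t, ok)

	t, ok = instanceTypeFromLabels(withoutLabel)
	fmt.Println(t, ok)
}
```

Returning a boolean alongside the value lets the caller decide what to do next. Logging alone does not stop a later step from using the empty string.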

Apr 21 '21 23:04 kbrwn

Log before code change:

$ kubectl -n kubecost logs kubecost-cost-analyzer-654dbcf97c-k84bm cost-model
I0421 22:36:03.557312       1 router.go:808] Starting cost-model (git commit "1.78.0")
I0421 22:36:03.557480       1 router.go:829] Prometheus/Thanos Client Max Concurrency set to 5
I0421 22:36:03.563886       1 router.go:842] Retrieved a prometheus config file from: http://kubecost-prometheus-server.kubecost
I0421 22:36:03.564936       1 router.go:855] Found Kubecost job scrape interval of: 1m
I0421 22:36:03.564956       1 router.go:862] Using scrape interval of 60.000000
I0421 22:36:03.567160       1 router.go:880] Success: retrieved the 'up' query against prometheus at: http://kubecost-prometheus-server.kubecost
I0421 22:36:03.569553       1 clustercache.go:92] NAMESPACE: kubecost
I0421 22:36:03.773861       1 watchcontroller.go:195] Starting *v1.Node controller
I0421 22:36:03.773922       1 watchcontroller.go:195] Starting *v1.Namespace controller
I0421 22:36:03.773942       1 watchcontroller.go:195] Starting *v1.DaemonSet controller
I0421 22:36:03.773957       1 watchcontroller.go:195] Starting *v1.Pod controller
I0421 22:36:03.773979       1 watchcontroller.go:195] Starting *v1.Service controller
I0421 22:36:03.773997       1 watchcontroller.go:195] Starting *v1.ConfigMap controller
I0421 22:36:03.774022       1 watchcontroller.go:195] Starting *v1.ReplicaSet controller
I0421 22:36:03.774041       1 watchcontroller.go:195] Starting *v1.Deployment controller
I0421 22:36:03.774072       1 watchcontroller.go:195] Starting *v1.StatefulSet controller
I0421 22:36:03.774083       1 watchcontroller.go:195] Starting *v1.StorageClass controller
I0421 22:36:03.774142       1 watchcontroller.go:195] Starting *v1.PersistentVolume controller
I0421 22:36:03.776335       1 provider.go:386] metadata reports we are in GCE
I0421 22:36:03.782055       1 router.go:931] No pricing-configs configmap found at installtime, using existing configs: configmaps "pricing-configs" not found
panic: runtime error: slice bounds out of range [:2] with capacity 1

goroutine 1 [running]:
github.com/kubecost/cost-model/pkg/cloud.(*gcpKey).Features(0xc000b93048, 0xc000b93048, 0x1d591cb)
	/app/pkg/cloud/gcpprovider.go:1369 +0x64f
github.com/kubecost/cost-model/pkg/cloud.(*GCP).DownloadPricingData(0xc00043ae40, 0x0, 0x0)
	/app/pkg/cloud/gcpprovider.go:977 +0x2cb
github.com/kubecost/cost-model/pkg/costmodel.Initialize(0x0, 0x0, 0x0, 0xc00005c778)
	/app/pkg/costmodel/router.go:1053 +0x1638
main.main()
	/app/cmd/costmodel/main.go:20 +0x36

Log after code change:

$ kubectl -n kubecost logs kubecost-cost-analyzer-95c989d7-585ct cost-model
I0421 22:52:56.498918       1 router.go:808] Starting cost-model (git commit "1.78.0")
I0421 22:52:56.499059       1 router.go:829] Prometheus/Thanos Client Max Concurrency set to 5
I0421 22:52:56.506537       1 router.go:842] Retrieved a prometheus config file from: http://kubecost-prometheus-server.kubecost
I0421 22:52:56.507990       1 router.go:855] Found Kubecost job scrape interval of: 1m
I0421 22:52:56.508005       1 router.go:862] Using scrape interval of 60.000000
I0421 22:52:56.509948       1 router.go:880] Success: retrieved the 'up' query against prometheus at: http://kubecost-prometheus-server.kubecost
I0421 22:52:56.511961       1 clustercache.go:92] NAMESPACE: kubecost
I0421 22:52:56.713077       1 watchcontroller.go:195] Starting *v1.Service controller
I0421 22:52:56.713156       1 watchcontroller.go:195] Starting *v1.Namespace controller
I0421 22:52:56.713175       1 watchcontroller.go:195] Starting *v1.Node controller
I0421 22:52:56.713188       1 watchcontroller.go:195] Starting *v1.Pod controller
I0421 22:52:56.713213       1 watchcontroller.go:195] Starting *v1.StatefulSet controller
I0421 22:52:56.713225       1 watchcontroller.go:195] Starting *v1.ConfigMap controller
I0421 22:52:56.713251       1 watchcontroller.go:195] Starting *v1.DaemonSet controller
I0421 22:52:56.713264       1 watchcontroller.go:195] Starting *v1.Deployment controller
I0421 22:52:56.713283       1 watchcontroller.go:195] Starting *v1.StorageClass controller
I0421 22:52:56.713297       1 watchcontroller.go:195] Starting *v1.ReplicaSet controller
I0421 22:52:56.713317       1 watchcontroller.go:195] Starting *v1.PersistentVolume controller
I0421 22:52:56.715553       1 provider.go:386] metadata reports we are in GCE
I0421 22:52:56.720446       1 router.go:931] No pricing-configs configmap found at installtime, using existing configs: configmaps "pricing-configs" not found
E0421 22:52:56.721178       1 log.go:17] [Error] Failed to read 'node.kubernetes.io/instance-type' node label
panic: runtime error: slice bounds out of range [:2] with capacity 1

goroutine 1 [running]:
github.com/kubecost/cost-model/pkg/cloud.(*gcpKey).Features(0xc0001330f8, 0xc0001330f8, 0x1d591cb)
	/app/pkg/cloud/gcpprovider.go:1369 +0x64f
github.com/kubecost/cost-model/pkg/cloud.(*GCP).DownloadPricingData(0xc000454780, 0x0, 0x0)
	/app/pkg/cloud/gcpprovider.go:977 +0x2cb
github.com/kubecost/cost-model/pkg/costmodel.Initialize(0x0, 0x0, 0x0, 0xc00005c778)
	/app/pkg/costmodel/router.go:1053 +0x1638
main.main()
	/app/cmd/costmodel/main.go:20 +0x36

Apr 21 '21 23:04 kbrwn

Container built after code changes:

quay.io/kbrwn/cost-model:node-label-log

Apr 21 '21 23:04 kbrwn

Thank you for this PR, @kbrwn. I think we'd prefer to log and not panic when this label doesn't exist. Do you want to trace the index-out-of-bounds error and handle it?

Apr 22 '21 16:04 AjayTripathy

@AjayTripathy Since the panic comes from slicing past the end of a slice, the options are to recover from the panic with Go's recover in a deferred function, or to change the code so the out-of-range slice never happens. Here is an attempt at the latter option in gcpprovider.go, adding a conditional: https://github.com/kbrwn/cost-model/commit/bd8a425d9418ed66d3de589a578ba05d4692f8ab
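To make the two options concrete, here is a rough sketch of each. It is not the code from gcpprovider.go or from the linked commit. It assumes the instance type is split on "-" and the first two parts are used, which would produce this panic for an empty label value. The helper names `machineFamily` and `machineFamilyRecover` are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// machineFamily (hypothetical) checks the length before slicing, so an
// empty or malformed instance type yields an error instead of a panic.
func machineFamily(instanceType string) (string, error) {
	parts := strings.Split(instanceType, "-")
	if len(parts) < 2 {
		return "", fmt.Errorf("unexpected instance type %q: want at least 2 dash-separated parts", instanceType)
	}
	return strings.ToLower(strings.Join(parts[:2], "")), nil
}

// machineFamilyRecover (hypothetical) slices without a check and turns
// the resulting runtime panic into an error via defer and recover.
func machineFamilyRecover(instanceType string) (family string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered from panic: %v", r)
		}
	}()
	parts := strings.Split(instanceType, "-")
	// Panics when parts has fewer than 2 elements.
	return strings.ToLower(strings.Join(parts[:2], "")), nil
}

func main() {
	for _, in := range []string{"n1-standard-4", ""} {
		family, err := machineFamily(in)
		fmt.Printf("check:   %q -> %q, err=%v\n", in, family, err)

		family, err = machineFamilyRecover(in)
		fmt.Printf("recover: %q -> %q, err=%v\n", in, family, err)
	}
}
```

The explicit length check keeps the failure local and easy to read. The recover variant also avoids the crash, but it hides which operation failed and can mask unrelated panics in the same function.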

Apr 22 '21 21:04 kbrwn

This pull request has been marked as stale because it has been open for 90 days with no activity. Please remove the stale label or comment or this pull request will be closed in 5 days.

Nov 30 '23 01:11 github-actions[bot]
