
panic: duplicate label names

Open acondrat opened this issue 4 years ago • 13 comments

Looks like the exporter crashes when a metric has duplicate label names.

time="2020-05-04T13:49:10Z" level=info msg="Starting stackdriver_exporter (version=0.7.0, branch=HEAD, revision=a339261e716271d77f6dc73d1998600d6d31089b)" source="stackdriver_exporter.go:136"
time="2020-05-04T13:49:10Z" level=info msg="Build context (go=go1.14.2, user=root@6bfda044714a, date=20200501-12:39:15)" source="stackdriver_exporter.go:137"
time="2020-05-04T13:49:10Z" level=info msg="Listening on :9255" source="stackdriver_exporter.go:163"
panic: duplicate label names

goroutine 162 [running]:
github.com/prometheus/client_golang/prometheus.MustNewConstHistogram(...)
	/app/vendor/github.com/prometheus/client_golang/prometheus/histogram.go:619
github.com/prometheus-community/stackdriver_exporter/collectors.(*TimeSeriesMetrics).newConstHistogram(0xc000377a18, 0xc0005fe280, 0x50, 0xc000a47600, 0xe, 0x10, 0xc0003e4870, 0xc000a51da0, 0xc000a47700, 0xe, ...)
	/app/collectors/monitoring_metrics.go:94 +0x19a
github.com/prometheus-community/stackdriver_exporter/collectors.(*TimeSeriesMetrics).completeHistogramMetrics(0xc000377a18)
	/app/collectors/monitoring_metrics.go:186 +0x1c7
github.com/prometheus-community/stackdriver_exporter/collectors.(*TimeSeriesMetrics).Complete(0xc000377a18)
	/app/collectors/monitoring_metrics.go:149 +0x39
github.com/prometheus-community/stackdriver_exporter/collectors.(*MonitoringCollector).reportTimeSeriesMetrics(0xc0000b88f0, 0xc000940000, 0xc000356b00, 0xc0001a4f00, 0xc000940000, 0x0)
	/app/collectors/monitoring_collector.go:370 +0x10a3
github.com/prometheus-community/stackdriver_exporter/collectors.(*MonitoringCollector).reportMonitoringMetrics.func1.1(0xc000412640, 0xc0000b88f0, 0xe1193bb, 0xed6421353, 0x0, 0xe1193bb, 0xed642147f, 0x0, 0xc0003645a0, 0xc000356b00, ...)
	/app/collectors/monitoring_collector.go:223 +0x5e7
created by github.com/prometheus-community/stackdriver_exporter/collectors.(*MonitoringCollector).reportMonitoringMetrics.func1
	/app/collectors/monitoring_collector.go:197 +0x3f3

It seems like something introduced in v0.7.0, as I don't see the same issue in v0.6.0.
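
For reference, the panic comes straight from client_golang: a Desc built with a repeated label name records an error, and MustNewConstMetric / MustNewConstHistogram panic when they hit it. A minimal sketch that reproduces the same panic (metric and label names made up):

package main

import "github.com/prometheus/client_golang/prometheus"

func main() {
	// NewDesc records an error when the label names are not unique;
	// MustNewConstMetric then panics with "duplicate label names".
	desc := prometheus.NewDesc(
		"stackdriver_example_metric", // hypothetical metric name
		"Example metric with a repeated label.",
		[]string{"cluster_name", "cluster_name"}, // duplicate label name
		nil,
	)
	prometheus.MustNewConstMetric(desc, prometheus.GaugeValue, 1, "my-cluster", "my-cluster")
}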

acondrat avatar May 04 '20 14:05 acondrat

I wonder if this is a side effect of https://github.com/prometheus-community/stackdriver_exporter/pull/50

SuperQ avatar May 04 '20 16:05 SuperQ

Having the exact same issue here. Testing with version 0.6.0, it is not reproducible, so the regression is probably in 0.7.0 or 0.8.0 (0.7.0 introduced #50).

omerlh avatar May 25 '20 06:05 omerlh

Can you try building and running with #50 reverted?

SuperQ avatar May 25 '20 07:05 SuperQ

Still crashes with the same error:

panic: duplicate label names

goroutine 99 [running]:
github.com/prometheus/client_golang/prometheus.MustNewConstMetric(...)
	/Users/omerlh/go/pkg/mod/github.com/prometheus/[email protected]/prometheus/value.go:106
github.com/prometheus-community/stackdriver_exporter/collectors.(*TimeSeriesMetrics).newConstMetric(0xc000183a10, 0xc0005c8500, 0x4d, 0x34f92458, 0xed65d7041, 0x0, 0xc000285480, 0x8, 0x8, 0x2, ...)
	/Users/omerlh/dev/stackdriver_exporter/collectors/monitoring_metrics.go:138 +0x204
github.com/prometheus-community/stackdriver_exporter/collectors.(*TimeSeriesMetrics).completeConstMetrics(0xc000183a10)
	/Users/omerlh/dev/stackdriver_exporter/collectors/monitoring_metrics.go:179 +0x1dd
github.com/prometheus-community/stackdriver_exporter/collectors.(*TimeSeriesMetrics).Complete(0xc000183a10)
	/Users/omerlh/dev/stackdriver_exporter/collectors/monitoring_metrics.go:160 +0x2b
github.com/prometheus-community/stackdriver_exporter/collectors.(*MonitoringCollector).reportTimeSeriesMetrics(0xc000156000, 0xc00011e200, 0xc000140500, 0xc000128480, 0xc00011e200, 0x0)
	/Users/omerlh/dev/stackdriver_exporter/collectors/monitoring_collector.go:400 +0x13c4
github.com/prometheus-community/stackdriver_exporter/collectors.(*MonitoringCollector).reportMonitoringMetrics.func1.1(0xc000606620, 0xc000156000, 0x34f92458, 0xed65d6f15, 0x0, 0x34f92458, 0xed65d7041, 0x0, 0xc000756540, 0xc000140500, ...)
	/Users/omerlh/dev/stackdriver_exporter/collectors/monitoring_collector.go:253 +0x6d7
created by github.com/prometheus-community/stackdriver_exporter/collectors.(*MonitoringCollector).reportMonitoringMetrics.func1
	/Users/omerlh/dev/stackdriver_exporter/collectors/monitoring_collector.go:227 +0x2bb

omerlh avatar May 25 '20 07:05 omerlh

So I guess it's not #50 then; something else related to the client_golang upgrade.

SuperQ avatar May 25 '20 08:05 SuperQ

Can you include more details? Like the flags you're using with the exporter?

SuperQ avatar May 25 '20 08:05 SuperQ

Please find my setup below. I was having the duplicate label panics with logging.googleapis.com/user. All other prefixes seem fine.

spec:
  containers:
  - command:
    - stackdriver_exporter
    env:
    - name: STACKDRIVER_EXPORTER_MONITORING_METRICS_TYPE_PREFIXES
      value: bigtable.googleapis.com/cluster,loadbalancing.googleapis.com/https/request_count,custom.googleapis.com,logging.googleapis.com/user
    - name: STACKDRIVER_EXPORTER_MONITORING_METRICS_INTERVAL
      value: 5m
    - name: STACKDRIVER_EXPORTER_MONITORING_METRICS_OFFSET
      value: 0s
    - name: STACKDRIVER_EXPORTER_WEB_LISTEN_ADDRESS
      value: :9255
    - name: STACKDRIVER_EXPORTER_WEB_TELEMETRY_PATH
      value: /metrics
    - name: STACKDRIVER_EXPORTER_MAX_RETRIES
      value: "0"
    - name: STACKDRIVER_EXPORTER_HTTP_TIMEOUT
      value: 10s
    - name: STACKDRIVER_EXPORTER_MAX_BACKOFF_DURATION
      value: 5s
    - name: STACKDRIVER_EXPORTER_BACKODFF_JITTER_BASE
      value: 1s
    - name: STACKDRIVER_EXPORTER_RETRY_STATUSES
      value: "503"
    image: prometheuscommunity/stackdriver-exporter:v0.7.0

acondrat avatar May 28 '20 10:05 acondrat

Same issue here with v0.9.1:

level=info ts=2020-06-15T11:40:31.592Z caller=stackdriver_exporter.go:136 msg="Starting stackdriver_exporter" version="(version=0.9.1, branch=HEAD, revision=770b1be3d430ef9768f30a2a5d2e35557e464f3c)"
level=info ts=2020-06-15T11:40:31.592Z caller=stackdriver_exporter.go:137 msg="Build context" build_context="(go=go1.14.4, user=root@faf330a7765b, date=20200602-12:12:58)"
level=info ts=2020-06-15T11:40:31.592Z caller=stackdriver_exporter.go:158 msg="Listening on" address=:9255
panic: duplicate label names

goroutine 9602 [running]:
github.com/prometheus/client_golang/prometheus.MustNewConstMetric(...)
	/app/vendor/github.com/prometheus/client_golang/prometheus/value.go:106
github.com/prometheus-community/stackdriver_exporter/collectors.(*TimeSeriesMetrics).newConstMetric(0xc000e37a10, 0xc0004c2460, 0x4d, 0x12059df0, 0xed67954c8, 0x0, 0xc000871b00, 0xe, 0x10, 0x2, ...)
	/app/collectors/monitoring_metrics.go:139 +0x204
github.com/prometheus-community/stackdriver_exporter/collectors.(*TimeSeriesMetrics).completeConstMetrics(0xc000e37a10)
	/app/collectors/monitoring_metrics.go:180 +0x1dd
github.com/prometheus-community/stackdriver_exporter/collectors.(*TimeSeriesMetrics).Complete(0xc000e37a10)
	/app/collectors/monitoring_metrics.go:161 +0x2b
github.com/prometheus-community/stackdriver_exporter/collectors.(*MonitoringCollector).reportTimeSeriesMetrics(0xc00087b080, 0xc000901a00, 0xc000d2fe00, 0xc00165ef00, 0xc000901a00, 0x0)
	/app/collectors/monitoring_collector.go:414 +0x13c4
github.com/prometheus-community/stackdriver_exporter/collectors.(*MonitoringCollector).reportMonitoringMetrics.func1.1(0xc002207920, 0xc00087b080, 0x12059e1f, 0xed679539c, 0x0, 0x12059e1f, 0xed67954c8, 0x0, 0xc000fac720, 0xc000d2fe00, ...)
	/app/collectors/monitoring_collector.go:267 +0x6d7
created by github.com/prometheus-community/stackdriver_exporter/collectors.(*MonitoringCollector).reportMonitoringMetrics.func1
	/app/collectors/monitoring_collector.go:241 +0x3f3

Edit 2020/07/14: I can confirm that the issue is still present in v0.10.0:

goroutine 477 [running]:
github.com/prometheus/client_golang/prometheus.MustNewConstMetric(...)
	/app/vendor/github.com/prometheus/client_golang/prometheus/value.go:107
github.com/prometheus-community/stackdriver_exporter/collectors.(*TimeSeriesMetrics).newConstMetric(0xc0005cda10, 0xc0010ad7c0, 0x43, 0x1e63a5d8, 0xed69f2568, 0x0, 0xc001fc1580, 0x8, 0x8, 0x2, ...)
	/app/collectors/monitoring_metrics.go:139 +0x204
github.com/prometheus-community/stackdriver_exporter/collectors.(*TimeSeriesMetrics).completeConstMetrics(0xc0005cda10)
	/app/collectors/monitoring_metrics.go:180 +0x1dd
github.com/prometheus-community/stackdriver_exporter/collectors.(*TimeSeriesMetrics).Complete(0xc0005cda10)
	/app/collectors/monitoring_metrics.go:161 +0x2b
github.com/prometheus-community/stackdriver_exporter/collectors.(*MonitoringCollector).reportTimeSeriesMetrics(0xc0001e2540, 0xc001c10630, 0xc0005f4500, 0xc0003940c0, 0xc001c10630, 0x0)
	/app/collectors/monitoring_collector.go:406 +0x13c4
github.com/prometheus-community/stackdriver_exporter/collectors.(*MonitoringCollector).reportMonitoringMetrics.func1.1(0xc0004f8f60, 0xc0001e2540, 0x1e63a902, 0xed69f24b4, 0x0, 0x1e63a902, 0xed69f25e0, 0x0, 0xc000720780, 0xc0005f4500, ...)
	/app/collectors/monitoring_collector.go:259 +0x6d7
created by github.com/prometheus-community/stackdriver_exporter/collectors.(*MonitoringCollector).reportMonitoringMetrics.func1
	/app/collectors/monitoring_collector.go:233 +0x3f3

dgarcdu avatar Jun 15 '20 11:06 dgarcdu

So I've debugged it, as I have the same case.

The root cause of the problem is having log-based custom metrics with label extractors that extract the same labels GCP logging already injects by default.

Example: You are on GKE and have a log-based custom metric with an extractor that pulls the field resource.labels.cluster_name into a label cluster_name. For custom metrics on GKE, cluster_name is already reported by default by GCP, so you end up with duplicated labels, which causes the panic.

Workaround: delete the custom extractors, which are technically not needed.

Edit: As far as I can see, project_id is also injected by default.
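
For illustration, a colliding log-based metric definition looks roughly like this (the shape shown by gcloud logging metrics describe; all names here are made up). Removing the cluster_name entries from both labels and labelExtractors is the workaround:

# Hypothetical log-based metric with a label extractor that duplicates
# the cluster_name label GCP already attaches on GKE.
name: my_log_based_metric
filter: resource.type="k8s_container" AND severity>=ERROR
labelExtractors:
  cluster_name: EXTRACT(resource.labels.cluster_name)   # remove this extractor ...
metricDescriptor:
  metricKind: DELTA
  valueType: INT64
  unit: '1'
  labels:
  - key: cluster_name                                    # ... and this label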

jakubbujny avatar Sep 09 '20 08:09 jakubbujny

So we had the exact same issue as above, having duplicated the project_id ourselves. But we discovered another issue with duplicate labels after enabling audit logs for Spanner:

* [from Gatherer #2] collected metric "stackdriver_spanner_instance_logging_googleapis_com_log_entry_count" { label:<name:"instance_config" value:"" > label:<name:"instance_id" value:"instance-east-1" > label:<name:"location" value:"us-east1" > label:<name:"log" value:"cloudaudit.googleapis.com/data_access" > label:<name:"project_id" value:"production" > label:<name:"severity" value:"INFO" > label:<name:"unit" value:"1" > gauge:<value:10527 > timestamp_ms:1612880903770 } was collected before with the same name and label values
* [from Gatherer #2] collected metric "stackdriver_spanner_instance_logging_googleapis_com_byte_count" { label:<name:"instance_config" value:"" > label:<name:"instance_id" value:"instance-east-1" > label:<name:"location" value:"us-east1" > label:<name:"log" value:"cloudaudit.googleapis.com/data_access" > label:<name:"project_id" value:"production" > label:<name:"severity" value:"INFO" > label:<name:"unit" value:"By" > gauge:<value:2.2907337e+07 > timestamp_ms:1612880903770 } was collected before with the same name and label values

I think it'd make sense to make the exporter more robust: report duplicate labels on the CLI and export an error metric instead of panicking.

EDIT: Same issue as in: #103
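
For illustration only (this is not the exporter's actual code), dropping repeated label names before handing them to client_golang could look roughly like this:

package main

import "fmt"

// dedupeLabels keeps the first occurrence of each label name and drops later
// duplicates, so the resulting name/value slices are safe to pass to
// prometheus.MustNewConstMetric.
func dedupeLabels(names, values []string) ([]string, []string) {
	seen := make(map[string]bool, len(names))
	outNames := make([]string, 0, len(names))
	outValues := make([]string, 0, len(values))
	for i, n := range names {
		if seen[n] {
			continue
		}
		seen[n] = true
		outNames = append(outNames, n)
		outValues = append(outValues, values[i])
	}
	return outNames, outValues
}

func main() {
	n, v := dedupeLabels(
		[]string{"project_id", "cluster_name", "cluster_name"},
		[]string{"production", "gke-east", "gke-east"},
	)
	fmt.Println(n, v) // [project_id cluster_name] [production gke-east]
}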

hanikesn avatar Feb 09 '21 14:02 hanikesn

We finally solved this by going over all our log-based metrics, as @jakubbujny described above. Took a while, as we have quite a few, but we removed the duplicate labels and have not had any problem since.

dgarcdu avatar Feb 09 '21 15:02 dgarcdu

I've opened a PR https://github.com/prometheus-community/stackdriver_exporter/pull/153 which should fix this. Can someone review it and merge it if possible?

gidesh avatar Apr 20 '22 13:04 gidesh

Still seeing this issue, though I'm not getting a panic in the container logs; it shows up on the /metrics page. I tried @jakubbujny's suggestion of removing the label and label extractors, but that didn't work.

I'm trying to scrape a counter-type log-based metric for GKE human-initiated admin events, with the filter: protoPayload.methodName=~"google.container.v1.ClusterManager.*" NOT protoPayload.methodName:"get" NOT protoPayload.methodName:"list" protoPayload.authenticationInfo.principalEmail:*

JediNight avatar May 20 '22 02:05 JediNight