kafka_exporter Metric error: collected metric was collected before with the same name and label values

When using the exporter (version: danielqsj/kafka-exporter:v1.4.2), we sometimes get the following error, and no metrics are displayed:

An error has occurred while serving metrics:
collected metric "kafka_consumergroup_members" { label:<name:"consumergroup" value:"<NAME>" > gauge:<value:0 > } was collected before with the same name and label values
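
For readers unfamiliar with this message: the Prometheus client rejects a scrape when two samples share the same metric name and the same label values, regardless of the sample value. The snippet below is a simplified model of that check, not the actual client or exporter code; `find_duplicates` and the `other_group` consumer group are hypothetical names used only for illustration.

```python
from typing import Dict, List, Tuple

# (metric name, labels, value); a simplified stand-in for a collected sample
Sample = Tuple[str, Dict[str, str], float]


def find_duplicates(samples: List[Sample]) -> List[str]:
    """Return one error string per sample whose identity was already seen."""
    seen = set()
    errors = []
    for name, labels, _value in samples:
        # Identity ignores the value: only the name and label pairs count.
        identity = (name, tuple(sorted(labels.items())))
        if identity in seen:
            errors.append(
                f'collected metric "{name}" {labels} was collected before '
                "with the same name and label values"
            )
        else:
            seen.add(identity)
    return errors


scrape = [
    ("kafka_consumergroup_members", {"consumergroup": "winlogbeat_printer"}, 2),
    # Same name and labels again, e.g. the same group reported twice.
    ("kafka_consumergroup_members", {"consumergroup": "winlogbeat_printer"}, 2),
    # Hypothetical second group: a different label value is not a duplicate.
    ("kafka_consumergroup_members", {"consumergroup": "other_group"}, 1),
]
errors = find_duplicates(scrape)
print(len(errors), "error(s) occurred")
```

In this model, anything that makes the exporter see the same consumer group more than once during a single scrape would produce the error, which is consistent with the reports in this thread.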

This does not happen with the other "exporter" from Yahoo (https://github.com/yahoo/CMAK).

After manually reassigning the __consumer_offsets topic in Kafka, the exporter starts collecting the metrics correctly again. Has anyone experienced similar behaviour before?

Apr 21 '22 10:04 faabsen

An error has occurred while serving metrics:

43 error(s) occurred:
* collected metric "kafka_consumergroup_members" { label:<name:"consumergroup" value:"winlogbeat_printer" > gauge:<value:2 > } was collected before with the same name and label values
* collected metric "kafka_consumergroup_current_offset" { label:<name:"consumergroup" value:"winlogbeat_printer" > label:<name:"partition" value:"0" > label:<name:"topic" value:"winlogbeat_printer" > gauge:<value:1617 > } was collected before with the same name and label values
* collected metric "kafka_consumergroup_lag" { label:<name:"consumergroup" value:"winlogbeat_printer" > label:<name:"partition" value:"0" > label:<name:"topic" value:"winlogbeat_printer" > gauge:<value:0 > } was collected before with the same name and label values

I have the same problem as you.

Apr 22 '22 06:04 yinyu985

I have the same error, but reassigning the __consumer_offsets topic didn't help.

May 18 '22 04:05 rmrf

I have the same error:

An error has occurred while serving metrics:

1004 error(s) occurred:
* collected metric "kafka_consumergroup_members" { label:<name:"consumergroup" value:"yarn_eml_105_9" > gauge:<value:0 > } was collected before with the same name and label values
* collected metric "kafka_consumergroup_current_offset" { label:<name:"consumergroup" value:"yarn_eml_105_9" > label:<name:"partition" value:"11" > label:<name:"topic" value:"N11-LY" > gauge:<value:7.2583451022e+10 > } was collected before with the same name and label values

May 18 '22 09:05 VolcanicSnow

Same issue here. We are connecting to Azure Event Hubs and I'm noticing this odd behavior in the logs:

[sarama] 2022/10/03 13:30:10 client/brokers registered new broker #0 at ehn-central.servicebus.windows.net:9093
[sarama] 2022/10/03 13:30:10 client/brokers registered new broker #1 at Ehn-central.servicebus.windows.net:9093
[sarama] 2022/10/03 13:30:10 client/brokers registered new broker #2 at EHn-central.servicebus.windows.net:9093
[sarama] 2022/10/03 13:30:10 client/brokers registered new broker #3 at EHN-central.servicebus.windows.net:9093

I'm specifying only one server but seeing four of these lines, each with different letter casing for the (same) broker name.
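
DNS host names are case-insensitive, so the four addresses in the log resolve to the same endpoint even though they are registered as brokers #0 to #3. The sketch below only illustrates that collapse; `group_by_host` is a hypothetical helper, and whether the exporter then reports each consumer group once per registered broker is an assumption that fits the symptoms here but is not confirmed in this thread.

```python
from collections import defaultdict


def group_by_host(addresses):
    """Group "host:port" strings by lower-cased host and port."""
    groups = defaultdict(list)
    for addr in addresses:
        host, _, port = addr.rpartition(":")
        # Host names compare case-insensitively, so fold the case first.
        groups[(host.lower(), port)].append(addr)
    return dict(groups)


# Broker addresses copied from the sarama log lines above.
brokers = [
    "ehn-central.servicebus.windows.net:9093",
    "Ehn-central.servicebus.windows.net:9093",
    "EHn-central.servicebus.windows.net:9093",
    "EHN-central.servicebus.windows.net:9093",
]
unique = group_by_host(brokers)
print(len(brokers), "registered,", len(unique), "distinct endpoint(s)")
# prints: 4 registered, 1 distinct endpoint(s)
```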

Oct 03 '22 14:10 alexinthesky

We have the same issue here, after a lot of rebalancing during the night. Also an Event Hubs user. Also the same casing symptom.

[sarama] 2022/10/06 06:04:53 Connected to broker at digizxxxxxxxxxxx.servicebus.windows.net:9093 (registered as #0)
[sarama] 2022/10/06 06:04:55 Connected to broker at Digizxxxxxxxxxxx.servicebus.windows.net:9093 (registered as #1)
[sarama] 2022/10/06 06:04:55 Connected to broker at DIgizxxxxxxxxxxx.servicebus.windows.net:9093 (registered as #2)
[sarama] 2022/10/06 06:04:56 Connected to broker at DIGizxxxxxxxxxxx.servicebus.windows.net:9093 (registered as #3)
[sarama] 2022/10/06 06:04:57 Connected to broker at DIGIzxxxxxxxxxxx.servicebus.windows.net:9093 (registered as #4)

Oct 06 '22 06:10 lhaussknecht

Hey man, happy to hear we're not alone in this. Did you get things sorted out? We have an open ticket with Azure. It's been a very long time. They are not sharing their changelog or deployments, but I definitely believe it's related to changes in their broker LB system.

Oct 13 '22 16:10 alexinthesky

We are also seeing the same issue, especially after upgrading AKS from 1.21 to 1.23. Is there any update on a solution?

Oct 31 '22 04:10 mshekharee

Did you get things sorted out?

Nov 24 '22 02:11 misitechen

@misitechen

This was sorted out by raising a support case with Microsoft. Below is the cause they gave.

Root Cause: As part of a recent upgrade, a change was made in the service (Kafka request handler) so that the service returns the list of virtual brokers (16 brokers) as part of the metadata response, so that a client application process can create and manage multiple TCP connections to a topic and achieve better performance. However, the change has an impact on the produce API when connections are not fully utilized and become idle due to inactivity. In that case, a producer app can hit a request timeout when sending a message if the message was sent over a connection that was already terminated for idleness, resulting in a retry.

Resolution: As a mitigation, we have reverted the change so that the virtual broker host returns one address again.

Nov 24 '22 04:11 mshekharee

@mshekharee Very interesting! Would you mind sharing the date range in which Microsoft applied the update and reverted it?

Nov 24 '22 10:11 lhaussknecht

@lhaussknecht Looks like Microsoft is handling this on an account-by-account basis. The change for our account was reverted a month ago.

Nov 25 '22 12:11 mshekharee

Same question. Is anyone dealing with this now?

May 17 '23 09:05 KD0735

We worked around this by adding --group.filter='.+' to the argument list.

May 17 '23 09:05 lhaussknecht

We had a similar issue with KafkaExporter and managed Kafka on Oracle Cloud (OCI Streams), solved with:

--group.filter='.+'
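
To see what this pattern changes, the sketch below assumes the flag is applied as a regular-expression match against each consumer group name and that the pattern it replaces is `.*`. Both are assumptions, as is the `kept_groups` helper. Under them, the only name the two patterns treat differently is the empty string. Whether that is why the workaround helps is not established in this thread.

```python
import re


def kept_groups(groups, pattern):
    """Return the group names that the pattern matches somewhere."""
    rx = re.compile(pattern)
    return [g for g in groups if rx.search(g)]


# Two group names taken from this thread, plus an empty name for comparison.
groups = ["winlogbeat_printer", "yarn_eml_105_9", ""]

print(kept_groups(groups, ".*"))  # ".*" also matches the empty name
print(kept_groups(groups, ".+"))  # ".+" needs at least one character
```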

May 24 '23 23:05 davidpechcz

However, after adding the '--group.filter' parameter, the consumer metrics can no longer be collected.

Mar 01 '24 07:03 xiangrm
