kafka_exporter

Metric error: collected metric was collected before with the same name and label values

Open faabsen opened this issue 2 years ago • 16 comments

When using the exporter (version: danielqsj/kafka-exporter:v1.4.2), we sometimes get the following error and no metrics are displayed:

An error has occurred while serving metrics:
collected metric "kafka_consumergroup_members" { label:<name:"consumergroup" value:"<NAME>" > gauge:<value:0 > } was collected before with the same name and label values

This does not happen with the other "exporter" from yahoo (https://github.com/yahoo/CMAK).

After manually reassigning the __consumer_offsets topic in Kafka, the exporter starts collecting the metrics correctly. Has anyone experienced similar behaviour before?

faabsen avatar Apr 21 '22 10:04 faabsen

An error has occurred while serving metrics:

43 error(s) occurred:
* collected metric "kafka_consumergroup_members" { label:<name:"consumergroup" value:"winlogbeat_printer" > gauge:<value:2 > } was collected before with the same name and label values
* collected metric "kafka_consumergroup_current_offset" { label:<name:"consumergroup" value:"winlogbeat_printer" > label:<name:"partition" value:"0" > label:<name:"topic" value:"winlogbeat_printer" > gauge:<value:1617 > } was collected before with the same name and label values
* collected metric "kafka_consumergroup_lag" { label:<name:"consumergroup" value:"winlogbeat_printer" > label:<name:"partition" value:"0" > label:<name:"topic" value:"winlogbeat_printer" > gauge:<value:0 > } was collected before with the same name and label values

I have the same problem as you

yinyu985 avatar Apr 22 '22 06:04 yinyu985
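
For readers unfamiliar with this error: within one scrape, each metric must be unique by name plus label values. The following is a minimal stdlib-only sketch of that uniqueness check, assuming that the same consumer group gets reported once per broker. The `sample` type and the `identity` and `findDuplicates` helpers are hypothetical illustrations, not code from kafka_exporter or the Prometheus client library.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sample is a hypothetical stand-in for one collected metric.
type sample struct {
	name   string
	labels map[string]string
	value  float64
}

// identity builds a key from the metric name and its sorted label pairs.
// The value is deliberately excluded: two samples with the same name and
// labels collide even when their values agree.
func identity(s sample) string {
	keys := make([]string, 0, len(s.labels))
	for k := range s.labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	b.WriteString(s.name)
	for _, k := range keys {
		fmt.Fprintf(&b, "|%s=%q", k, s.labels[k])
	}
	return b.String()
}

// findDuplicates returns each identity that occurs more than once,
// listed once, in order of first repetition.
func findDuplicates(samples []sample) []string {
	seen := map[string]bool{}
	reported := map[string]bool{}
	var dups []string
	for _, s := range samples {
		id := identity(s)
		if seen[id] && !reported[id] {
			dups = append(dups, id)
			reported[id] = true
		}
		seen[id] = true
	}
	return dups
}

func main() {
	// Assumption: two brokers each report the same consumer group.
	group := map[string]string{"consumergroup": "winlogbeat_printer"}
	scrape := []sample{
		{name: "kafka_consumergroup_members", labels: group, value: 2},
		{name: "kafka_consumergroup_members", labels: group, value: 2},
	}
	for _, id := range findDuplicates(scrape) {
		fmt.Println("collected before with the same name and label values:", id)
	}
}
```

If this assumption holds, every extra broker that answers with the same group list adds one more duplicate per metric, which would be consistent with the large error counts reported in this thread.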

I have the same error, but reassigning the __consumer_offsets topic didn't help.

rmrf avatar May 18 '22 04:05 rmrf

I have the same error:

An error has occurred while serving metrics:

1004 error(s) occurred:

  • collected metric "kafka_consumergroup_members" { label:<name:"consumergroup" value:"yarn_eml_105_9" > gauge:<value:0 > } was collected before with the same name and label values
  • collected metric "kafka_consumergroup_current_offset" { label:<name:"consumergroup" value:"yarn_eml_105_9" > label:<name:"partition" value:"11" > label:<name:"topic" value:"N11-LY" > gauge:<value:7.2583451022e+10 > } was collected before with the same name and label values

VolcanicSnow avatar May 18 '22 09:05 VolcanicSnow

Same issue here. We are connecting to Azure Event Hubs and I'm noticing this weird behavior in the logs:

[sarama] 2022/10/03 13:30:10 client/brokers registered new broker #0 at ehn-central.servicebus.windows.net:9093
[sarama] 2022/10/03 13:30:10 client/brokers registered new broker #1 at Ehn-central.servicebus.windows.net:9093
[sarama] 2022/10/03 13:30:10 client/brokers registered new broker #2 at EHn-central.servicebus.windows.net:9093
[sarama] 2022/10/03 13:30:10 client/brokers registered new broker #3 at EHN-central.servicebus.windows.net:9093

I'm specifying only one server and seeing 4 of these lines, each with different letter casing for the (same) broker name.

alexinthesky avatar Oct 03 '22 14:10 alexinthesky
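
Host names are case-insensitive in DNS, so the four addresses in that log resolve to the same endpoint even though they are registered as four brokers. A small sketch of collapsing them follows. The `uniqueBrokers` helper is hypothetical and is not something the exporter or sarama is known to do.

```go
package main

import (
	"fmt"
	"strings"
)

// uniqueBrokers treats broker addresses as case-insensitive and keeps
// the first spelling seen for each distinct address.
func uniqueBrokers(addrs []string) []string {
	seen := make(map[string]bool)
	var out []string
	for _, a := range addrs {
		key := strings.ToLower(a)
		if seen[key] {
			continue
		}
		seen[key] = true
		out = append(out, a)
	}
	return out
}

func main() {
	// Addresses copied from the sarama log lines in the comment above.
	addrs := []string{
		"ehn-central.servicebus.windows.net:9093",
		"Ehn-central.servicebus.windows.net:9093",
		"EHn-central.servicebus.windows.net:9093",
		"EHN-central.servicebus.windows.net:9093",
	}
	fmt.Println(len(addrs), "registered,", len(uniqueBrokers(addrs)), "distinct")
	// prints "4 registered, 1 distinct"
}
```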

We have the same issue here, after a lot of rebalancing during the night. We are also an Event Hubs user and see the same casing symptom.

[sarama] 2022/10/06 06:04:53 Connected to broker at digizxxxxxxxxxxx.servicebus.windows.net:9093 (registered as #0)
[sarama] 2022/10/06 06:04:55 Connected to broker at Digizxxxxxxxxxxx.servicebus.windows.net:9093 (registered as #1)
[sarama] 2022/10/06 06:04:55 Connected to broker at DIgizxxxxxxxxxxx.servicebus.windows.net:9093 (registered as #2)
[sarama] 2022/10/06 06:04:56 Connected to broker at DIGizxxxxxxxxxxx.servicebus.windows.net:9093 (registered as #3)
[sarama] 2022/10/06 06:04:57 Connected to broker at DIGIzxxxxxxxxxxx.servicebus.windows.net:9093 (registered as #4)

lhaussknecht avatar Oct 06 '22 06:10 lhaussknecht

Hey, happy to hear we're not alone in this. Did you get things sorted out? We have an open ticket with Azure, and it has been open for a very long time. They are not sharing their changelog or deployments, but I definitely believe it's related to changes in their broker LB system.

alexinthesky avatar Oct 13 '22 16:10 alexinthesky

We are also seeing the same issue especially after upgrading AKS from 1.21 to 1.23. Is there any update on the solution?

mshekharee avatar Oct 31 '22 04:10 mshekharee

Did you get things sorted out?

misitechen avatar Nov 24 '22 02:11 misitechen

@misitechen

This was sorted out by raising a support case with Microsoft. Below is the cause they gave.

Root Cause: As part of recent upgrade there is a change made in the service (Kafka request handler) that the service returns the list of virtual brokers (16 brokers) as part of metadata response so that client application process can create/manage multiple TCP connections to a topic and achieve better performance. However, the change has an impact on produce API in the case that connection(s) are not fully utilized and become idle due to inactivity. In such case, producer app can hit request timeout when sending a message if the message was sent over a connection which was already terminated due to idleness and result in retry.

Resolution: As part of mitigation we have reverted the change and have virtual broker host return one address again.

mshekharee avatar Nov 24 '22 04:11 mshekharee

@mshekharee Very interesting! Would you mind sharing a date range, when Microsoft applied the update and reverted it?

lhaussknecht avatar Nov 24 '22 10:11 lhaussknecht

@lhaussknecht Looks like Microsoft is handling this on an account-by-account basis. The changes for our account were reverted a month ago.

mshekharee avatar Nov 25 '22 12:11 mshekharee

Same question. Does anyone deal with it now?

KD0735 avatar May 17 '23 09:05 KD0735

We worked around this by adding --group.filter='.+' to the argument list.

lhaussknecht avatar May 17 '23 09:05 lhaussknecht

We had a similar issue with kafka_exporter on managed Kafka on Oracle Cloud (OCI Streams), solved with

--group.filter='.+'

davidpechcz avatar May 24 '23 23:05 davidpechcz
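
The thread does not say why this flag helps. One observable difference between `.+` and a match-everything pattern such as `.*` is the empty string, which the sketch below demonstrates with Go's `regexp` package. It is an assumption, not confirmed here, that the exporter's default filter is `.*`, that it applies the filter with an unanchored match, and that empty group names are what trigger the duplicates. The `matches` helper is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches reports whether the filter pattern accepts the group name,
// using an unanchored match (assumed, not verified against the exporter).
func matches(pattern, group string) bool {
	return regexp.MustCompile(pattern).MatchString(group)
}

func main() {
	for _, g := range []string{"winlogbeat_printer", ""} {
		fmt.Printf("group %q: .* -> %v, .+ -> %v\n",
			g, matches(".*", g), matches(".+", g))
	}
}
```

Only the empty group name is treated differently: `.*` accepts it and `.+` rejects it.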

However, adding the '--group.filter' parameter means the consumer metrics can no longer be collected.

xiangrm avatar Mar 01 '24 07:03 xiangrm