kafka_exporter
Metric error: collected metric was collected before with the same name and label values
When using the exporter (version: danielqsj/kafka-exporter:v1.4.2), we sometimes hit the following error and no metrics are displayed:
An error has occurred while serving metrics:
collected metric "kafka_consumergroup_members" { label:<name:"consumergroup" value:"<NAME>" > gauge:<value:0 > } was collected before with the same name and label values
This does not happen with the other "exporter" from Yahoo (https://github.com/yahoo/CMAK).
After manually reassigning the __consumer_offsets topic in Kafka, the exporter starts collecting the metrics correctly.
Has anyone seen similar behaviour before?
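For context, this message comes from the Prometheus client library's duplicate-metric check: if a collector emits two metrics with the same name and label values during a single scrape, the /metrics handler fails with exactly this error. Below is a minimal Go sketch (not the exporter's actual code; the metric name and consumer group are just taken from the logs above) that reproduces it:

```go
// Sketch: a custom Prometheus collector that emits the same series twice
// per scrape, which makes /metrics fail with
// "was collected before with the same name and label values".
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

type dupCollector struct {
	members *prometheus.Desc
}

func (c *dupCollector) Describe(ch chan<- *prometheus.Desc) { ch <- c.members }

func (c *dupCollector) Collect(ch chan<- prometheus.Metric) {
	// If the same consumer group is seen more than once during a scrape
	// (e.g. via duplicate broker entries), the same series is emitted twice
	// and the whole scrape fails.
	for i := 0; i < 2; i++ {
		ch <- prometheus.MustNewConstMetric(c.members, prometheus.GaugeValue, 0, "winlogbeat_printer")
	}
}

func main() {
	reg := prometheus.NewRegistry()
	reg.MustRegister(&dupCollector{
		members: prometheus.NewDesc("kafka_consumergroup_members",
			"Amount of members in a consumer group", []string{"consumergroup"}, nil),
	})
	http.Handle("/metrics", promhttp.HandlerFor(reg, promhttp.HandlerOpts{}))
	log.Fatal(http.ListenAndServe(":9308", nil))
}
```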
An error has occurred while serving metrics:
43 error(s) occurred:
* collected metric "kafka_consumergroup_members" { label:<name:"consumergroup" value:"winlogbeat_printer" > gauge:<value:2 > } was collected before with the same name and label values
* collected metric "kafka_consumergroup_current_offset" { label:<name:"consumergroup" value:"winlogbeat_printer" > label:<name:"partition" value:"0" > label:<name:"topic" value:"winlogbeat_printer" > gauge:<value:1617 > } was collected before with the same name and label values
* collected metric "kafka_consumergroup_lag" { label:<name:"consumergroup" value:"winlogbeat_printer" > label:<name:"partition" value:"0" > label:<name:"topic" value:"winlogbeat_printer" > gauge:<value:0 > } was collected before with the same name and label values
I have the same problem as you.
I have the same error, but reassigning the __consumer_offsets topic didn't help.
I have the same error:
An error has occurred while serving metrics:
1004 error(s) occurred:
- collected metric "kafka_consumergroup_members" { label:<name:"consumergroup" value:"yarn_eml_105_9" > gauge:<value:0 > } was collected before with the same name and label values
- collected metric "kafka_consumergroup_current_offset" { label:<name:"consumergroup" value:"yarn_eml_105_9" > label:<name:"partition" value:"11" > label:<name:"topic" value:"N11-LY" > gauge:<value:7.2583451022e+10 > } was collected before with the same name and label values
Same issue here. We are connecting to Azure Event Hubs and I'm noticing this weird behavior in the logs:
[sarama] 2022/10/03 13:30:10 client/brokers registered new broker #0 at ehn-central.servicebus.windows.net:9093
[sarama] 2022/10/03 13:30:10 client/brokers registered new broker #1 at Ehn-central.servicebus.windows.net:9093
[sarama] 2022/10/03 13:30:10 client/brokers registered new broker #2 at EHn-central.servicebus.windows.net:9093
[sarama] 2022/10/03 13:30:10 client/brokers registered new broker #3 at EHN-central.servicebus.windows.net:9093
I'm specifying only one server and seeing 4 of these lines with different letter casing for the (same) broker name.
We have the same issue here, after a lot of rebalancing going on overnight. Also an Event Hubs user, with the same casing symptom.
[sarama] 2022/10/06 06:04:53 Connected to broker at digizxxxxxxxxxxx.servicebus.windows.net:9093 (registered as #0)
[sarama] 2022/10/06 06:04:55 Connected to broker at Digizxxxxxxxxxxx.servicebus.windows.net:9093 (registered as #1)
[sarama] 2022/10/06 06:04:55 Connected to broker at DIgizxxxxxxxxxxx.servicebus.windows.net:9093 (registered as #2)
[sarama] 2022/10/06 06:04:56 Connected to broker at DIGizxxxxxxxxxxx.servicebus.windows.net:9093 (registered as #3)
[sarama] 2022/10/06 06:04:57 Connected to broker at DIGIzxxxxxxxxxxx.servicebus.windows.net:9093 (registered as #4)
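Assuming the duplicates above are the trigger, here is a small Go sketch (illustrative only, not the exporter's actual code) of why broker addresses that differ only in letter case are counted as separate brokers when used verbatim as map keys, while lowercasing them collapses the duplicates:

```go
package main

import (
	"fmt"
	"strings"
)

func main() {
	// Addresses as reported in the sarama logs above: same host, different casing.
	addrs := []string{
		"ehn-central.servicebus.windows.net:9093",
		"Ehn-central.servicebus.windows.net:9093",
		"EHn-central.servicebus.windows.net:9093",
		"EHN-central.servicebus.windows.net:9093",
	}

	verbatim := map[string]struct{}{}   // case-sensitive keys: 4 "distinct" brokers
	normalized := map[string]struct{}{} // lowercased keys: 1 broker
	for _, a := range addrs {
		verbatim[a] = struct{}{}
		normalized[strings.ToLower(a)] = struct{}{}
	}
	fmt.Printf("without normalization: %d brokers\n", len(verbatim))
	fmt.Printf("after lowercasing:     %d broker(s)\n", len(normalized))
}
```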
Hey, happy to hear we're not alone in this. Did you get things sorted out? We have an open ticket with Azure; it's been a very long time. They are not sharing their changelog or deployments, but I definitely believe it's related to changes in their broker load-balancing system.
We are also seeing the same issue, especially after upgrading AKS from 1.21 to 1.23. Is there any update on a solution?
Did you get things sorted out?
@misitechen
This was sorted out by raising a support case with Microsoft. Below is the cause they gave.
Root Cause: As part of a recent upgrade, a change was made in the service (Kafka request handler) so that it returns a list of virtual brokers (16 brokers) in the metadata response, allowing the client application process to create and manage multiple TCP connections to a topic for better performance. However, the change impacts the produce API when connections are not fully utilized and become idle due to inactivity. In that case, the producer app can hit a request timeout when sending a message, if the message was sent over a connection that had already been terminated for idleness, and the send is retried.
Resolution: As part of the mitigation, the change was reverted and the virtual broker host returns a single address again.
@mshekharee Very interesting! Would you mind sharing a date range for when Microsoft applied the update and reverted it?
@lhaussknecht It looks like Microsoft is handling this on an account-by-account basis. The change for our account was reverted about a month ago.
Same question. Has anyone dealt with this yet?
We worked around this by adding --group.filter='.+' to the argument list.
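For anyone else trying this workaround, here is a sketch of how the flag can be passed when running the image from this thread (the broker address is a placeholder; --kafka.server and --group.filter are existing kafka_exporter flags):

```
docker run -p 9308:9308 danielqsj/kafka-exporter:v1.4.2 \
  --kafka.server=my-broker:9092 \
  --group.filter='.+'
```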
We had a similar issue with KafkaExporter and managed Kafka on Oracle Cloud (OCI Streams); it was solved with --group.filter='.+'
However, adding the '--group.filter' parameter causes the consumer metrics to no longer be collected.