jmx_exporter
Kafka Consumer offset lag metrics
Hi All,
Consumer offset lag metrics through the JMX exporter are not working. I have the following in the config file, but it doesn't fetch the required details.
- pattern: kafka.consumer<type=(.+), client-id=(.+)><>(records-lag-max)
  name: kafka_$1_$3
  labels:
    client-id: $2
Version details:
Kafka version: kafka_2.13-2.7.1
jmx exporter: https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.15.0/jmx_prometheus_javaagent-0.15.0.jar
Please let me know if any other details are required.
Thanks
I found a similar example in kafka-connect.yml:
https://github.com/prometheus/jmx_exporter/blob/73ad291363297560f613fc7814920d2f6f3ecf39/example_configs/kafka-connect.yml#L22-L31
Maybe you can start with this and adapt it? If that doesn't work, please let us know how exactly the JMX bean and attributes are named, for example by attaching jconsole to the process and taking a screenshot of the MBean.
@fstab This seems correct, thanks! 🙏
However, for some reason, I'm seeing this kind of output from jmx-exporter. There are duplicate metrics: one with a value, the other with NaN.
kafka_consumer_fetch_manager_records_lag{clientId="connector-consumer-foobar-0",partition="4",topic="foobartopic",} 0.0
kafka_consumer_fetch_manager_records_lag{clientId="connector-consumer-foobar-0",partition="4",topic="foobartopic",} NaN
I can probably ignore the NaN in PromQL, but I wonder if this is expected behavior. What do you think?
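To check how widespread the duplicates are before filtering them at query time, a small script along these lines could group the scraped samples by series. This is only a sketch: `find_duplicate_series` is a hypothetical helper, and it assumes each sample line has the form `name{labels} value` with no trailing timestamp, as in the output above.

```python
from collections import defaultdict


def find_duplicate_series(exposition_text):
    """Return series (name plus labels) that appear more than once, with their values."""
    values_by_series = defaultdict(list)
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comments.
        if not line or line.startswith("#"):
            continue
        # Assumption: the value is the last space-separated token (no timestamp).
        series, _, value = line.rpartition(" ")
        values_by_series[series].append(value)
    return {s: v for s, v in values_by_series.items() if len(v) > 1}


# Sample lines copied from the output quoted in this thread.
sample = """\
kafka_consumer_fetch_manager_records_lag{clientId="connector-consumer-foobar-0",partition="4",topic="foobartopic",} 0.0
kafka_consumer_fetch_manager_records_lag{clientId="connector-consumer-foobar-0",partition="4",topic="foobartopic",} NaN
"""

duplicates = find_duplicate_series(sample)
for series, values in duplicates.items():
    print(series, values)
```

Feeding it the body of a `/metrics` response would list every series that is emitted more than once, which helps tell a single misbehaving rule from a broader problem.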
If I set my configuration like this to fetch only a single attribute, records-lag for a specific topic and partition:
rules:
  - pattern: kafka.consumer<type=consumer-fetch-manager-metrics, client-id=(.+), topic=(.+), partition=(.+)><>records-lag
    name: kafka_connect_consumer_fetch_records_lag
    labels:
      clientId: "$1"
      topic: "$2"
      partition: "$3"
    help: "Kafka Connect JMX metric type consumer-fetch-manager"
    type: GAUGE
Then, initially, I see the metric 3 times in jmx-exporter:
kafka_consumer_fetch_manager_records_lag{clientId="connector-consumer-foobar-0",partition="4",topic="foobartopic",} 0.0
kafka_consumer_fetch_manager_records_lag{clientId="connector-consumer-foobar-0",partition="4",topic="foobartopic",} 0.0
kafka_consumer_fetch_manager_records_lag{clientId="connector-consumer-foobar-0",partition="4",topic="foobartopic",} 0.0
After a few minutes of waiting, I see the same metrics but 2 values are NaN:
kafka_consumer_fetch_manager_records_lag{clientId="connector-consumer-foobar-0",partition="4",topic="foobartopic",} 0.0
kafka_consumer_fetch_manager_records_lag{clientId="connector-consumer-foobar-0",partition="4",topic="foobartopic",} NaN
kafka_consumer_fetch_manager_records_lag{clientId="connector-consumer-foobar-0",partition="4",topic="foobartopic",} NaN
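One possible explanation for the three identical series: the pattern is not anchored, so a rule ending in `records-lag` may also match the `records-lag-avg` and `records-lag-max` attributes of the same MBean, and all three are then exported under the one fixed name. The sketch below imitates that with Python's `re.search`. It assumes the exporter matches each rule against a string of the form `domain<properties><>attrName: value`; the candidate strings and their values are made up for illustration.

```python
import re

# Rule pattern from the configuration above, used unanchored.
PATTERN = re.compile(
    r"kafka.consumer<type=consumer-fetch-manager-metrics, "
    r"client-id=(.+), topic=(.+), partition=(.+)><>records-lag"
)

# Hypothetical match strings for one MBean (assumed format, invented values).
BEAN = (
    "kafka.consumer<type=consumer-fetch-manager-metrics, "
    "client-id=connector-consumer-foobar-0, topic=foobartopic, partition=4><>"
)
candidates = [
    BEAN + "records-lag: 0.0",
    BEAN + "records-lag-avg: NaN",
    BEAN + "records-lag-max: NaN",
    BEAN + "records-lead: 10.0",
]

# Keep the "attrName: value" part of every candidate the rule matches.
matched = [c.split("><>")[1] for c in candidates if PATTERN.search(c)]
print(matched)
```

Under these assumptions the rule matches the first three attributes but not `records-lead`, which would account for one real value plus two series that turn into NaN once the average and maximum have no recent samples.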
I have exactly the same issue using this configuration. @conradkleinespel could you resolve it?
@tadam313 Unfortunately no
This configuration works as expected:
- pattern: kafka.consumer<type=consumer-fetch-manager-metrics, client-id=(.+), topic=(.+), partition=(.+)><>(records-lag[a-zA-Z-]+|records-lag)
  name: kafka_connect_consumer_fetch_$4
  labels:
    clientId: "$1"
    topic: "$2"
    partition: "$3"
  help: "Kafka Connect JMX metric type consumer-fetch-manager"
  type: GAUGE
# HELP kafka_connect_consumer_fetch_records_lag Kafka Connect JMX metric type consumer-fetch-manager
# TYPE kafka_connect_consumer_fetch_records_lag gauge
kafka_connect_consumer_fetch_records_lag{clientId="foo",partition="0",topic="bar",} 0.0
# HELP kafka_connect_consumer_fetch_records_lag_avg Kafka Connect JMX metric type consumer-fetch-manager
# TYPE kafka_connect_consumer_fetch_records_lag_avg gauge
kafka_connect_consumer_fetch_records_lag_avg{clientId="foo",partition="0",topic="bar",} NaN
# HELP kafka_connect_consumer_fetch_records_lag_max Kafka Connect JMX metric type consumer-fetch-manager
# TYPE kafka_connect_consumer_fetch_records_lag_max gauge
kafka_connect_consumer_fetch_records_lag_max{clientId="foo",partition="0",topic="bar",} NaN
The order of the alternatives is important: it looks like records-lag needs to come last. I couldn't make records-lag$ work.
This one creates a mess:
- pattern: kafka.consumer<type=consumer-fetch-manager-metrics, client-id=(.+), topic=(.+), partition=(.+)><>(records-lag.*)
# HELP kafka_connect_consumer_fetch_records_lag_avg:_205_4 Kafka Connect JMX metric type consumer-fetch-manager
# TYPE kafka_connect_consumer_fetch_records_lag_avg:_205_4 gauge
kafka_connect_consumer_fetch_records_lag_avg:_205_4{clientId="foo",partition="0",topic="bar",} 205.4
# HELP kafka_connect_consumer_fetch_records_lag_max:_479_0 Kafka Connect JMX metric type consumer-fetch-manager
# TYPE kafka_connect_consumer_fetch_records_lag_max:_479_0 gauge
kafka_connect_consumer_fetch_records_lag_max:_479_0{clientId="foo",partition="0",topic="bar",} 479.0
I hope this helps.
@superfav Thanks for your help, this fixes the issue on my side too! I had a quick look at the JMX exporter docs, which say that patterns are not anchored. From what I understand, this means ^ and $ are not supported in patterns.
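For anyone landing here later, the messy names above hint at what the rule is matched against: the captured text `records-lag-avg: 205.4` suggests that the input ends with `: value` after the attribute name. If so, `records-lag$` fails because the string does not end at the attribute name, not necessarily because `$` is unsupported. The sketch below reproduces this with Python's `re`. The input format, the `sanitize` helper, and the colon-terminated pattern are assumptions for illustration and have not been tested against the exporter itself.

```python
import re

# Bean part of the rule from this thread; only the attribute part varies below.
PREFIX = (
    r"kafka.consumer<type=consumer-fetch-manager-metrics, "
    r"client-id=(.+), topic=(.+), partition=(.+)><>"
)

# Hypothetical match strings in the assumed "...><>attrName: value" format.
BEAN = (
    "kafka.consumer<type=consumer-fetch-manager-metrics, "
    "client-id=foo, topic=bar, partition=0><>"
)
lag = BEAN + "records-lag: 0.0"
lag_avg = BEAN + "records-lag-avg: 205.4"


def sanitize(name):
    """Rough approximation of the exporter's metric-name clean-up (an assumption)."""
    return re.sub(r"[^a-zA-Z0-9:_]", "_", name)


# (records-lag.*) also swallows ": 205.4", which then ends up in the metric name.
greedy = re.search(PREFIX + r"(records-lag.*)", lag_avg)
messy_name = sanitize("kafka_connect_consumer_fetch_" + greedy.group(4))
print(messy_name)

# records-lag$ finds nothing, because the string ends with the value.
print(re.search(PREFIX + r"records-lag$", lag))

# Matching the colon pins down the attribute name without needing $.
colon = re.compile(PREFIX + r"(records-lag):")
print(bool(colon.search(lag)), bool(colon.search(lag_avg)))
```

If the assumption holds, ending the attribute part of a rule with `:` might be an alternative to ordering the alternatives carefully, since it selects exactly one attribute name.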
Closing as resolved.
I have instantiated jmx-exporter-prometheus containers in many Kafka services. Some metrics are being exported in duplicate with one of the values being NaN.
For example, in the ksql service:
curl http://[TARGET_IP]:5556/metrics | grep kafka_consumer_consumer_fetch_manager_metrics_records_lag
kafka_consumer_consumer_fetch_manager_metrics_records_lag{clientId="client-12",partition="0",topic="topic-123",} NaN
kafka_consumer_consumer_fetch_manager_metrics_records_lag{clientId="client-12",partition="0",topic="topic-123",} 15.2
My jmx-exporter has the following configuration:
jmxUrl: service:jmx:rmi:///jndi/rmi://localhost:5555/jmxrmi
lowercaseOutputName: true
rules:
  # kafka.streams:type=stream-thread-metrics,thread-id="{threadId}"
  - pattern: 'kafka.streams<type=stream-thread-metrics, thread-id=(.+)><>(.+-total|.+-rate|.+-avg)'
    name: kafka_streams_stream_thread_metrics_$2
    labels:
      threadId: "$1"
    help: "Kafka Streams JMX metric $2"
    type: GAUGE
  # kafka.streams:type=stream-task-metrics,thread-id="{threadId}",task-id="{taskId}"
  - pattern: 'kafka.streams<type=stream-task-metrics, thread-id=(.+), task-id=(.+)><>(.+-total|.+-rate|.+-ratio|.+-avg)'
    name: kafka_streams_stream_task_metrics_$3
    labels:
      threadId: "$1"
      taskId: "$2"
    help: "Kafka Streams JMX metric $3"
    type: GAUGE
  #kafka.producer:type=producer-topic-metrics,client-id="{clientid}",topic="{topic}",partition="{partition}"
  #kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{clientid}",topic="{topic}",partition="{partition}"
  - pattern: kafka.(.+)<type=(.+)-metrics, client-id=(.+), topic=(.+), partition=(.+)><>(.+-total|.+-rate|.+-avg|.+-lag)
    name: kafka_$1_$2_metrics_$6
    labels:
      clientId: "$3"
      topic: "$4"
      partition: "$5"
    help: "Kafka $1 JMX metric type $2"
    type: GAUGE
  #kafka.producer:type=producer-topic-metrics,client-id="{clientid}",topic="{topic}"
  #kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{clientid}",topic="{topic}"
  - pattern: kafka.(.+)<type=(.+)-metrics, client-id=(.+), topic=(.+)><>(.+-total|.+-rate|.+-avg)
    name: kafka_$1_$2_metrics_$5
    labels:
      clientId: "$3"
      topic: "$4"
    help: "Kafka $1 JMX metric type $2"
    type: GAUGE
  #kafka.streams:type=streams-node-metrics,client-id="{clientid}",node-id="{nodeid}"
  #kafka.consumer:type=consumer-node-metrics,client-id=consumer-1,node-id="{nodeid}"
  - pattern: kafka.(.+)<type=(.+)-metrics, client-id=(.+), node-id=(.+)><>(.+-total|.+-avg)
    name: kafka_$1_$2_metrics_$5
    labels:
      clientId: "$3"
      nodeId: "$4"
    help: "Kafka $1 JMX metric type $2"
    type: UNTYPED
  #kafka.streams:type=kafka-metrics-theads,client-id="{clientid}"
  #kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{clientid}"
  #kafka.consumer:type=consumer-coordinator-metrics,client-id="{clientid}"
  #kafka.consumer:type=consumer-metrics,client-id="{clientid}"
  #kafka.producer:type=producer-metrics,client-id="{clientid}"
  - pattern: kafka.(.+)<type=(.+)-metrics, client-id=(.*)><>(.+-total|.+-avg|.+-bytes|.+-count|.+-ratio|.+-rate|.+-age|.+-flight|.+-threads|.+-connectors|.+-tasks|.+-ago)
    name: kafka_$1_$2_metrics_$4
    labels:
      clientId: "$3"
    help: "Kafka $1 JMX metric type $2"
    type: GAUGE
  #io.confluent.ksql.metrics:type=ksql-engine-query-stats:{serviceId}
  - pattern: 'io.confluent.ksql.metrics<type=ksql-engine-query-stats><>(_confluent-ksql-default_)(.+-total|.+-rate|.+-avg|.+-per-sec|num.+queries)'
    name: ksql_metrics_ksql_engine_query_stats_$2
    labels:
      serviceId: "$1"
    help: "ksql JMX metric $2"
    type: GAUGE
  - pattern: 'io.confluent.ksql.metrics<type=ksql-engine-query-stats><>(_confluent-ksql-default_ksql-engine-query-stats-)(.+-queries)'
    name: ksql_metrics_ksql_engine_query_stats_$2
    labels:
      serviceId: "$1"
    help: "ksql JMX metric $2"
    type: GAUGE
  #io.confluent.ksql.metrics:id={id},key={key},type=producer-metrics
  #io.confluent.ksql.metrics:id={id},key={key},type=consumer-metrics
  - pattern: 'io.confluent.ksql.metrics<id=(.*),key=(.*),type=(.+)-metrics><>(.+-per-sec|.+-messages|.+-bytes)'
    name: ksql_metrics_$3_metrics_$4
    labels:
      id: "$1"
      key: "$2"
    help: "ksql JMX m
Can anyone help, please?
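A possible cause, in line with the fix earlier in this thread: in the rule with `topic` and `partition`, the alternative `.+-lag` is not anchored, so it may match the `records-lag` prefix of `records-lag-max` as well as `records-lag` itself, and both attributes would then be exported under the same name. The sketch below checks this with Python's `re.search`. It assumes the rule is matched against a string of the form `domain<properties><>attrName: value`; the attribute values are invented to mirror the curl output above.

```python
import re

# The topic/partition rule from the configuration above, used unanchored.
RULE = re.compile(
    r"kafka.(.+)<type=(.+)-metrics, client-id=(.+), topic=(.+), partition=(.+)><>"
    r"(.+-total|.+-rate|.+-avg|.+-lag)"
)

# Hypothetical MBean and attribute values (assumed match-string format).
BEAN = (
    "kafka.consumer<type=consumer-fetch-manager-metrics, "
    "client-id=client-12, topic=topic-123, partition=0><>"
)
attributes = {"records-lag": "15.2", "records-lag-avg": "NaN", "records-lag-max": "NaN"}

names = []
for attr, value in attributes.items():
    m = RULE.search(f"{BEAN}{attr}: {value}")
    # Build the name as the rule's template kafka_$1_$2_metrics_$6 would.
    name = f"kafka_{m.group(1)}_{m.group(2)}_metrics_{m.group(6)}".replace("-", "_")
    names.append(name)
    print(name, value)
```

Under these assumptions, `records-lag` and `records-lag-max` both come out as `kafka_consumer_consumer_fetch_manager_metrics_records_lag`, one with 15.2 and one with NaN, while `records-lag-avg` keeps its own name via the `.+-avg` alternative. Adding `.+-max` before `.+-lag`, or terminating the attribute part with `:`, might separate them, but that is untested here.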