
Kafka Consumer offset lag metrics

Open pari205 opened this issue 4 years ago • 7 comments

Hi All,

Consumer offset lag metrics are not working through the JMX exporter. I have the following rule in the config file; however, it doesn't fetch the required details.

  - pattern: kafka.consumer<type=(.+), client-id=(.+)><>(records-lag-max)
    name: kafka_$1_$3
    labels:
      client-id: $2

Version details:
Kafka version: kafka_2.13-2.7.1
jmx exporter: https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.15.0/jmx_prometheus_javaagent-0.15.0.jar

Please let me know if any other details are required.

Thanks

pari205 avatar Jun 08 '21 11:06 pari205

I found a similar example in kafka-connect.yml:

https://github.com/prometheus/jmx_exporter/blob/73ad291363297560f613fc7814920d2f6f3ecf39/example_configs/kafka-connect.yml#L22-L31

Maybe you can start with this and adapt it? If that doesn't work, please let us know how exactly the JMX bean and attributes are named, for example by attaching jconsole to the process and taking a screenshot of the MBean.

fstab avatar Jun 09 '21 22:06 fstab

@fstab This seems correct, thanks! 🙏

However, for some reason, I'm seeing this kind of output from jmx-exporter. There are duplicate metrics: one with a value, the other with NaN.

kafka_consumer_fetch_manager_records_lag{clientId="connector-consumer-foobar-0",partition="4",topic="foobartopic",} 0.0
kafka_consumer_fetch_manager_records_lag{clientId="connector-consumer-foobar-0",partition="4",topic="foobartopic",} NaN

I can probably ignore the NaN in PromQL, but I wonder if this is expected behavior. What do you think?

conradkleinespel avatar Aug 12 '21 21:08 conradkleinespel

If I set my configuration like this to fetch only a single attribute, records-lag for a specific topic and partition:

rules:
  - pattern: kafka.consumer<type=consumer-fetch-manager-metrics, client-id=(.+), topic=(.+), partition=(.+)><>records-lag
    name: kafka_connect_consumer_fetch_records_lag
    labels:
      clientId: "$1"
      topic: "$2"
      partition: "$3"
    help: "Kafka Connect JMX metric type consumer-fetch-manager"
    type: GAUGE

Then, initially, I see the metric 3 times in jmx-exporter:

kafka_consumer_fetch_manager_records_lag{clientId="connector-consumer-foobar-0",partition="4",topic="foobartopic",} 0.0
kafka_consumer_fetch_manager_records_lag{clientId="connector-consumer-foobar-0",partition="4",topic="foobartopic",} 0.0
kafka_consumer_fetch_manager_records_lag{clientId="connector-consumer-foobar-0",partition="4",topic="foobartopic",} 0.0

After a few minutes of waiting, I see the same metrics but 2 values are NaN:

kafka_consumer_fetch_manager_records_lag{clientId="connector-consumer-foobar-0",partition="4",topic="foobartopic",} 0.0
kafka_consumer_fetch_manager_records_lag{clientId="connector-consumer-foobar-0",partition="4",topic="foobartopic",} NaN
kafka_consumer_fetch_manager_records_lag{clientId="connector-consumer-foobar-0",partition="4",topic="foobartopic",} NaN
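One possible explanation for the repeated series, offered as a sketch rather than a confirmed diagnosis: as far as I understand the exporter, each rule pattern is matched unanchored (effectively wrapped in `.*` on both sides) against a string of the form `domain<properties><>attribute: value`. Under that assumption, a rule ending in a bare `records-lag` also matches `records-lag-avg` and `records-lag-max`, and all three are exported under the same fixed name. The Python below uses `re` as a stand-in for Java's regex engine; the helper `exporter_match`, the bean string, and the attribute values are hypothetical.

```python
import re


def exporter_match(pattern, match_name):
    # Assumption: the exporter wraps every rule pattern as ^.*(?:pattern).*$
    return re.match(r"^.*(?:" + pattern + r").*$", match_name)


# Hypothetical bean, written in the "domain<properties><>" form the rules use.
BEAN = ("kafka.consumer<type=consumer-fetch-manager-metrics, "
        "client-id=foo, topic=bar, partition=0><>")

# The rule from the comment above, ending in a bare "records-lag".
RULE = (r"kafka.consumer<type=consumer-fetch-manager-metrics, "
        r"client-id=(.+), topic=(.+), partition=(.+)><>records-lag")

# Hypothetical attribute values, for illustration only.
attributes = {
    "records-lag": "0.0",
    "records-lag-avg": "NaN",
    "records-lag-max": "NaN",
    "records-lead": "10.0",
}

# Collect every attribute whose match string satisfies the wrapped rule.
matched = [attr for attr, value in attributes.items()
           if exporter_match(RULE, f"{BEAN}{attr}: {value}")]
print(matched)
```

If the assumption holds, three different attributes feed one metric name, which would account for seeing the same series three times with differing values.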

conradkleinespel avatar Aug 12 '21 21:08 conradkleinespel

I have exactly the same issue using this configuration. @conradkleinespel were you able to resolve it?

tadam313 avatar Jan 28 '22 15:01 tadam313

@tadam313 Unfortunately no

conradkleinespel avatar Feb 03 '22 09:02 conradkleinespel

This configuration works as expected:

  - pattern: kafka.consumer<type=consumer-fetch-manager-metrics, client-id=(.+), topic=(.+), partition=(.+)><>(records-lag[a-zA-Z-]+|records-lag)
    name: kafka_connect_consumer_fetch_$4
    labels:
      clientId: "$1"
      topic: "$2"
      partition: "$3"
    help: "Kafka Connect JMX metric type consumer-fetch-manager"
    type: GAUGE

Output:

# HELP kafka_connect_consumer_fetch_records_lag Kafka Connect JMX metric type consumer-fetch-manager
# TYPE kafka_connect_consumer_fetch_records_lag gauge
kafka_connect_consumer_fetch_records_lag{clientId="foo",partition="0",topic="bar",} 0.0
# HELP kafka_connect_consumer_fetch_records_lag_avg Kafka Connect JMX metric type consumer-fetch-manager
# TYPE kafka_connect_consumer_fetch_records_lag_avg gauge
kafka_connect_consumer_fetch_records_lag_avg{clientId="foo",partition="0",topic="bar",} NaN
# HELP kafka_connect_consumer_fetch_records_lag_max Kafka Connect JMX metric type consumer-fetch-manager
# TYPE kafka_connect_consumer_fetch_records_lag_max gauge
kafka_connect_consumer_fetch_records_lag_max{clientId="foo",partition="0",topic="bar",} NaN

The order of the alternatives is important: it looks like the bare records-lag needs to come last. I couldn't make records-lag$ work. This one creates a mess:

  - pattern: kafka.consumer<type=consumer-fetch-manager-metrics, client-id=(.+), topic=(.+), partition=(.+)><>(records-lag.*)

Output:

# HELP kafka_connect_consumer_fetch_records_lag_avg:_205_4 Kafka Connect JMX metric type consumer-fetch-manager
# TYPE kafka_connect_consumer_fetch_records_lag_avg:_205_4 gauge
kafka_connect_consumer_fetch_records_lag_avg:_205_4{clientId="foo",partition="0",topic="bar",} 205.4
# HELP kafka_connect_consumer_fetch_records_lag_max:_479_0 Kafka Connect JMX metric type consumer-fetch-manager
# TYPE kafka_connect_consumer_fetch_records_lag_max:_479_0 gauge
kafka_connect_consumer_fetch_records_lag_max:_479_0{clientId="foo",partition="0",topic="bar",} 479.0

I hope this helps.
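The `:_205_4` suffix in the messy output is consistent with the capture group swallowing the value as well as the attribute name. This is a sketch under two assumptions that are not stated in this thread: the matched string has the form `attribute: value`, and the exporter replaces every run of characters outside `[a-zA-Z0-9:]` in a metric name with a single underscore. `safe_name` is a hypothetical approximation of that sanitizing step, and Python's `re` stands in for Java's regex engine.

```python
import re

# The rule that "creates a mess", wrapped the way the exporter is assumed to wrap it.
RULE = (r"kafka.consumer<type=consumer-fetch-manager-metrics, "
        r"client-id=(.+), topic=(.+), partition=(.+)><>(records-lag.*)")
WRAPPED = re.compile(r"^.*(?:" + RULE + r").*$")


def safe_name(name):
    # Hypothetical approximation of the exporter's metric-name sanitizing.
    return re.sub(r"[^a-zA-Z0-9:]+", "_", name)


# Assumed match string: bean name, then "attribute: value".
match_name = ("kafka.consumer<type=consumer-fetch-manager-metrics, "
              "client-id=foo, topic=bar, partition=0><>records-lag-avg: 205.4")

m = WRAPPED.match(match_name)
captured = m.group(4)  # the greedy .* runs to the end of the string
metric = safe_name("kafka_connect_consumer_fetch_" + captured)
print(repr(captured))
print(metric)
```

Under these assumptions the captured group is `records-lag-avg: 205.4`, so the value ends up inside the metric name.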

superfav avatar Feb 09 '22 11:02 superfav

@superfav Thanks for your help, this fixes the issue on my side too! I had a quick look at the JMX exporter docs, which say the pattern is not anchored. From what I understand, that means ^ and $ are not supported in patterns.
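A hedged note on `$`: if the exporter matches the pattern against `attribute: value` rather than the attribute name alone (my understanding, not confirmed in this thread), then `records-lag$` fails because `: 0.0` still follows the attribute name, not because `$` is unsupported. Under the same assumption, a pattern ending in `records-lag:` would select only the bare attribute. The sketch below checks this in Python, with `re` standing in for Java's regex engine; `exporter_matches` and the sample strings are hypothetical, and the colon variant has not been tried against a real exporter here.

```python
import re


def exporter_matches(pattern, match_name):
    # Assumption: patterns are wrapped as ^.*(?:pattern).*$ before matching.
    return re.match(r"^.*(?:" + pattern + r").*$", match_name) is not None


PREFIX = ("kafka.consumer<type=consumer-fetch-manager-metrics, "
          "client-id=foo, topic=bar, partition=0><>")
lag = PREFIX + "records-lag: 0.0"          # assumed "attribute: value" form
lag_max = PREFIX + "records-lag-max: NaN"

# "$" must match at the very end of the string, after the value.
dollar_on_lag = exporter_matches(r"<>records-lag$", lag)

# A trailing colon separates the bare attribute from its longer siblings.
colon_on_lag = exporter_matches(r"<>records-lag:", lag)
colon_on_lag_max = exporter_matches(r"<>records-lag:", lag_max)

print(dollar_on_lag, colon_on_lag, colon_on_lag_max)
```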

conradkleinespel avatar Feb 10 '22 15:02 conradkleinespel

Closing as resolved.

dhoard avatar Jun 24 '23 02:06 dhoard

I have instantiated jmx-exporter-prometheus containers in many Kafka services. Some metrics are being exported in duplicate with one of the values being NaN.

For example, in the ksql service:

curl http://[TARGET_IP]:5556/metrics | grep kafka_consumer_consumer_fetch_manager_metrics_records_lag

kafka_consumer_consumer_fetch_manager_metrics_records_lag{clientId="client-12",partition="0",topic="topic-123",} NaN
kafka_consumer_consumer_fetch_manager_metrics_records_lag{clientId="client-12",partition="0",topic="topic-123",} 15.2


My jmx-exporter has the following configuration:

jmxUrl: service:jmx:rmi:///jndi/rmi://localhost:5555/jmxrmi
lowercaseOutputName: true
rules:
  # kafka.streams:type=stream-thread-metrics,thread-id="{threadId}"
  - pattern: 'kafka.streams<type=stream-thread-metrics, thread-id=(.+)><>(.+-total|.+-rate|.+-avg)'
    name: kafka_streams_stream_thread_metrics_$2
    labels:
      threadId: "$1"
    help: "Kafka Streams JMX metric $2"
    type: GAUGE

  # kafka.streams:type=stream-task-metrics,thread-id="{threadId}",task-id="{taskId}"
  - pattern: 'kafka.streams<type=stream-task-metrics, thread-id=(.+), task-id=(.+)><>(.+-total|.+-rate|.+-ratio|.+-avg)'
    name: kafka_streams_stream_task_metrics_$3
    labels:
      threadId: "$1"
      taskId: "$2"
    help: "Kafka Streams JMX metric $3"
    type: GAUGE

  #kafka.producer:type=producer-topic-metrics,client-id="{clientid}",topic="{topic}",partition="{partition}"
  #kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{clientid}",topic="{topic}",partition="{partition}"
  - pattern: kafka.(.+)<type=(.+)-metrics, client-id=(.+), topic=(.+), partition=(.+)><>(.+-total|.+-rate|.+-avg|.+-lag)
    name: kafka_$1_$2_metrics_$6
    labels:
      clientId: "$3"
      topic: "$4"
      partition: "$5"
    help: "Kafka $1 JMX metric type $2"
    type: GAUGE

  #kafka.producer:type=producer-topic-metrics,client-id="{clientid}",topic="{topic}"
  #kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{clientid}",topic="{topic}"
  - pattern: kafka.(.+)<type=(.+)-metrics, client-id=(.+), topic=(.+)><>(.+-total|.+-rate|.+-avg)
    name: kafka_$1_$2_metrics_$5
    labels:
      clientId: "$3"
      topic: "$4"
    help: "Kafka $1 JMX metric type $2"
    type: GAUGE

  #kafka.streams:type=streams-node-metrics,client-id="{clientid}",node-id="{nodeid}"
  #kafka.consumer:type=consumer-node-metrics,client-id=consumer-1,node-id="{nodeid}"
  - pattern: kafka.(.+)<type=(.+)-metrics, client-id=(.+), node-id=(.+)><>(.+-total|.+-avg)
    name: kafka_$1_$2_metrics_$5
    labels:
      clientId: "$3"
      nodeId: "$4"
    help: "Kafka $1 JMX metric type $2"
    type: UNTYPED

  #kafka.streams:type=kafka-metrics-theads,client-id="{clientid}"
  #kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{clientid}"
  #kafka.consumer:type=consumer-coordinator-metrics,client-id="{clientid}"
  #kafka.consumer:type=consumer-metrics,client-id="{clientid}"
  #kafka.producer:type=producer-metrics,client-id="{clientid}"
  - pattern: kafka.(.+)<type=(.+)-metrics, client-id=(.*)><>(.+-total|.+-avg|.+-bytes|.+-count|.+-ratio|.+-rate|.+-age|.+-flight|.+-threads|.+-connectors|.+-tasks|.+-ago)
    name: kafka_$1_$2_metrics_$4
    labels:
      clientId: "$3"
    help: "Kafka $1 JMX metric type $2"
    type: GAUGE

  #io.confluent.ksql.metrics:type=ksql-engine-query-stats:{serviceId}
  - pattern: 'io.confluent.ksql.metrics<type=ksql-engine-query-stats><>(_confluent-ksql-default_)(.+-total|.+-rate|.+-avg|.+-per-sec|num.+queries)'
    name: ksql_metrics_ksql_engine_query_stats_$2
    labels:
      serviceId: "$1"
    help: "ksql JMX metric $2"
    type: GAUGE
  - pattern: 'io.confluent.ksql.metrics<type=ksql-engine-query-stats><>(_confluent-ksql-default_ksql-engine-query-stats-)(.+-queries)'
    name: ksql_metrics_ksql_engine_query_stats_$2
    labels:
      serviceId: "$1"
    help: "ksql JMX metric $2"
    type: GAUGE

  #io.confluent.ksql.metrics:id={id},key={key},type=producer-metrics
  #io.confluent.ksql.metrics:id={id},key={key},type=consumer-metrics
  - pattern: 'io.confluent.ksql.metrics<id=(.*),key=(.*),type=(.+)-metrics><>(.+-per-sec|.+-messages|.+-bytes)'
    name: ksql_metrics_$3_metrics_$4
    labels:
      id: "$1"
      key: "$2"
    help: "ksql JMX m

Can anyone help, please?
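A possible cause, sketched under the assumption that the exporter matches patterns unanchored against `domain<properties><>attribute: value`: in the topic/partition rule above, the `.+-lag` alternative matches both `records-lag` and `records-lag-max`, because the `-max: ...` tail is absorbed by the implicit trailing `.*`. Both attributes then map to the same metric name. The Python below uses `re` as a stand-in for Java's regex engine; `metric_name` is a hypothetical helper, and which attribute held which value in the real output is not known.

```python
import re

# The topic/partition rule from the configuration above.
RULE = (r"kafka.(.+)<type=(.+)-metrics, client-id=(.+), topic=(.+), "
        r"partition=(.+)><>(.+-total|.+-rate|.+-avg|.+-lag)")
# Assumption: the exporter wraps the pattern in .* on both sides.
WRAPPED = re.compile(r"^.*(?:" + RULE + r").*$")

BEAN = ("kafka.consumer<type=consumer-fetch-manager-metrics, "
        "client-id=client-12, topic=topic-123, partition=0><>")


def metric_name(attr, value):
    """Return the name the rule would produce, or None if it does not match."""
    m = WRAPPED.match(f"{BEAN}{attr}: {value}")
    if m is None:
        return None
    name = f"kafka_{m.group(1)}_{m.group(2)}_metrics_{m.group(6)}"
    # Approximate name sanitizing plus lowercaseOutputName: true.
    return re.sub(r"[^a-zA-Z0-9:]+", "_", name).lower()


# Illustrative values only.
name_lag = metric_name("records-lag", "15.2")
name_lag_max = metric_name("records-lag-max", "NaN")
name_lag_avg = metric_name("records-lag-avg", "NaN")
print(name_lag)
print(name_lag_max)
print(name_lag_avg)
```

If this is what happens, listing the longer names before the bare one, as in the `(records-lag[a-zA-Z-]+|records-lag)` rule earlier in this thread, should give each attribute its own metric name.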

JoelRodrigues58 avatar Feb 21 '24 10:02 JoelRodrigues58