Any examples of using CMAK with Prometheus + Grafana to monitor consumer lag?

Open rja1 opened this issue 2 years ago • 8 comments

I'd like to feed this data into Grafana: https://some_server.com/api/status/some_cluster/consumersSummary.

Anyone have experience/examples of doing this with CMAK?

Thanks

rja1 avatar Nov 03 '22 17:11 rja1

You could try estimating it from the message backlog.
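As a rough illustration of that backlog estimate: lag per partition is the log end offset minus the group's committed offset. The sketch below assumes you already have both sets of offsets as plain dicts keyed by partition number; the function names and input shapes are hypothetical and not part of CMAK.

```python
def partition_lag(log_end_offset, committed_offset):
    """Lag for one partition; clamped at 0 in case the two reads raced."""
    return max(log_end_offset - committed_offset, 0)


def group_lag(end_offsets, committed_offsets):
    """Sum lag over the partitions the group has committed offsets for.

    Both arguments are assumed to be dicts of {partition: offset}.
    Partitions without a committed offset are skipped.
    """
    return sum(
        partition_lag(end_offsets[p], committed)
        for p, committed in committed_offsets.items()
        if p in end_offsets
    )
```

For example, `group_lag({0: 100, 1: 50}, {0: 90, 1: 50})` sums a lag of 10 on partition 0 and 0 on partition 1.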

one-two-my-gad avatar Nov 04 '22 12:11 one-two-my-gad

Hey @rja1, I may be a little late here, but why don't you just use the JMX Exporter directly on Kafka to feed the data into your Prometheus/Grafana?

janengelmohr avatar Dec 21 '22 08:12 janengelmohr

Good question @engelmohr, but jmx-exporter doesn't provide consumer offset data. I would have just used Kafka-exporter, but it doesn't support SASL_PLAINTEXT, which is what our clusters use for auth. I could have hit the kafka-ui api. It's not ideal though, because you have to make an api call for each consumer. Additionally, it can't connect to one of our legacy 0.10.2.0 clusters (unsupported version).

In the end, I wrote a Python script that hits the CMAK API in a single call and stores the offset data in a MySQL database, which can then be mapped as a datasource in Grafana. Works great, but it feels a little bit like a hack.
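A minimal sketch of that kind of script, not the actual one. The URL is the placeholder from the question. The JSON shape read by `flatten` (a `consumers` list with `name`, `topics`, `topic`, and `totalLag` keys) is an assumption, so check it against what your CMAK instance really returns. SQLite from the standard library stands in for MySQL to keep the example self-contained, and the table and column names are made up.

```python
import json
import sqlite3
import urllib.request

# Placeholder host and cluster name, copied from the question above.
CMAK_URL = "https://some_server.com/api/status/some_cluster/consumersSummary"


def fetch_summary(url=CMAK_URL):
    """Fetch the consumer summary JSON in a single HTTP call."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def flatten(summary):
    """Turn the ASSUMED response shape into (group, topic, total_lag) rows."""
    rows = []
    for consumer in summary.get("consumers", []):
        for topic in consumer.get("topics", []):
            rows.append((consumer["name"], topic["topic"], topic.get("totalLag")))
    return rows


def store(conn, rows):
    """Append one timestamped sample per group/topic pair."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS consumer_lag ("
        "ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP, "
        "consumer_group TEXT, topic TEXT, total_lag INTEGER)"
    )
    conn.executemany(
        "INSERT INTO consumer_lag (consumer_group, topic, total_lag) "
        "VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


def main():
    """One polling pass; run it from cron or a loop at your scrape interval."""
    conn = sqlite3.connect("lag.db")
    store(conn, flatten(fetch_summary()))
    conn.close()
```

With a MySQL driver the same `store` logic applies, but the placeholder style is typically `%s` instead of `?`. Grafana can then graph `total_lag` over `ts` from that table.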

rja1 avatar Dec 21 '22 15:12 rja1

> Kafka-exporter, but it doesn't support SASL_PLAINTEXT

JMX Exporter on the consumer clients is what you would need for lag. This is independent of any auth settings.

You could also use Burrow to monitor lag for non-JVM clients.

OneCricketeer avatar Mar 22 '23 15:03 OneCricketeer

Thanks @OneCricketeer. As I recall, JMX doesn't expose lag data, and Burrow doesn't support SASL_PLAINTEXT.

rja1 avatar Mar 22 '23 16:03 rja1

Consumer JMX does have lag; not the broker/producer JMX.

I don't use Burrow, but I'd be very surprised if it did not support SASL_PLAINTEXT. It uses Sarama, which does support it, to read directly from the offsets topic.

OneCricketeer avatar Mar 22 '23 16:03 OneCricketeer

Gotcha. I actually ended up just writing a Python hack that hits the CMAK API, pulls the lag data by group, and persists it to a MySQL backend. I then tied it into Grafana. Works great, but it's a little kludgy. Anyway, thanks for your replies. I'll check out Burrow again for fun.

rja1 avatar Mar 22 '23 17:03 rja1

I took a look at Burrow myself, and it seems there is an open PR for SASL PLAINTEXT, so you were right.

OneCricketeer avatar Mar 22 '23 19:03 OneCricketeer