
add monitoring metrics and alerts

Open · knoguchi opened this issue 7 years ago • 2 comments

Monitoring is essential for operation and for optimal resource allocation. I would like to use Druid Emitters to push the metrics to CDH.

  • Research how CDH ingests metrics. The metrics must be queryable with the CDH "tsquery" language to support charting and alerting (see the example after this list).
    https://github.com/cloudera/cm_ext/wiki/Monitoring-Support-for-CSDs
    https://github.com/cloudera/cm_ext/wiki/Service-Monitoring-Descriptor-Language-Reference#-metrics
  • Implement the emitter
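Once the emitter metrics are registered through the MDL, they should show up in the CM Charts UI and be queryable with tsquery. A hypothetical example (the metric name and role type below are placeholders I made up, not identifiers that exist yet):

    SELECT druid_segment_count WHERE roleType = DRUID_OVERLORD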

Metrics

http://druid.io/docs/latest/operations/metrics.html

Alerts

http://druid.io/docs/latest/operations/alerts.html

knoguchi · Apr 07 '17 16:04

Researched how Kafka reports metrics to CM. It's actually CM that polls Kafka; there are configuration parameters for this in the CM Kafka configuration. (Screenshot of the CM Kafka configuration page, taken 2017-04-07.)

These configuration keys and values are written to kafka-monitoring.properties.

The HTTP server is not part of the Kafka distribution but a third-party module. The control.sh script appends the exporter class property to kafka.properties, and the HTTP server runs within the same process as the Kafka broker, on a separate thread. https://github.com/arnobroekhof/kafka-http-metrics-reporter
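For reference, the appended properties would look roughly like the snippet below. The class name and property keys are my best guess from the reporter project and should be checked against its README; the port matches the curl example further down.

    # illustrative only -- verify the reporter class name and keys against the kafka-http-metrics-reporter README
    kafka.metrics.reporters=nl.techop.kafka.KafkaHttpMetricsReporter
    kafka.http.metrics.host=0.0.0.0
    kafka.http.metrics.port=24042
    kafka.http.metrics.enabled=true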

Kafka is known to expose Yammer Metrics via JMX, and the HTTP server exposes Kafka's Codahale metrics. Yammer and Codahale are the same thing; confusingly, the library changed names twice: it was originally called Yammer Metrics, then Codahale Metrics, and is now Dropwizard Metrics.

You can get the metrics like this:

curl -H "Content-Type: application/json" http://localhost:24042/api/metrics | jq .
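The response body follows the standard Dropwizard MetricsServlet layout, i.e. metrics grouped by type. The metric names and values below are made up for illustration:

    {
      "version": "3.0.0",
      "gauges": { "kafka.server.ReplicaManager.LeaderCount": { "value": 12 } },
      "counters": {},
      "histograms": {},
      "meters": { "kafka.server.BrokerTopicMetrics.MessagesInPerSec": { "count": 1024, "m1_rate": 3.2 } },
      "timers": {}
    }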

Then there is the CMF Agent, which polls the endpoint using the kafka_adapter.py custom parser. I will implement the service.MDL and map the keys and value types using the generic adapter; a sketch follows.
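A minimal sketch of the role-level metric mapping in the MDL, assuming a DRUID_OVERLORD role and the invented druid_segment_count metric from above; field names should be double-checked against the Service Monitoring Descriptor Language reference linked earlier:

    {
      "name": "DRUID",
      "version": 1,
      "metricDefinitions": [],
      "roles": [
        {
          "name": "DRUID_OVERLORD",
          "metricDefinitions": [
            {
              "name": "druid_segment_count",
              "label": "Segment Count",
              "description": "Number of segments served, as reported by the Druid emitter.",
              "numeratorUnit": "segments",
              "isCounter": false
            }
          ]
        }
      ]
    }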

knoguchi · Apr 07 '17 22:04

Next step.

The Overlord seems to be the best candidate to provide the metrics endpoint, since it already runs an HTTP server. We probably need two endpoints:

  • a metric collector endpoint that each role's Druid emitter pushes metrics to;
  • a metric query endpoint that the CM agent polls.

Research whether this can be implemented as an extension; I don't want to touch core Druid. As a side note, I do not want to run a separate HTTP server; it would have to be another role.
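For the push side, each role could be pointed at the collector endpoint via Druid's HTTP emitter in runtime.properties. A sketch, assuming the HTTP emitter; the overlord host, port, and collector path are placeholders, not an existing API:

    # illustrative only -- the collector path is hypothetical and would be provided by the extension
    druid.emitter=http
    druid.emitter.http.recipientBaseUrl=http://overlord-host:8090/druid/metrics/v1/collect
    druid.monitoring.monitors=["com.metamx.metrics.JvmMonitor"]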

knoguchi · Apr 08 '17 00:04