cm-druid
add monitoring metrics and alerts
Monitoring is essential for operations and for optimal resource allocation. I would like to use Druid emitters to push the metrics to CDH.
- Research how CDH ingests metrics. The metrics must be queryable with the CDH "tsquery" language so that they can be used for charting and alerting.
https://github.com/cloudera/cm_ext/wiki/Monitoring-Support-for-CSDs
https://github.com/cloudera/cm_ext/wiki/Service-Monitoring-Descriptor-Language-Reference#-metrics
- Implement emitter
Metrics
http://druid.io/docs/latest/operations/metrics.html
Alerts
http://druid.io/docs/latest/operations/alerts.html
I researched how Kafka reports metrics to CM. It is actually CM that polls Kafka. The CM Kafka configuration has parameters for this.
These configuration keys and values are written to kafka-monitoring.properties.
The HTTP server is not part of the Kafka distribution but a third-party module. The control.sh script appends the exporter class property to kafka.properties. The HTTP server runs within the same process as the Kafka broker, on a separate thread. https://github.com/arnobroekhof/kafka-http-metrics-reporter
Kafka is known to expose Yammer Metrics via JMX. The HTTP server exposes Kafka's Codahale metrics. Yammer and Codahale are the same thing. Confusingly, the library has been renamed twice: it was originally called Yammer Metrics, then Codahale Metrics, and now Dropwizard Metrics.
You can fetch the metrics like this:
curl -H "Content-Type: application/json" http://localhost:24042/api/metrics | jq .
Then there is the CMF Agent, which polls the endpoint using the custom parser kafka_adapter.py. I will implement the service.MDL and map the keys and value types using the generic adapter.
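To make the key mapping concrete, here is a minimal sketch of how a polled JSON payload could be flattened into one key per numeric value. It is not the code from kafka_adapter.py or the generic adapter. The `::` separator, the function names, and the sample payload shape are assumptions for illustration only; the URL is the one from the curl command above.

```python
import json
import urllib.request
from typing import Any, Dict


def flatten_metrics(node: Any, prefix: str = "") -> Dict[str, float]:
    """Walk nested JSON objects and return {joined_key: numeric_value}.

    Strings, booleans, nulls and lists are skipped, because only numeric
    leaves are useful as time-series values.
    """
    flat: Dict[str, float] = {}
    if isinstance(node, dict):
        for key, value in node.items():
            # "::" is an arbitrary separator chosen for this sketch.
            path = f"{prefix}::{key}" if prefix else key
            flat.update(flatten_metrics(value, path))
    elif isinstance(node, bool):
        # bool is a subclass of int in Python, so exclude it explicitly.
        pass
    elif isinstance(node, (int, float)) and prefix:
        flat[prefix] = float(node)
    return flat


def fetch_metrics(url: str = "http://localhost:24042/api/metrics") -> Dict[str, float]:
    """Poll the metrics endpoint once and flatten the response."""
    request = urllib.request.Request(url, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request, timeout=5) as response:
        return flatten_metrics(json.load(response))


# Hypothetical payload; the real response layout may differ.
sample = {
    "kafka.server.BrokerTopicMetrics": {
        "BytesInPerSec": {"count": 10, "rate": {"mean": 1.5, "unit": "bytes"}}
    },
    "healthy": True,
}
flat_sample = flatten_metrics(sample)
```

Each flattened key would then be matched against a metric definition in the service.MDL, which also declares whether the value is a counter or a gauge.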
Next step.
The Overlord seems to be the best candidate to provide the metrics endpoint, since it already runs an HTTP server. We probably need two endpoints:
- a metric collector endpoint that the Druid emitter of each role pushes metrics to.
- a metric query endpoint that the CM agent polls.
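The data flow between the two endpoints can be prototyped outside Druid before it is written as an extension. The sketch below is a standalone Python stand-in, not the planned Overlord implementation. The paths `/collect` and `/metrics`, the port, the `::` key format, and the class names are all hypothetical. It assumes each emitter posts a JSON array of events carrying `feed`, `service`, `host`, `metric` and `value` fields.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict, Iterable


class MetricStore:
    """Keeps the latest numeric value per service, host and metric name."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._latest: Dict[str, float] = {}

    def ingest(self, events: Iterable[object]) -> int:
        """Store metric events and return how many were accepted."""
        accepted = 0
        with self._lock:
            for event in events:
                # Ignore malformed entries and non-metric feeds such as alerts.
                if not isinstance(event, dict) or event.get("feed") != "metrics":
                    continue
                value = event.get("value")
                if isinstance(value, bool) or not isinstance(value, (int, float)):
                    continue
                key = "{}::{}::{}".format(
                    event.get("service"), event.get("host"), event.get("metric")
                )
                self._latest[key] = float(value)
                accepted += 1
        return accepted

    def snapshot(self) -> Dict[str, float]:
        """Return a copy of the latest values for the polling side."""
        with self._lock:
            return dict(self._latest)


STORE = MetricStore()


class Handler(BaseHTTPRequestHandler):
    def _reply(self, status: int, payload: object) -> None:
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self) -> None:
        # Collector endpoint: emitters push batches of events here.
        if self.path != "/collect":
            self._reply(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            events = json.loads(self.rfile.read(length))
        except ValueError:
            self._reply(400, {"error": "invalid JSON"})
            return
        if not isinstance(events, list):
            self._reply(400, {"error": "expected a JSON array"})
            return
        self._reply(200, {"accepted": STORE.ingest(events)})

    def do_GET(self) -> None:
        # Query endpoint: the polling agent reads the latest values here.
        if self.path != "/metrics":
            self._reply(404, {"error": "not found"})
            return
        self._reply(200, STORE.snapshot())


def serve(port: int = 8099) -> None:
    """Run the prototype server until interrupted (port is arbitrary)."""
    HTTPServer(("localhost", port), Handler).serve_forever()
```

Calling `serve()` starts the prototype. Keeping only the latest value per key matches a polling model, where the agent samples current state rather than replaying every event.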
Research whether it can be implemented as an extension. I don't want to touch the Druid core. As a side note, I do not want to run a separate HTTP server; it would have to be another role if we did.