hadoop_exporter
A Hadoop exporter for Prometheus that scrapes Hadoop metrics (including HDFS, YARN, MapReduce, HBase, etc.) from Hadoop components' JMX URLs.
Hadoop Exporter for Prometheus
Exports Hadoop metrics via HTTP for Prometheus consumption.
How to run
python hadoop_exporter.py
Help on the flags of hadoop_exporter:
$ python hadoop_exporter.py -h
usage: hadoop_exporter.py [-h] [-c cluster_name] [-hdfs namenode_jmx_url]
[-rm resourcemanager_jmx_url] [-dn datanode_jmx_url]
[-jn journalnode_jmx_url] [-mr mapreduce2_jmx_url]
[-hbase hbase_jmx_url] [-hive hive_jmx_url]
[-p metrics_path] [-host ip_or_hostname] [-P port]
hadoop node exporter args, including url, metrics_path, address, port and
cluster.
optional arguments:
-h, --help show this help message and exit
-c cluster_name, --cluster cluster_name
Hadoop cluster labels. (default "cluster_indata")
-hdfs namenode_jmx_url, --namenode-url namenode_jmx_url
Hadoop hdfs metrics URL. (default
"http://indata-10-110-13-165.indata.com:50070/jmx")
-rm resourcemanager_jmx_url, --resourcemanager-url resourcemanager_jmx_url
Hadoop resourcemanager metrics URL. (default
"http://indata-10-110-13-164.indata.com:8088/jmx")
-dn datanode_jmx_url, --datanode-url datanode_jmx_url
Hadoop datanode metrics URL. (default
"http://indata-10-110-13-163.indata.com:1022/jmx")
-jn journalnode_jmx_url, --journalnode-url journalnode_jmx_url
Hadoop journalnode metrics URL. (default
"http://indata-10-110-13-163.indata.com:8480/jmx")
-mr mapreduce2_jmx_url, --mapreduce2-url mapreduce2_jmx_url
Hadoop mapreduce2 metrics URL. (default
"http://indata-10-110-13-165.indata.com:19888/jmx")
-hbase hbase_jmx_url, --hbase-url hbase_jmx_url
Hadoop hbase metrics URL. (default
"http://indata-10-110-13-164.indata.com:16010/jmx")
-hive hive_jmx_url, --hive-url hive_jmx_url
Hadoop hive metrics URL. (default
"http://ip:port/jmx")
-p metrics_path, --path metrics_path
Path under which to expose metrics. (default
"/metrics")
-host ip_or_hostname, -ip ip_or_hostname, --address ip_or_hostname, --addr ip_or_hostname
Polling server on this address. (default "127.0.0.1")
-P port, --port port Listen to this port. (default "9130")
Tested on Apache Hadoop 2.7.3
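All of the collectors boil down to the same idea: pull the JSON from a component's /jmx endpoint and re-expose the numeric bean attributes as Prometheus metrics. The sketch below illustrates that idea only; it is not the project's actual collector code. The URL is a placeholder, the metric naming is made up, and it assumes the requests and prometheus_client packages are installed:

import re
import time
import requests
from prometheus_client import Gauge, start_http_server

JMX_URL = "http://namenode.example.com:50070/jmx"  # placeholder, not a real default

_gauges = {}  # metric name -> Gauge, so each metric is registered only once

def scrape(jmx_url):
    """Fetch a Hadoop /jmx endpoint and mirror numeric bean attributes as gauges."""
    beans = requests.get(jmx_url, timeout=10).json().get("beans", [])
    for bean in beans:
        bean_name = bean.get("name", "unknown")
        for attr, value in bean.items():
            # Only plain numbers map cleanly onto Prometheus gauges.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                continue
            metric = "hadoop_" + re.sub(r"[^a-zA-Z0-9_]", "_", attr).lower()
            if metric not in _gauges:
                _gauges[metric] = Gauge(metric, "JMX attribute " + attr, ["bean"])
            _gauges[metric].labels(bean=bean_name).set(value)

if __name__ == "__main__":
    start_http_server(9130)  # same default port as hadoop_exporter
    while True:
        scrape(JMX_URL)
        time.sleep(30)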
Usage
You can run each collector under the cmd/ directory, like this:
cd hadoop_exporter/cmd
python hdfs_namenode.py -h
# enter the parameters the script asks for.
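If you are unsure which JMX URL to pass, you can sanity-check an endpoint first. Every Hadoop daemon's /jmx endpoint returns a JSON document with a top-level "beans" array; a quick probe (hostname and port are placeholders):

import requests

resp = requests.get("http://namenode.example.com:50070/jmx", timeout=10)
resp.raise_for_status()
beans = resp.json()["beans"]  # Hadoop's /jmx responses carry a "beans" array
print(len(beans), "beans; first one:", beans[0]["name"])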
Alternatively, to run the entire project, you need a webhook/API at http://<rest_api_host_and_port>/alert/getservicesbyhost that provides the JMX URLs. Its response should look like this:
{
  "cluster_name1": [
    {
      "node1.fqdn.com": {
        "DATANODE": {
          "jmx": "http://node1.fqdn.com:1022/jmx"
        },
        "HBASE_REGIONSERVER": {
          "jmx": "http://node1.fqdn.com:60030/jmx"
        },
        "HISTORYSERVER": {
          "jmx": "http://node1.fqdn.com:19888/jmx"
        },
        "JOURNALNODE": {
          "jmx": "http://node1.fqdn.com:8480/jmx"
        },
        "NAMENODE": {
          "jmx": "http://node1.fqdn.com:50070/jmx"
        },
        "NODEMANAGER": {
          "jmx": "http://node1.fqdn.com:8042/jmx"
        }
      }
    },
    {
      "node2.fqdn.com": {
        "DATANODE": {
          "jmx": "http://node2.fqdn.com:1022/jmx"
        },
        "HBASE_REGIONSERVER": {
          "jmx": "http://node2.fqdn.com:60030/jmx"
        },
        "HIVE_LLAP": {
          "jmx": "http://node2.fqdn.com:15002/jmx"
        },
        "HIVE_SERVER_INTERACTIVE": {
          "jmx": "http://node2.fqdn.com:10502/jmx"
        },
        "JOURNALNODE": {
          "jmx": "http://node2.fqdn.com:8480/jmx"
        },
        "NODEMANAGER": {
          "jmx": "http://node2.fqdn.com:8042/jmx"
        }
      }
    },
    {
      "node3.fqdn.com": {
        "DATANODE": {
          "jmx": "http://node3.fqdn.com:1022/jmx"
        },
        "HBASE_MASTER": {
          "jmx": "http://node3.fqdn.com:16010/jmx"
        },
        "HBASE_REGIONSERVER": {
          "jmx": "http://node3.fqdn.com:60030/jmx"
        },
        "JOURNALNODE": {
          "jmx": "http://node3.fqdn.com:8480/jmx"
        },
        "NODEMANAGER": {
          "jmx": "http://node3.fqdn.com:8042/jmx"
        },
        "RESOURCEMANAGER": {
          "jmx": "http://node3.fqdn.com:8088/jmx"
        }
      }
    }
  ],
  "cluster_name2": [
    {
      "node4.fqdn.com": {
        "DATANODE": {
          "jmx": "http://node4.fqdn.com:1022/jmx"
        },
        "HBASE_REGIONSERVER": {
          "jmx": "http://node4.fqdn.com:60030/jmx"
        },
        "HISTORYSERVER": {
          "jmx": "http://node4.fqdn.com:19888/jmx"
        },
        "JOURNALNODE": {
          "jmx": "http://node4.fqdn.com:8480/jmx"
        },
        "NAMENODE": {
          "jmx": "http://node4.fqdn.com:50070/jmx"
        },
        "NODEMANAGER": {
          "jmx": "http://node4.fqdn.com:8042/jmx"
        }
      }
    },
    {
      "node5.fqdn.com": {
        "DATANODE": {
          "jmx": "http://node5.fqdn.com:1022/jmx"
        },
        "HBASE_REGIONSERVER": {
          "jmx": "http://node5.fqdn.com:60030/jmx"
        },
        "HIVE_LLAP": {
          "jmx": "http://node5.fqdn.com:15002/jmx"
        },
        "HIVE_SERVER_INTERACTIVE": {
          "jmx": "http://node5.fqdn.com:10502/jmx"
        },
        "JOURNALNODE": {
          "jmx": "http://node5.fqdn.com:8480/jmx"
        },
        "NODEMANAGER": {
          "jmx": "http://node5.fqdn.com:8042/jmx"
        }
      }
    },
    {
      "node6.fqdn.com": {
        "DATANODE": {
          "jmx": "http://node6.fqdn.com:1022/jmx"
        },
        "HBASE_MASTER": {
          "jmx": "http://node6.fqdn.com:16010/jmx"
        },
        "HBASE_REGIONSERVER": {
          "jmx": "http://node6.fqdn.com:60030/jmx"
        },
        "JOURNALNODE": {
          "jmx": "http://node6.fqdn.com:8480/jmx"
        },
        "NODEMANAGER": {
          "jmx": "http://node6.fqdn.com:8042/jmx"
        },
        "RESOURCEMANAGER": {
          "jmx": "http://node6.fqdn.com:8088/jmx"
        }
      }
    }
  ]
}
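For reference, here is a sketch of how such a payload can be walked to find the JMX URLs that apply to the local node. This is illustrative only; the endpoint path comes from above, but the FQDN-matching logic is an assumption, not necessarily what hadoop_exporter.py does internally:

import socket
import requests

API = "http://rest_api_host:port/alert/getservicesbyhost"  # placeholder host:port
local_fqdn = socket.getfqdn()  # assumption: nodes are keyed by their FQDNs

payload = requests.get(API, timeout=10).json()
for cluster, nodes in payload.items():
    for node in nodes:                       # each list element wraps one node
        for fqdn, services in node.items():
            if fqdn != local_fqdn:
                continue
            for service, info in services.items():
                print(cluster, service, info["jmx"])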
Then you can run:
# -s is the REST API / webhook URL described above; it must be in <host:port> form, with no scheme or path (I know it's ugly).
# -P (uppercase) is the port hadoop_exporter will export metrics on; you can then fetch metrics from http://hostname:9131/metrics
python hadoop_exporter.py -s "<rest_api_host_and_port>" -P 9131
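Once it is running, you can confirm the exporter is serving by fetching the metrics page (hostname is a placeholder):

import requests

body = requests.get("http://hostname:9131/metrics", timeout=10).text
print("\n".join(body.splitlines()[:5]))  # first few lines of the exposition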
One more thing: you need to run all of these steps on every Hadoop node.
Maybe I'll improve this project for general use.