garmadon
Retrieve container metrics per application
From time to time we see applications requesting a huge number of containers (up to 20 million) from different frameworks (Tez, Flink). This leaves many pending containers on the cluster and is usually caused by a bad request or a bug. It is not easy to find which application is the root cause of the excess container requests: only enabling the debug log level on the org.apache.hadoop.yarn.server.resourcemanager.scheduler package helps to identify it. It would be much easier to have garmadon report per-application container metrics (running, pending...) and then display a top 10 of applications with pending containers in the compute Grafana dashboards.
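
To make the requested "top 10" concrete, here is a minimal Python sketch of the ranking step only. It assumes a per-application snapshot already exists; the `AppContainerMetrics` type, its field names, and the sample values are hypothetical and are not part of garmadon or YARN.

```python
import heapq
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class AppContainerMetrics:
    # Hypothetical per-application snapshot; field names are illustrative only.
    app_id: str
    framework: str
    running: int
    pending: int


def top_pending(apps: Iterable[AppContainerMetrics], n: int = 10) -> List[AppContainerMetrics]:
    """Return up to n applications with the most pending containers, largest first."""
    # Applications with nothing pending are not interesting for this view.
    with_pending = (a for a in apps if a.pending > 0)
    return heapq.nlargest(n, with_pending, key=lambda a: a.pending)


# Made-up sample data, for illustration only.
sample = [
    AppContainerMetrics("application_0000000000000_0001", "tez", 120, 20_000_000),
    AppContainerMetrics("application_0000000000000_0002", "flink", 40, 0),
    AppContainerMetrics("application_0000000000000_0003", "flink", 300, 1_500),
]

for app in top_pending(sample):
    print(app.app_id, app.framework, app.pending)
```

In practice the ranking would more likely be done by a dashboard query over the reported metrics; the sketch only shows which fields each event would need to carry (application ID, framework, running and pending counts).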