mesos_exporter

Prototype for adding Prometheus service discovery

Open maurorappa opened this issue 6 years ago • 2 comments

I think a Prometheus service discovery endpoint could be useful to others. The idea is to produce output that Prometheus can consume directly through file-based service discovery (https://www.robustperception.io/using-json-file-service-discovery-with-prometheus/), so that when cluster members join dynamically they get monitored with no human intervention. All you need to do is periodically poll this exporter (on all masters), save the output to a file if it is not 'null', and configure Prometheus to read that file. THIS IS A PROTOTYPE to start a discussion about this enhancement; it still needs to:

  • be rewritten in nicer Go :)
  • use a JSON library instead of building the output as a string
  • use command-line arguments instead of environment variables
  • decide what to do on a slave
  • have some test code
  • keep the related code outside the main file
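
As a rough illustration of the "use a JSON library" item, here is a minimal sketch that builds a file_sd document with encoding/json instead of string concatenation. The `TargetGroup` type and the `renderTargets` helper are hypothetical names, not code from the prototype, and the hostnames, port, and label are placeholder values. The targets/labels layout is the one Prometheus expects for file-based service discovery.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
	"strconv"
)

// TargetGroup mirrors one entry of a Prometheus file_sd JSON file.
type TargetGroup struct {
	Targets []string          `json:"targets"`
	Labels  map[string]string `json:"labels,omitempty"`
}

// renderTargets is a hypothetical helper: it turns agent hostnames into a
// file_sd document. With no hosts it returns "[]" rather than "null".
func renderTargets(hosts []string, port int, labels map[string]string) ([]byte, error) {
	groups := []TargetGroup{}
	if len(hosts) > 0 {
		targets := make([]string, 0, len(hosts))
		for _, h := range hosts {
			// JoinHostPort also handles IPv6 literals correctly.
			targets = append(targets, net.JoinHostPort(h, strconv.Itoa(port)))
		}
		groups = append(groups, TargetGroup{Targets: targets, Labels: labels})
	}
	return json.Marshal(groups)
}

func main() {
	// Placeholder hostnames and port, for illustration only.
	out, err := renderTargets(
		[]string{"agent1.example.com", "agent2.example.com"},
		9105,
		map[string]string{"job": "mesos-agent"},
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

One possible design choice shown here: an empty cluster serializes as an empty list instead of 'null', so the polling script could write the file unconditionally.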

maurorappa • May 25 '18 10:05

Oh I'm a fan of this feature! I'm currently using a very half-assed SRV-record-to-JSON generator I built a while ago (https://github.com/lloesche/prometheus-dcos/blob/master/srv2file_sd.go), but it depends on DC/OS, or rather mesos-dns, to discover all the Mesos agents. Getting rid of that dependency and using Mesos' /state.json alone for service discovery would be very nice.

That said, I wonder about the static port. I might want to poll several exporters (e.g. node_exporter, cAdvisor, mesos_exporter) on an agent. So maybe it'd be better to leave the port out (or make it optional) and just return JSON listing the agent nodes; the process that curls the /sd API could then use a simple jq filter to add ports for whichever exporters you're interested in. Also, on our clusters we usually run node_exporter on random, Mesos-assigned ports and use relabeling to fake it back to 9100 so metrics are associated with the correct instance/time series.
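
To make the port suggestion concrete, here is a hedged Go sketch of the expansion step: given a port-less list of agent hosts, it emits one file_sd target group per exporter. The `expand` function, the `TargetGroup` type, and the exporter-to-port map are all assumptions for illustration; the ports are example values and would differ on clusters that use Mesos-assigned ports.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
	"sort"
	"strconv"
)

// TargetGroup mirrors one entry of a Prometheus file_sd JSON file.
type TargetGroup struct {
	Targets []string          `json:"targets"`
	Labels  map[string]string `json:"labels,omitempty"`
}

// expand is a hypothetical helper: it builds one target group per exporter,
// using the exporter name as the "job" label.
func expand(hosts []string, exporters map[string]int) []TargetGroup {
	names := make([]string, 0, len(exporters))
	for name := range exporters {
		names = append(names, name)
	}
	sort.Strings(names) // map iteration order is random; sort for stable output

	groups := make([]TargetGroup, 0, len(names))
	for _, name := range names {
		port := strconv.Itoa(exporters[name])
		targets := make([]string, 0, len(hosts))
		for _, h := range hosts {
			targets = append(targets, net.JoinHostPort(h, port))
		}
		groups = append(groups, TargetGroup{
			Targets: targets,
			Labels:  map[string]string{"job": name},
		})
	}
	return groups
}

func main() {
	// Example ports only; adjust to whatever each exporter listens on.
	exporters := map[string]int{"node_exporter": 9100, "mesos_exporter": 9105}
	out, err := json.MarshalIndent(expand([]string{"10.0.0.1", "10.0.0.2"}, exporters), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Whether this step lives in the exporter or in the polling script (for example as a jq filter) is an open design question in this thread; the sketch only shows the shape of the transformation.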

Edit: one of our Mesos developers just told me that /state.json is not a good endpoint to query for this use case, and that we should check whether we can get the required info from /state-summary. Depending on the size of the cluster and of the framework/task history, querying /state.json can freeze the master for a while.

lloesche • Jun 07 '18 17:06

Everything you mentioned can be changed; my idea was to show some potential new functionality we could introduce. I'll amend the static port. For the API endpoint, I need to look at the format first.

maurorappa • Jun 13 '18 07:06