Allow metrics to be written via python plugin

Open cdosborn opened this issue 9 years ago • 2 comments

Hey Shawn,

My name is Connor. I am an intern at iPlant Collaborative, an NSF-funded organization specializing in bio-cyber-infrastructure to democratize access to U.S. supercomputing capabilities (we help scientists leverage the cloud).

In my monitoring setup, I have a nagios plugin reporting metrics from a hypervisor, so its output covers all of the hypervisor's VMs. Instead of returning metrics for a single host, I'm returning stats for multiple hosts. Here's sample output from my plugin:

virt-stats OK - 2 running of 7 on <hypervisor> | <vm 1>=<cpu>;<mem>;<tx>;<rx>; <vm 2> ...

The metrics I would like to send to carbon:

stats.<hypervisor>.<vm 1>.cpu
...
stats.<hypervisor>.<vm 2>.cpu
...

Currently Graphios doesn't permit this flexibility, which is understandable. I need a dynamic graphite prefix.
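To make the desired mapping concrete, here is a minimal sketch of how the perfdata part of the sample output could be turned into per-VM carbon paths. The function name `vm_metrics`, the `stats` prefix default, the field order (cpu, mem, tx, rx), and the assumption that VM names contain no spaces are illustrative only; none of this is graphios code.

```python
# Assumed field order, read off the sample output above.
FIELDS = ("cpu", "mem", "tx", "rx")


def vm_metrics(hypervisor, perfdata, prefix="stats"):
    """Return [(path, value)] for perfdata like 'vm1=1;2;3;4; vm2=5;6;7;8;'."""
    metrics = []
    for entry in perfdata.split():
        vm, _, raw = entry.partition("=")
        if not raw:
            continue  # skip tokens that are not '<vm>=<values>'
        # Drop only the trailing ';' so that positions stay aligned with FIELDS.
        values = raw.rstrip(";").split(";")
        for field, value in zip(FIELDS, values):
            path = "%s.%s.%s.%s" % (prefix, hypervisor, vm, field)
            metrics.append((path, value))
    return metrics
```

Values are left as strings here; converting or validating them is up to the caller.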

I have implemented this behavior. The user simply has to write a python module defining get_metrics. The user is responsible for ensuring that the returned path is valid for their given backend. Here's a sample plugin:

def get_metrics(perfdata, nag):
    """
    Return a list of (<path>, <value>) tuples, one per metric to send to carbon.
    """
    # perfdata is unused in this minimal example; everything comes from nag.
    path = "%s.%s.%s.%s" % (nag.GRAPHITEPREFIX, nag.HOSTNAME, nag.GRAPHITEPOSTFIX, nag.LABEL)
    value = nag.VALUE

    return [(path, value)]
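Since the plugin author is responsible for producing a valid path, one thing to watch for is an empty component (for example, no postfix configured), which would leave a double dot in the path. The following variant is a sketch that joins only the non-empty parts. The helper `build_path` and the `SimpleNamespace` stand-in for the `nag` object are assumptions made for illustration, not part of graphios.

```python
from types import SimpleNamespace


def build_path(*parts):
    """Join the non-empty components with dots."""
    return ".".join(str(p) for p in parts if p)


def get_metrics(perfdata, nag):
    """Return [(path, value)], skipping empty path components."""
    path = build_path(nag.GRAPHITEPREFIX, nag.HOSTNAME,
                      nag.GRAPHITEPOSTFIX, nag.LABEL)
    return [(path, nag.VALUE)]


# Stand-in for the object graphios would pass in (hypothetical values).
nag = SimpleNamespace(GRAPHITEPREFIX="stats", HOSTNAME="hv01",
                      GRAPHITEPOSTFIX="", LABEL="cpu", VALUE="42")
metrics = get_metrics("", nag)
```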

The nagios service definition looks like so:

define service {
     ...
     _graphiteplugin /path/to/plugin.py
     ...
}

So far it has been tested with carbon. If this behavior looks like something you want to be a part of graphios, let me know what else you'd like to see.

cdosborn avatar Jun 15 '15 18:06 cdosborn

Hi there. First of all, I think it's pretty badass that your company is letting you work on open source, and second of all I think it's also pretty badass that you are actually doing it. :)

I don't mind this idea, but I think you might run into problems down the road with nagios depending on what you are using to receive data. nrpe and nsca have tiny payload sizes for perfdata by default; you can modify the source and increase to 4096 bytes, but I've run into problems with that as well. If you are using nrdp this shouldn't be a problem.

I think a better way to introduce this would be as a config file option; then the codepath could be skipped unless it was purposefully enabled.
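As a rough sketch of what such an opt-in could look like: the plugin path is read from the config file, and the plugin codepath runs only when the option is present and non-empty. The section name `graphios` and the option name `metric_plugin` are hypothetical choices for this example, not existing graphios settings.

```python
import configparser


def plugin_path_from_config(text):
    """Return the configured plugin path, or None when it is absent or empty."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # fallback covers both a missing section and a missing option
    path = parser.get("graphios", "metric_plugin", fallback="").strip()
    return path or None


enabled = plugin_path_from_config("[graphios]\nmetric_plugin = /path/to/plugin.py\n")
disabled = plugin_path_from_config("[graphios]\n")
```

With `disabled` being `None`, the caller can skip the plugin codepath entirely.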

The loading of the modules will likely need some more exceptions added, specifically IOError and OSError in case the permissions are messed up for the module.
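A sketch of a more defensive loader along these lines is shown below. It is written against Python 3's `importlib.util`, which is an assumption about the runtime (the code under review may load modules differently); the function name `load_get_metrics` and the logger name are also made up for this example.

```python
import importlib.util
import logging

log = logging.getLogger("plugin_loader")


def load_get_metrics(path):
    """Return the plugin's get_metrics callable, or None if loading fails."""
    try:
        spec = importlib.util.spec_from_file_location("metric_plugin", path)
        if spec is None or spec.loader is None:
            log.error("not a loadable python file: %s", path)
            return None
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
    except (IOError, OSError) as exc:
        # Missing file or messed-up permissions end up here.
        log.error("cannot read plugin %s: %s", path, exc)
        return None
    except (ImportError, SyntaxError) as exc:
        log.error("cannot import plugin %s: %s", path, exc)
        return None
    func = getattr(module, "get_metrics", None)
    if not callable(func):
        log.error("plugin %s does not define get_metrics", path)
        return None
    return func
```

Returning `None` rather than raising lets the caller fall back to the normal, non-plugin path.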

The code at lines 359 -> 363 assumes the data returned is in the format you expect; this may cause problems if it isn't.
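One way to guard against malformed plugin output is to filter the result before using it. The sketch below keeps only well-formed `(path, value)` pairs. The function name `valid_metrics` and the rule that a value must be parseable as a number are assumptions for illustration, not a description of what graphios does.

```python
def valid_metrics(result):
    """Yield (path, value) pairs from result, skipping anything malformed."""
    if not isinstance(result, (list, tuple)):
        return
    for item in result:
        if not isinstance(item, (list, tuple)) or len(item) != 2:
            continue
        path, value = item
        if not isinstance(path, str) or not path:
            continue
        try:
            float(value)  # assumed rule: value must look numeric
        except (TypeError, ValueError):
            continue
        yield path, value
```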

Removing the clearing of whitespace you did on what was line 387 is likely detrimental for most people running this. There is also another merge I'm going to do (parserfix) that will be cleaning up any spaces in the perfdata (Windows plugins and their spaces). Not sure what the right solution is here; you may consider something else to separate entries (space may work for you but won't for others).

Lastly, grab the python module flake8 if you want Travis CI to stop complaining. :)

Have a good day.

-Shawn

shawn-sterling avatar Jun 23 '15 08:06 shawn-sterling

Config is moved out of nagios definitions. (Do you want command line opts for this?)

I added better error handling and reporting: 'missing path/missing method/see example'.

In terms of the white-space substitution, I changed it so that if you go the plugin route no substitutions are made. If you're writing a plugin you have complete flexibility (and complete freedom to shoot yourself in the foot).
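The behavior described here can be sketched roughly as follows. The function name `clean_label`, the `plugin_enabled` flag, and the choice of an underscore as the replacement character are assumptions for this example; the actual substitution in graphios may differ.

```python
import re


def clean_label(label, plugin_enabled=False):
    """Replace whitespace runs with '_' unless a plugin owns the path."""
    if plugin_enabled:
        # The plugin author gets the label untouched.
        return label
    return re.sub(r"\s+", "_", label.strip())
```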

This can easily be included with other backends. From a cursory look, librato and influxdb would use the same fix as carbon (perhaps a case for base classing some of the backends). I don't have a convenient way to test those systems.

Should be flake8 compatible :).

cdosborn avatar Jul 01 '15 18:07 cdosborn