Graphios not parsing spaces in perfdata (?)

Open christrotter opened this issue 10 years ago • 14 comments

Shawn, I'd sent you an email about this...but I got a little farther...

failed to parse label: 'physical' part of perfstring 'physical memory %=23%;80;90 physical memory=955.941M;3276.441;3685.996;0;4095.551 virtual memory %=0%;80;90 virtual memory=355.348M;6710886.3;7549747.087;0;8388607.875 paged bytes %=18%;80;90 paged bytes=886.578M;3839.641;4319.596;0;4799.551 page file %=18%;80;90 page file=886.578M;3839.641;4319.596;0;4799.551'
failed to parse label: 'virtual' part of perfstring 'physical memory %=23%;80;90 physical memory=955.941M;3276.441;3685.996;0;4095.551 virtual memory %=0%;80;90 virtual memory=355.348M;6710886.3;7549747.087;0;8388607.875 paged bytes %=18%;80;90 paged bytes=886.578M;3839.641;4319.596;0;4799.551 page file %=18%;80;90 page file=886.578M;3839.641;4319.596;0;4799.551'
failed to parse label: 'memory' part of perfstring 'physical memory %=23%;80;90 physical memory=955.941M;3276.441;3685.996;0;4095.551 virtual memory %=0%;80;90 virtual memory=355.348M;6710886.3;7549747.087;0;8388607.875 paged bytes %=18%;80;90 paged bytes=886.578M;3839.641;4319.596;0;4799.551 page file %=18%;80;90 page file=886.578M;3839.641;4319.596;0;4799.551'
failed to parse label: 'virtual' part of perfstring 'physical memory %=23%;80;90 physical memory=955.941M;3276.441;3685.996;0;4095.551 virtual memory %=0%;80;90 virtual memory=355.348M;6710886.3;7549747.087;0;8388607.875 paged bytes %=18%;80;90 paged bytes=886.578M;3839.641;4319.596;0;4799.551 page file %=18%;80;90 page file=886.578M;3839.641;4319.596;0;4799.551'
failed to parse label: 'paged' part of perfstring 'physical memory %=23%;80;90 physical memory=955.941M;3276.441;3685.996;0;4095.551 virtual memory %=0%;80;90 virtual memory=355.348M;6710886.3;7549747.087;0;8388607.875 paged bytes %=18%;80;90 paged bytes=886.578M;3839.641;4319.596;0;4799.551 page file %=18%;80;90 page file=886.578M;3839.641;4319.596;0;4799.551'
failed to parse label: 'bytes' part of perfstring 'physical memory %=23%;80;90 physical memory=955.941M;3276.441;3685.996;0;4095.551 virtual memory %=0%;80;90 virtual memory=355.348M;6710886.3;7549747.087;0;8388607.875 paged bytes %=18%;80;90 paged bytes=886.578M;3839.641;4319.596;0;4799.551 page file %=18%;80;90 page file=886.578M;3839.641;4319.596;0;4799.551'
failed to parse label: 'paged' part of perfstring 'physical memory %=23%;80;90 physical memory=955.941M;3276.441;3685.996;0;4095.551 virtual memory %=0%;80;90 virtual memory=355.348M;6710886.3;7549747.087;0;8388607.875 paged bytes %=18%;80;90 paged bytes=886.578M;3839.641;4319.596;0;4799.551 page file %=18%;80;90 page file=886.578M;3839.641;4319.596;0;4799.551'

I'm now seeing metrics into Graphite, but anything Windows-related (check_NRPE?) (has a space in the perfdata label?) is broken.

I am guessing it's part of this:

for metric in mobj.PERFDATA.split():
    try:
        nobj = copy.copy(mobj)
        (nobj.LABEL, d) = metric.split('=')
        v = d.split(';')[0]
        u = v
        nobj.VALUE = re.sub("[a-zA-Z%]", "", v)
        nobj.UOM = re.sub("[^a-zA-Z]+", "", u)
        processed_objects.append(nobj)
    except:
        log.critical("failed to parse label: '%s' part of perf"
                     "string '%s'" % (metric, nobj.PERFDATA))
        continue

Here's some relevant raw spool/graphios perfdata:

DATATYPE::SERVICEPERFDATA       TIMET::1418912534       HOSTNAME::servernameHere  SERVICEDESC::Memory Load        SERVICEPERFDATA::physical memory %=30%;80;90 physical memory=1.201G;3.19899;3.599;0;3.999 virtual memory %=0%;80;90 virtual memory=353.613M;6710886.3;7549747.087;0;8388607.875 paged bytes %=10%;80;90 paged bytes=1.01899G;8.13499;9.15199;0;10.168 page file %=10%;80;90 page file=1.01899G;8.13499;9.15199;0;10.168       SERVICECHECKCOMMAND::check_nrpe!alias_mem       HOSTSTATE::UP   HOSTSTATETYPE::HARD     SERVICESTATE::OK        SERVICESTATETYPE::HARD GRAPHITEPREFIX::nagios.01.service       GRAPHITEPOSTFIX::$_SERVICEGRAPHITEPOSTFIX$

...because there are spaces in the perfdata labels?

When I look at the Graphite metric names...clearly the delimiter is breaking. e.g.

nagios.01.service.servernameHere.30s

My Nagios template definitions have: (respectively)

        _graphiteprefix                 nagios.01.host
        _graphiteprefix                 nagios.01.service

I found a setting that allows me to get around this for the time being...

# use service description, most people will NOT want this, read documentation!
use_service_desc = True

Now I see valid metrics & selections coming up in Graphite...Shawn, if you have any ideas how to fix the template stuff, that'd be great. If not, this will work, too!

christrotter avatar Dec 18 '14 15:12 christrotter

Thank you for the detailed data (so refreshing when I don't have to ask for more).

According to the nagios documentation the perfdata is supposed to be a "space separated list of label/value pairs". So the nagios plugin you are using isn't formatting its perfdata correctly. Out of curiosity, which plugin is it, or is it home grown?

If I take the perfstring: physical memory %=30%;80;90 physical memory=1.201G;3.19899;3.599;0;3.999 virtual memory %=0%;80;90 virtual memory=353.613M;6710886.3;7549747.087;0;8388607.875 paged bytes %=10%;80;90 paged bytes=1.01899G;8.13499;9.15199;0;10.168 page file %=10%;80;90 page file=1.01899G;8.13499;9.15199;0;10.168

I can't just substitute whitespace with "_" because sometimes the label has one space like "physical memory" and sometimes it has two spaces like "physical memory %". Not to mention I now have no idea where the next label starts.

It might be possible to make a function that scans for an '=', takes everything before the '=' as a label, replaces the whitespace in that, then takes everything after the '=' until the next space. I imagine it would be easier to fix the plugin to adhere to the nagios plugin guidelines.
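A minimal sketch of the function described above, offered as an illustration rather than as what graphios actually ships. The names `split_perfdata` and `_PAIR` are hypothetical, and the sketch assumes that values never contain whitespace or an '=' sign.

```python
import re

# Hypothetical helper, not graphios code.
# label: everything before the next '=' (may contain spaces)
# data:  everything after that '=' up to the next whitespace
_PAIR = re.compile(r"\s*(?P<label>[^=]+?)\s*=(?P<data>\S*)")


def split_perfdata(perfdata):
    """Return a list of (label, value) tuples from a perfdata string."""
    pairs = []
    for match in _PAIR.finditer(perfdata):
        # Drop optional single quotes, then collapse whitespace runs to '_'.
        label = match.group("label").strip("'")
        label = re.sub(r"\s+", "_", label)
        # The value is the first ';'-separated field (warn/crit/min/max follow).
        value = match.group("data").split(";")[0]
        pairs.append((label, value))
    return pairs


sample = ("physical memory %=30%;80;90 "
          "physical memory=1.201G;3.19899;3.599;0;3.999")
for label, value in split_perfdata(sample):
    print(label, value)
```

Because each label is anchored on its '=' rather than on whitespace, it no longer matters how many spaces a label contains.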

You are correct about the code that you identified as the problem, and the problem is in the first line. We are splitting the perfstring on whitespace (which is the default of split()), then separating the label from the values by splitting on the '='.
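To make the failure concrete, here is a small sketch that mirrors only the two split steps of the quoted loop on a shortened version of the perfstring. `naive_parse` is a hypothetical name used for illustration; it is not a graphios function.

```python
def naive_parse(perfdata):
    """Mimic the whitespace split followed by the '=' split."""
    parsed, failed = [], []
    for metric in perfdata.split():
        try:
            # Raises ValueError unless the token contains exactly one '='.
            label, data = metric.split("=")
        except ValueError:
            failed.append(metric)
            continue
        parsed.append((label, data.split(";")[0]))
    return parsed, failed


sample = ("physical memory %=30%;80;90 "
          "physical memory=1.201G;3.19899;3.599;0;3.999")
parsed, failed = naive_parse(sample)
print("parsed:", parsed)
print("failed:", failed)
```

On this input the tokens 'physical', 'memory' and 'physical' fail to parse, while '%=30%;80;90' is accepted with the truncated label '%', which matches the pattern of errors and odd metric names reported above.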

This isn't the first time this has happened (it's the second), so it might be worthwhile to change the code to handle this type of perfstring, for example when it's a commercial plugin that can't be fixed.

Let me know if the plugin can't be fixed. It might be fun to write that function.

-Shawn

shawn-sterling avatar Dec 18 '14 23:12 shawn-sterling

Aha, that makes sense. This perf data is coming from NSclient++ (0.4.1) & check_nrpe. Here's some example nsclient.ini lines...

[/settings/external scripts/alias]
alias_cpu = checkCPU warn=80 crit=90 time=5m time=1m time=30s
alias_disk = CheckDriveSize MinWarn=10% MinCrit=5% CheckAll FilterType=FIXED
alias_mem = checkMem MaxWarn=80% MaxCrit=90% ShowAll=long type=physical type=virtual type=paged type=page

I wouldn't go nuts on this unless other people really want it. We've moved away from NSClient++ as a Windows agent to Check_MK (thank you, OMD!) - I'm testing that out soon. I was putting Graphios on our 'old' Nagios server just to see if it'd work. Was nice to see OMD-related documentation from you!

christrotter avatar Dec 19 '14 13:12 christrotter

Shawn, as a sort of follow-up to this - I've now gotten Graphios running on our OMD 1.20 box (thanks to your instructions!) and perfdata is flowing. Yay.

The only snag (and I believe the metrics are still flowing...so not a show-stopper) is that our custom monitors are throwing parse errors:

December 30 07:33:19 graphios.py CRITICAL failed to parse label: '[cscript.exe]' part of perfstring 'Metric=5;-1;;; [cscript.exe]'

Looking at the actual command-line output (it's a VBS file that queries a SQL DB, then formats in a Nagios-acceptable way), I see this:

OK|'Metric'=5;-1;

Looking at the Check_MK service details page, I see this:

Service performance data       'Metric'=5;1;;; [cscript.exe]

So to me, that says that OMD/Check_MK is adding in that [cscript.exe] piece. If you look at the actual script execution, it's like this:

C:\Program Files (x86)\check_mk>cscript.exe //nologo "C:\program files (x86)\check_mk\plugins\scriptFile.vbs"
OK|'Metric'=5;-1;

Anyways, I'm not too bothered by it because the metrics are still getting to Graphite, but it does mean the logfile gets chunky.

christrotter avatar Dec 30 '14 12:12 christrotter

Hrm. Negative values should be ok, and that perfstring format looks good. So this is a bug. I will fix this shortly.

shawn-sterling avatar Jan 20 '15 08:01 shawn-sterling

I misinterpreted this. I tested with negative values and that's all good.

The question is, why the hell is check_mk seemingly adding ' [cscript.exe]' to the results?

This would require me to get check_mk and a windows box up and running to replicate.. That's a tall order. :) hehe.

If you want to just get rid of the error in the logs you could do a quick:

self.PERFDATA = self.PERFDATA.replace('[cscript.exe]', '')

in the validate function around line 143 of graphios.py and that will bandaid your issue. But I would look more into the check_mk configuration because that doesn't sound right.
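If the bracketed token varies between checks, the one-line replace could be generalized. The following is only a sketch under the assumption that any standalone `[...]` token in the perfdata is safe to drop; `strip_bracket_tokens` is a hypothetical helper, not part of graphios.

```python
import re

# Matches a whitespace-delimited token such as " [cscript.exe]".
_BRACKET_TOKEN = re.compile(r"(?:^|\s)\[[^\]\s]*\](?=\s|$)")


def strip_bracket_tokens(perfdata):
    """Remove standalone [token] entries from a perfdata string."""
    return _BRACKET_TOKEN.sub("", perfdata).strip()


print(strip_bracket_tokens("Metric=5;-1;;; [cscript.exe]"))
```

Brackets that are part of a label or value are left alone, because the pattern only matches a token that is surrounded by whitespace or the ends of the string.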

shawn-sterling avatar Jan 24 '15 08:01 shawn-sterling

I have updated the code (on github only) which should fix the single quotes in label part of the problem. Can you try this and let me know how it works? (branch: parserfix).

shawn-sterling avatar Mar 09 '15 01:03 shawn-sterling

I'm having similar difficulty (just checked out the latest from github). The output from running my plugin on a linux box reads: OK Current SMTP CONNECTIONS=1766 | 'SMTP CONNECTIONS'=1766;7000;10000;;

I get errors in my graphios log saying: failed to parse label: ''SMTP' part of perfstring ''SMTP CONNECTIONS'=1766;7000;10000;;'

fuqqer avatar Mar 24 '15 18:03 fuqqer

@fuqqer Can you try the code in parserfix branch, and let me know if that solves your problem?

shawn-sterling avatar Mar 24 '15 18:03 shawn-sterling

Same space issue here, god damn windows boxes :)

failed to parse label: 'Used' part of perfstring 'd:\ Used Space=75.48Gb;90.00;95.00;0.00;100.00'
failed to parse label: 'Memory' part of perfstring 'Memory usage=19730.51Mb;22372.50;23314.50;0.00;23550.00'
failed to parse label: 'h:\' part of perfstring 'h:\ Used Space=21.14Gb;45.00;47.50;0.00;50.00'
failed to parse label: 'Used' part of perfstring 'h:\ Used Space=21.14Gb;45.00;47.50;0.00;50.00'
failed to parse label: '5' part of perfstring '5 min avg Load=59%;80;90;0;100'
failed to parse label: 'min' part of perfstring '5 min avg Load=59%;80;90;0;100'
failed to parse label: 'avg' part of perfstring '5 min avg Load=59%;80;90;0;100'

argais avatar Jun 15 '15 19:06 argais

argais: that was on the parserfix branch? If so, can you email me a small snippet of nagios perfdata to test with?

shawn-sterling avatar Jun 23 '15 08:06 shawn-sterling

I didn't have the chance to try the parserfix branch yet. Once I do, I'll let you know.

argais avatar Jun 25 '15 12:06 argais

Does use_service_desc = True work as a workaround?

elvarb avatar Mar 14 '16 15:03 elvarb

It doesn't seem that anyone's confirmed that the parserfix branch is working, and the issue still exists in the master branch, so just wanted to confirm that it does indeed work.

That said, I also found that special characters were causing me a lot of issues. I edited the script to replace them (I'm using Thruk so no issues with action_urls) and now have this working fairly flawlessly on a 14,000 services Nagios installation.
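The exact replacements are not shown in this thread, so the following is only a sketch of one possible approach. It assumes that anything other than letters, digits, '_' and '-' should become an underscore, and that '%' should be spelled out so that labels like 'physical memory %' and 'physical memory' do not collapse into the same metric name. `sanitize_label` is a hypothetical helper.

```python
import re


def sanitize_label(label, replacement="_"):
    """Turn a perfdata label into a conservative metric-name component."""
    # Drop surrounding whitespace and optional single quotes.
    cleaned = label.strip().strip("'")
    # Spell out '%' so percent and absolute metrics stay distinct.
    cleaned = cleaned.replace("%", "pct")
    # Replace every run of other characters (spaces, ':', '\\', '.', ...).
    cleaned = re.sub(r"[^A-Za-z0-9_\-]+", replacement, cleaned)
    return cleaned.strip(replacement)


for raw in ("physical memory %", "d:\\ Used Space", "'SMTP CONNECTIONS'"):
    print(sanitize_label(raw))
```

Dots are replaced as well, since Graphite treats '.' as the separator between levels of the metric path.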

Serration avatar Apr 15 '16 04:04 Serration

Thanks for the fix, I can also confirm that it works. (Even rebased to master.)

PAStheLoD avatar Oct 10 '16 09:10 PAStheLoD