graphios
Graphios not parsing spaces in perfdata (?)
Shawn, I'd sent you an email about this...but I got a little farther...
failed to parse label: 'physical' part of perfstring 'physical memory %=23%;80;90 physical memory=955.941M;3276.441;3685.996;0;4095.551 virtual memory %=0%;80;90 virtual memory=355.348M;6710886.3;7549747.087;0;8388607.875 paged bytes %=18%;80;90 paged bytes=886.578M;3839.641;4319.596;0;4799.551 page file %=18%;80;90 page file=886.578M;3839.641;4319.596;0;4799.551'
failed to parse label: 'virtual' part of perfstring 'physical memory %=23%;80;90 physical memory=955.941M;3276.441;3685.996;0;4095.551 virtual memory %=0%;80;90 virtual memory=355.348M;6710886.3;7549747.087;0;8388607.875 paged bytes %=18%;80;90 paged bytes=886.578M;3839.641;4319.596;0;4799.551 page file %=18%;80;90 page file=886.578M;3839.641;4319.596;0;4799.551'
failed to parse label: 'memory' part of perfstring 'physical memory %=23%;80;90 physical memory=955.941M;3276.441;3685.996;0;4095.551 virtual memory %=0%;80;90 virtual memory=355.348M;6710886.3;7549747.087;0;8388607.875 paged bytes %=18%;80;90 paged bytes=886.578M;3839.641;4319.596;0;4799.551 page file %=18%;80;90 page file=886.578M;3839.641;4319.596;0;4799.551'
failed to parse label: 'virtual' part of perfstring 'physical memory %=23%;80;90 physical memory=955.941M;3276.441;3685.996;0;4095.551 virtual memory %=0%;80;90 virtual memory=355.348M;6710886.3;7549747.087;0;8388607.875 paged bytes %=18%;80;90 paged bytes=886.578M;3839.641;4319.596;0;4799.551 page file %=18%;80;90 page file=886.578M;3839.641;4319.596;0;4799.551'
failed to parse label: 'paged' part of perfstring 'physical memory %=23%;80;90 physical memory=955.941M;3276.441;3685.996;0;4095.551 virtual memory %=0%;80;90 virtual memory=355.348M;6710886.3;7549747.087;0;8388607.875 paged bytes %=18%;80;90 paged bytes=886.578M;3839.641;4319.596;0;4799.551 page file %=18%;80;90 page file=886.578M;3839.641;4319.596;0;4799.551'
failed to parse label: 'bytes' part of perfstring 'physical memory %=23%;80;90 physical memory=955.941M;3276.441;3685.996;0;4095.551 virtual memory %=0%;80;90 virtual memory=355.348M;6710886.3;7549747.087;0;8388607.875 paged bytes %=18%;80;90 paged bytes=886.578M;3839.641;4319.596;0;4799.551 page file %=18%;80;90 page file=886.578M;3839.641;4319.596;0;4799.551'
failed to parse label: 'paged' part of perfstring 'physical memory %=23%;80;90 physical memory=955.941M;3276.441;3685.996;0;4095.551 virtual memory %=0%;80;90 virtual memory=355.348M;6710886.3;7549747.087;0;8388607.875 paged bytes %=18%;80;90 paged bytes=886.578M;3839.641;4319.596;0;4799.551 page file %=18%;80;90 page file=886.578M;3839.641;4319.596;0;4799.551'
I'm now seeing metrics in Graphite, but anything Windows-related (check_nrpe? anything with a space in the perfdata label?) is broken.
I am guessing it's part of this:
    for metric in mobj.PERFDATA.split():
        try:
            nobj = copy.copy(mobj)
            (nobj.LABEL, d) = metric.split('=')
            v = d.split(';')[0]
            u = v
            nobj.VALUE = re.sub("[a-zA-Z%]", "", v)
            nobj.UOM = re.sub("[^a-zA-Z]+", "", u)
            processed_objects.append(nobj)
        except:
            log.critical("failed to parse label: '%s' part of perf"
                         "string '%s'" % (metric, nobj.PERFDATA))
            continue
Here's some relevant raw spool/graphios perfdata:
DATATYPE::SERVICEPERFDATA TIMET::1418912534 HOSTNAME::servernameHere SERVICEDESC::Memory Load SERVICEPERFDATA::physical memory %=30%;80;90 physical memory=1.201G;3.19899;3.599;0;3.999 virtual memory %=0%;80;90 virtual memory=353.613M;6710886.3;7549747.087;0;8388607.875 paged bytes %=10%;80;90 paged bytes=1.01899G;8.13499;9.15199;0;10.168 page file %=10%;80;90 page file=1.01899G;8.13499;9.15199;0;10.168 SERVICECHECKCOMMAND::check_nrpe!alias_mem HOSTSTATE::UP HOSTSTATETYPE::HARD SERVICESTATE::OK SERVICESTATETYPE::HARD GRAPHITEPREFIX::nagios.01.service GRAPHITEPOSTFIX::$_SERVICEGRAPHITEPOSTFIX$
...because there are spaces in the perfdata labels?
When I look at the Graphite metric names...clearly the delimiter is breaking. e.g.
nagios.01.service.servernameHere.30s
My Nagios template definitions have: (respectively)
_graphiteprefix nagios.01.host
_graphiteprefix nagios.01.service
I found a setting that allows me to get around this for the time being...
# use service description, most people will NOT want this, read documentation!
use_service_desc = True
Now I see valid metrics & selections coming up in Graphite...Shawn, if you have any ideas how to fix the template stuff, that'd be great. If not, this will work, too!
Thank you for the detailed data (so refreshing when I don't have to ask for more).
According to the Nagios documentation, the perfdata is supposed to be a "space separated list of label/value pairs". So the Nagios plugin you are using isn't formatting its perfdata correctly. Out of curiosity, which plugin is it, or is it home-grown?
If I take the perfstring: physical memory %=30%;80;90 physical memory=1.201G;3.19899;3.599;0;3.999 virtual memory %=0%;80;90 virtual memory=353.613M;6710886.3;7549747.087;0;8388607.875 paged bytes %=10%;80;90 paged bytes=1.01899G;8.13499;9.15199;0;10.168 page file %=10%;80;90 page file=1.01899G;8.13499;9.15199;0;10.168
I can't just substitute whitespace with "_" because sometimes the label has one space like "physical memory" and sometimes it has two spaces like "physical memory %". Not to mention I now have no idea where the next label starts.
It might be possible to write a function that scans for an '=', takes everything before the '=' as the label, replaces the whitespace in that, then takes everything after the '=' up to the next space as the value. I imagine it would be easier to fix the plugin to adhere to the Nagios plugin guidelines.
You are correct about the code you identified as the problem, and the problem is in the first line. We split the perfstring on whitespace (the default for split()), then separate the label from the values by splitting on the '='. A label that contains spaces is therefore already broken into several pieces before the '=' split ever sees it.
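To make the failure concrete, here is a minimal standalone reproduction. It uses a shortened perfstring from the log above and plain tuples instead of the graphios metric objects, so it only illustrates the splitting behaviour, not the real code path.

```python
# Shortened perfstring taken from the error log earlier in this thread.
perfdata = ("physical memory %=23%;80;90 "
            "physical memory=955.941M;3276.441;3685.996;0;4095.551")

# split() with no argument splits on any run of whitespace,
# so the spaces inside each label produce extra tokens.
tokens = perfdata.split()

parsed, failed = [], []
for token in tokens:
    try:
        # Unpacking into two names only works when the token
        # contains exactly one '='.
        label, data = token.split("=")
        parsed.append((label, data))
    except ValueError:
        failed.append(token)

print(failed)  # ['physical', 'memory', 'physical']
print(parsed)  # [('%', '23%;80;90'), ('memory', '955.941M;3276.441;3685.996;0;4095.551')]
```

Note that the tokens that do contain an '=' are parsed without error but end up with truncated labels ('%' and 'memory'), so the log only shows part of the damage.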
This isn't the first time this has happened (it's the second), so it might be worthwhile to change the code to handle this type of perfstring, for example when it comes from a commercial plugin that can't be fixed.
Let me know if the plugin can't be fixed. It might be fun to write that function.
-Shawn
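The scan-for-'=' idea described above could look roughly like the sketch below. This is not the code from graphios; split_perfdata and PAIR_RE are hypothetical names, and the approach assumes that values never contain spaces or '=' characters.

```python
import re

# Label: everything up to the next '='.
# Value: everything after the '=' up to the next whitespace.
PAIR_RE = re.compile(r"\s*([^=]+?)=(\S*)")


def split_perfdata(perfdata):
    """Return (label, value) pairs, tolerating unquoted spaces in labels.

    Hypothetical helper, not part of graphios.
    """
    pairs = []
    for match in PAIR_RE.finditer(perfdata):
        raw_label, value = match.groups()
        # Drop optional surrounding single quotes, then replace
        # whitespace inside the label with underscores.
        label = re.sub(r"\s+", "_", raw_label.strip().strip("'"))
        pairs.append((label, value))
    return pairs


sample = ("physical memory %=30%;80;90 "
          "physical memory=1.201G;3.19899;3.599;0;3.999")
print(split_perfdata(sample))
# [('physical_memory_%', '30%;80;90'), ('physical_memory', '1.201G;3.19899;3.599;0;3.999')]
```

One limitation of this heuristic: a stray token that has no '=' and sits between two pairs is absorbed into the following label, because nothing marks where the next label starts. A stray token at the very end of the string is simply ignored.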
Aha, that makes sense. This perf data is coming from NSclient++ (0.4.1) & check_nrpe. Here's some example nsclient.ini lines...
[/settings/external scripts/alias]
alias_cpu = checkCPU warn=80 crit=90 time=5m time=1m time=30s
alias_disk = CheckDriveSize MinWarn=10% MinCrit=5% CheckAll FilterType=FIXED
alias_mem = checkMem MaxWarn=80% MaxCrit=90% ShowAll=long type=physical type=virtual type=paged type=page
I wouldn't go nuts on this unless other people really want it. We've moved away from NSClient++ as a Windows agent to Check_MK (thank you, OMD!) - I'm testing that out soon. I was putting Graphios on our 'old' Nagios server just to see if it'd work. Was nice to see OMD-related documentation from you!
Shawn, as a sort of continuation of this - I've now gotten Graphios running on our OMD 1.20 box (thanks to your instructions!) and perfdata is flowing. Yay.
The only snag (and I believe the metrics are still flowing...so not a show-stopper) is that our custom monitors are throwing parse errors:
December 30 07:33:19 graphios.py CRITICAL failed to parse label: '[cscript.exe]' part of perfstring 'Metric=5;-1;;; [cscript.exe]'
Looking at the actual command-line output (it's a VBS file that queries a SQL DB, then formats in a Nagios-acceptable way), I see this:
OK|'Metric'=5;-1;
Looking at the Check_MK service details page, I see this:
Service performance data 'Metric'=5;1;;; [cscript.exe]
So to me, that says that OMD/Check_MK is adding in that [cscript.exe] piece. If you look at the actual script execution, it's like this:
C:\Program Files (x86)\check_mk>cscript.exe //nologo "C:\program files (x86)\check_mk\plugins\scriptFile.vbs"
OK|'Metric'=5;-1;
Anyways, I'm not too bothered by it because the metrics are still getting to Graphite, but it does mean the logfile gets chunky.
Hrm. Negative values should be ok, and that perfstring format looks good. So this is a bug. I will fix this shortly.
I misinterpreted this. I tested with negative values and that's all good.
The question is, why the hell is check_mk seemingly adding ' [cscript.exe]' to the results?
This would require me to get check_mk and a windows box up and running to replicate.. That's a tall order. :) hehe.
If you want to just get rid of the error in the logs you could do a quick:
self.PERFDATA = self.PERFDATA.replace('[cscript.exe]', '')
in the validate function around line 143 of graphios.py and that will bandaid your issue. But I would look more into the check_mk configuration because that doesn't sound right.
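If the bracketed suffix turns out to vary between checks, a slightly more general version of that band-aid could strip any single trailing "[...]" token. This is only a sketch: strip_trailing_tag is a hypothetical helper, not something that exists in graphios, and it assumes the tag is always the last whitespace-separated token.

```python
import re

# One trailing "[...]" token, preceded by whitespace, at the end of the string.
TRAILING_TAG_RE = re.compile(r"\s+\[[^\]\s]*\]\s*$")


def strip_trailing_tag(perfdata):
    """Remove a single trailing bracketed token such as ' [cscript.exe]'."""
    return TRAILING_TAG_RE.sub("", perfdata)


print(strip_trailing_tag("Metric=5;-1;;; [cscript.exe]"))  # Metric=5;-1;;;
print(strip_trailing_tag("Metric=5;-1;;;"))                # Metric=5;-1;;;
```

Like the replace() call, this hides the symptom only; it does not explain where the suffix comes from.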
I have updated the code (on GitHub only), which should fix the single-quotes-in-the-label part of the problem. Can you try this and let me know how it works? (branch: parserfix).
I'm having a similar difficulty (I just checked out the latest from GitHub). The output from my plugin, running on a Linux box, reads: OK Current SMTP CONNECTIONS=1766 | 'SMTP CONNECTIONS'=1766;7000;10000;;
I get errors in my graphios log saying: failed to parse label: ''SMTP' part of perfstring ''SMTP CONNECTIONS'=1766;7000;10000;;'
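For single-quoted labels like this one, one possible approach is to let Python's standard shlex module do the tokenizing, since it keeps a quoted section together with the text attached to it. The sketch below is an illustration only and is not what the parserfix branch does; split_quoted_perfdata is a hypothetical name.

```python
import shlex


def split_quoted_perfdata(perfdata):
    """Split perfdata into (label, value) pairs, honouring single quotes.

    Hypothetical helper, not part of graphios.
    """
    pairs = []
    # shlex.split() splits on whitespace but keeps quoted text intact
    # and removes the quote characters themselves.
    for token in shlex.split(perfdata):
        label, sep, value = token.rpartition("=")
        if sep and label:  # skip tokens without a 'label=' part
            pairs.append((label, value))
    return pairs


print(split_quoted_perfdata("'SMTP CONNECTIONS'=1766;7000;10000;;"))
# [('SMTP CONNECTIONS', '1766;7000;10000;;')]
```

Two caveats: shlex.split() raises ValueError on an unbalanced quote, and outside quotes it treats a backslash as an escape character, so it does not help with unquoted labels that contain spaces or backslashes.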
@fuqqer Can you try the code in parserfix branch, and let me know if that solves your problem?
Same space issue here, god damn windows boxes :)
failed to parse label: 'Used' part of perfstring 'd:\ Used Space=75.48Gb;90.00;95.00;0.00;100.00'
failed to parse label: 'Memory' part of perfstring 'Memory usage=19730.51Mb;22372.50;23314.50;0.00;23550.00'
failed to parse label: 'h:\' part of perfstring 'h:\ Used Space=21.14Gb;45.00;47.50;0.00;50.00'
failed to parse label: 'Used' part of perfstring 'h:\ Used Space=21.14Gb;45.00;47.50;0.00;50.00'
failed to parse label: '5' part of perfstring '5 min avg Load=59%;80;90;0;100'
failed to parse label: 'min' part of perfstring '5 min avg Load=59%;80;90;0;100'
failed to parse label: 'avg' part of perfstring '5 min avg Load=59%;80;90;0;100'
argais: that was on the parserfix branch? If so, can you email me a small snippet of nagios perfdata to test with?
I didn't have the chance to try the parserfix branch yet. Once I do, I'll let you know.
Does use_service_desc = True work as a workaround?
It doesn't seem that anyone's confirmed that the parserfix branch is working, and the issue still exists in the master branch, so just wanted to confirm that it does indeed work.
That said, I also found that special characters were causing me a lot of issues. I edited the script to replace them (I'm using Thruk so no issues with action_urls) and now have this working fairly flawlessly on a 14,000 services Nagios installation.
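The exact edit isn't shown here, but a replacement step of that kind might look like the sketch below. Everything in it is an assumption: sanitize_metric_part is a hypothetical helper, the allowed character set is a guess, and mapping '%' to 'pct' is just one way to keep labels such as "physical memory" and "physical memory %" from collapsing into the same name.

```python
import re


def sanitize_metric_part(name):
    """Make one label safe to use as a single Graphite path component.

    Hypothetical helper, not part of graphios.
    """
    # Keep the percent sign's meaning before it gets replaced.
    name = name.replace("%", "pct")
    # Anything other than letters, digits, '_' and '-' becomes '_'.
    # Dots are replaced too, because Graphite uses '.' as its path separator.
    cleaned = re.sub(r"[^A-Za-z0-9_-]+", "_", name)
    return cleaned.strip("_")


print(sanitize_metric_part("d:\\ Used Space"))     # d_Used_Space
print(sanitize_metric_part("physical memory %"))   # physical_memory_pct
```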
Thanks for the fix, I can also confirm that it works. (Even rebased to master.)