nagios-plugin-check_raid

maxcache monitoring

Open arekm opened this issue 7 years ago • 6 comments

Feature request: monitor maxCache status. maxCache is a cache that uses SSD disks.

Probably it's enough to monitor this part:

NOT PARSED: [maxCache 3.0 information] [   Status of maxCache                       : Optimal] at /usr/share/perl5/vendor_perl/App/Monitoring/Plugin/CheckRaid/Plugins/arcconf.pm line 263, <$fh> line 1179.

and maybe

NOT PARSED: [maxCache 3.0 information] [   Failed stripes                           : No] at /usr/share/perl5/vendor_perl/App/Monitoring/Plugin/CheckRaid/Plugins/arcconf.pm line 263, <$fh> line 1185.
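To make the request concrete, here is a minimal sketch of what checking those two fields could look like. It is written in Python for illustration only and is not the plugin's arcconf.pm code; the function name `check_maxcache` is hypothetical. It also assumes that "Optimal" is the only healthy status and that any "Failed stripes" value other than "No" is a problem, since only those values appear in this report.

```python
import re

# "key : value" lines as they appear inside the NOT PARSED messages.
FIELD_RE = re.compile(r"^\s*(?P<key>[^:]+?)\s*:\s*(?P<value>.*\S)\s*$")


def check_maxcache(lines):
    """Return (state, message) for the lines of one maxCache section."""
    fields = {}
    for line in lines:
        match = FIELD_RE.match(line)
        if match:
            # Keep the first occurrence of each key.
            fields.setdefault(match.group("key"), match.group("value"))

    status = fields.get("Status of maxCache")
    if status is None:
        return "UNKNOWN", "maxCache status not found"

    message = "maxCache:" + status
    # Assumption: only "Optimal" is healthy.
    critical = status != "Optimal"

    failed = fields.get("Failed stripes")
    # Assumption: any value other than "No" signals failed stripes.
    if failed is not None and failed != "No":
        critical = True
        message += ", failed stripes:" + failed

    return ("CRITICAL" if critical else "OK"), message
```

With the two lines quoted above as input, this returns `("OK", "maxCache:Optimal")`. A controller with several maxCache devices would need the fields grouped per device, which the sketch does not do.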


Full unparsed part:

NOT PARSED: [maxCache 3.0 information] [maxCache device number 100] at /usr/share/perl5/vendor_perl/App/Monitoring/Plugin/CheckRaid/Plugins/arcconf.pm line 263, <$fh> line 1173.
NOT PARSED: [maxCache 3.0 information] [   maxCache device name                     : MaxCache 0] at /usr/share/perl5/vendor_perl/App/Monitoring/Plugin/CheckRaid/Plugins/arcconf.pm line 263, <$fh> line 1174.
NOT PARSED: [maxCache 3.0 information] [   Block Size of member drives              : 512 Bytes] at /usr/share/perl5/vendor_perl/App/Monitoring/Plugin/CheckRaid/Plugins/arcconf.pm line 263, <$fh> line 1175.
NOT PARSED: [maxCache 3.0 information] [   maxCache Dirty Status                    : Clean] at /usr/share/perl5/vendor_perl/App/Monitoring/Plugin/CheckRaid/Plugins/arcconf.pm line 263, <$fh> line 1176.
NOT PARSED: [maxCache 3.0 information] [   RAID level                               : 5] at /usr/share/perl5/vendor_perl/App/Monitoring/Plugin/CheckRaid/Plugins/arcconf.pm line 263, <$fh> line 1177.
NOT PARSED: [maxCache 3.0 information] [   Background Coherency Check               : Inactive] at /usr/share/perl5/vendor_perl/App/Monitoring/Plugin/CheckRaid/Plugins/arcconf.pm line 263, <$fh> line 1178.
NOT PARSED: [maxCache 3.0 information] [   Status of maxCache                       : Optimal] at /usr/share/perl5/vendor_perl/App/Monitoring/Plugin/CheckRaid/Plugins/arcconf.pm line 263, <$fh> line 1179.
NOT PARSED: [maxCache 3.0 information] [   Size                                     : 381190 MB] at /usr/share/perl5/vendor_perl/App/Monitoring/Plugin/CheckRaid/Plugins/arcconf.pm line 263, <$fh> line 1180.
NOT PARSED: [maxCache 3.0 information] [   Additional details                       : Initialized with Build/Clear] at /usr/share/perl5/vendor_perl/App/Monitoring/Plugin/CheckRaid/Plugins/arcconf.pm line 263, <$fh> line 1181.
NOT PARSED: [maxCache 3.0 information] [   Stripe-unit size                         : 1024 KB] at /usr/share/perl5/vendor_perl/App/Monitoring/Plugin/CheckRaid/Plugins/arcconf.pm line 263, <$fh> line 1182.
NOT PARSED: [maxCache 3.0 information] [   maxCache write cache status              : On] at /usr/share/perl5/vendor_perl/App/Monitoring/Plugin/CheckRaid/Plugins/arcconf.pm line 263, <$fh> line 1183.
NOT PARSED: [maxCache 3.0 information] [   Protected by Hot-Spare                   : No] at /usr/share/perl5/vendor_perl/App/Monitoring/Plugin/CheckRaid/Plugins/arcconf.pm line 263, <$fh> line 1184.
NOT PARSED: [maxCache 3.0 information] [   Failed stripes                           : No] at /usr/share/perl5/vendor_perl/App/Monitoring/Plugin/CheckRaid/Plugins/arcconf.pm line 263, <$fh> line 1185.
NOT PARSED: [maxCache 3.0 information] [   Segment 0                                : Present (190782MB, SATA, SSD, Connector:1, Device:0) BTTV605101CU200GGN] at /usr/share/perl5/vendor_perl/App/Monitoring/Plugin/CheckRaid/Plugins/arcconf.pm line 263, <$fh> line 1189.
NOT PARSED: [maxCache 3.0 information] [   Segment 1                                : Present (190782MB, SATA, SSD, Connector:1, Device:1) BTTV605101GX200GGN] at /usr/share/perl5/vendor_perl/App/Monitoring/Plugin/CheckRaid/Plugins/arcconf.pm line 263, <$fh> line 1190.
NOT PARSED: [maxCache 3.0 information] [   Segment 2                                : Present (190782MB, SATA, SSD, Connector:1, Device:2) BTTV605100CG200GGN] at /usr/share/perl5/vendor_perl/App/Monitoring/Plugin/CheckRaid/Plugins/arcconf.pm line 263, <$fh> line 1191.

arekm avatar Feb 01 '17 11:02 arekm

Please provide the debug output of the commands, for testing:

https://github.com/glensc/nagios-plugin-check_raid/blob/master/CONTRIBUTING.md#reporting-issues https://github.com/glensc/nagios-plugin-check_raid/blob/master/ISSUE_TEMPLATE.md

glensc avatar Mar 27 '17 19:03 glensc

Hi, I've bumped into the same issue. The NOT PARSED lines come from the arcconf GETCONFIG 1 AL exec. Here is the relevant output (the full output is fairly large, 11598 lines):

      Device #240
         Device is an Enclosure Services Device
         Reported Channel,Device(T:L)       : 2,11(11:0)
         Enclosure ID                       : 11
         Expander ID                        : 11
         Enclosure Logical Identifier       : 5003048000FB16BF
         Type                               : SES2
         Vendor                             : LSI
         Model                              : SAS2X36
         Firmware                           : 0e12
         Status of Enclosure Services Device
            Fan 1 status                    : 6000 rpm (Optimal)
            Fan 2 status                    : 6060 rpm (Optimal)
            Fan 3 status                    : 6060 rpm (Optimal)
            Fan 4 status                    : 5640 rpm (Optimal)
            Power supply 1 status           : Not Available
            Power supply 2 status           : Not Available
            Temperature Sensor Status 1     : 27 C/ 80 F (Normal)
            Speaker status                  : Not Available

----------------------------------------------------------------------
maxCache 3.0 information
----------------------------------------------------------------------
maxCache device number 100
   maxCache device name                     : MaxCache
   Block Size of member drives              : 512 Bytes
   maxCache Dirty Status                    : Clean
   RAID level                               : 0
   Background Coherency Check               : Inactive
   Status of maxCache                       : Optimal
   Size                                     : 1525750 MB
   Stripe-unit size                         : 1024 KB
   maxCache write cache status              : Off (Non Redundant maxCache Container)
   Protected by Hot-Spare                   : No
   Failed stripes                           : No
   --------------------------------------------------------
   Logical Device segment information
   --------------------------------------------------------
   Segment 0                                : Present (763097MB, SATA, SSD, Enclosure:0, Slot:6) BTWL33910AJX800RGN
   Segment 1                                : Present (763097MB, SATA, SSD, Enclosure:0, Slot:7) BTWL33900C77800RGN



----------------------------------------------------------------------
Connector information
----------------------------------------------------------------------
Connector #0
   Connector Name                           : CN0
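For illustration, the section layout in the output above (a dashed rule, a title, another dashed rule) can be split with a few lines of code. This is a rough Python sketch, not how arcconf.pm does it; `split_sections` is a hypothetical name, and it assumes top-level rules always start at column 0 while sub-rules such as the one around "Logical Device segment information" are indented.

```python
import re

# Top-level rule: a run of dashes starting at column 0.
RULE_RE = re.compile(r"^-{10,}\s*$")


def split_sections(text):
    """Group non-blank lines under their 'rule / title / rule' header."""
    lines = text.splitlines()
    sections = {}
    current = None
    i = 0
    while i < len(lines):
        is_header = (
            i + 2 < len(lines)
            and RULE_RE.match(lines[i])
            and lines[i + 1].strip()
            and not RULE_RE.match(lines[i + 1])
            and RULE_RE.match(lines[i + 2])
        )
        if is_header:
            current = lines[i + 1].strip()
            sections.setdefault(current, [])
            i += 3
            continue
        # Lines before the first header are dropped in this sketch.
        if current is not None and lines[i].strip():
            sections[current].append(lines[i])
        i += 1
    return sections
```

Applied to the excerpt above, this would yield a "maxCache 3.0 information" entry holding the maxCache lines (indented sub-rules included as plain lines) and a "Connector information" entry holding the connector lines.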

Tyrael avatar Sep 28 '17 11:09 Tyrael

Please include ALL output; do not truncate it at will. It will be used to seed tests. See CONTRIBUTING.md.

glensc avatar Sep 28 '17 20:09 glensc

I've seen CONTRIBUTING.md, but pasting 16k lines that are mostly irrelevant sounds bad. I can look into cooking up a pull request if that's easier for you.

Tyrael avatar Sep 28 '17 21:09 Tyrael

The point of the output is that when changes are made to the code base (for example, when the parser is changed), existing systems keep working. You can't possibly predict which information is relevant and which is not. You may obfuscate values, of course. You may think that repeated values can be omitted, but different parts of the output are related to each other, and you create broken relations if you did not account for the fact that value X in block A may be related to value Y in block B.
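As a sketch of obfuscating without breaking those relations: replace every sensitive value with the same placeholder wherever it occurs, so a serial number in one block still matches the same serial in another. This is only a Python illustration; `obfuscate` and the `REDACTED<n>` placeholders are made up for this comment and are not something the project provides.

```python
import re


def obfuscate(text, secrets):
    """Replace each secret with one stable placeholder across the whole text."""
    placeholders = {}
    for secret in secrets:
        if secret not in placeholders:
            placeholders[secret] = "REDACTED%d" % (len(placeholders) + 1)
    if not placeholders:
        return text
    # Longest first, so a secret that contains another is replaced whole.
    ordered = sorted(placeholders, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(s) for s in ordered))
    return pattern.sub(lambda m: placeholders[m.group(0)], text)
```

Note that the placeholders differ in length from the originals, so column alignment in the output shifts. If the parser under test depends on alignment, same-length placeholders would be the safer choice.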

glensc avatar Sep 29 '17 15:09 glensc

# /usr/lib/nagios/plugins/check_raid -d
check_raid 4.0.8-dev
Visit <https://github.com/glensc/nagios-plugin-check_raid#reporting-bugs> how to report bugs
Please include output of **ALL** commands in bugreport

DEBUG EXEC: /sbin/dmsetup status --noflush at /usr/share/perl5/vendor_perl/App/Monitoring/Plugin/CheckRaid/Plugin.pm line 385.
DEBUG EXEC: /proc/mdstat at /usr/share/perl5/vendor_perl/App/Monitoring/Plugin/CheckRaid/Plugin.pm line 385.
DEBUG EXEC: /sbin/arcconf GETSTATUS 1 at /usr/share/perl5/vendor_perl/App/Monitoring/Plugin/CheckRaid/Plugin.pm line 385.
DEBUG EXEC: /sbin/arcconf GETCONFIG 1 AL at /usr/share/perl5/vendor_perl/App/Monitoring/Plugin/CheckRaid/Plugin.pm line 385.
DEBUG EXEC: /proc/mdstat at /usr/share/perl5/vendor_perl/App/Monitoring/Plugin/CheckRaid/Plugin.pm line 385.
OK: arcconf:[Controller:Optimal, Logical Device 0(STORAGE1):Optimal, Logical Device 1(STORAGE2):Optimal, Drives: BTTV605101CU200GGN,BTTV605101GX200GGN,BTTV605100CG200GGN,NAHGY61X,NAGDP1WX,NCH2GD0Z,NCH1R41Z,NCGZJYVV,NCH2EX7Z,NAG2Y0LY,NCH2G2RZ,NCGZBTHV,NAGUKP3X,NAG41A7Y,NCH1S5MZ,NCH1S6KZ,NCH2EX3Z,NCH2G8MZ,NCGZEG3V,NCH2K6VZ,NCH2K32Z=Online]; mdstat:[md2(767.94 MiB raid1):UU, md3(185.42 GiB raid1):UU]

In the above, this line:

DEBUG EXEC: /proc/mdstat at /usr/share/perl5/vendor_perl/App/Monitoring/Plugin/CheckRaid/Plugin.pm line 385.

looks weird. Shouldn't that be something like "cat /proc/mdstat"?

Anyway, back to the issue. Attached: raid-mdstat.txt raid-arcconf-getconfig.txt raid-arcconf-getstatus.txt raid-dmsetup.txt

arekm avatar Dec 11 '18 07:12 arekm