check_nwc_health
Some checks fail on SRX1400
Hello,
We use check_nwc_health $Revision: 5.11.3.
--mode cpu-load fails with a strange error (note the timeout of 120; with 60 we only get a timeout):
X@XJ:~$ /usr/lib/nagios/plugins/check_nwc_health -vvvvvvvvvvv --mode cpu-load --hostname 10.X.33 --protocol 3 --username X --authpassword X --authprotocol SHA1 --privpassword X --privprotocol AES --timeout 120
Thu Jan 26 14:40:01 2017: AUTOLOAD Classes::Device::check_messages
Thu Jan 26 14:40:01 2017: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::MIB2MIB
Thu Jan 26 14:40:01 2017: cache: 1.3.6.1.2.1.1.3.0
Thu Jan 26 14:40:01 2017: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::MIB2MIB
Thu Jan 26 14:40:01 2017: GET: MIB-2-MIB::sysUpTime (1.3.6.1.2.1.1.3.0) : 2083587759
Thu Jan 26 14:40:01 2017: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::SNMPFRAMEWORKMIB
Thu Jan 26 14:40:01 2017: cache: 1.3.6.1.6.3.10.2.1.3.0
Thu Jan 26 14:40:01 2017: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::SNMPFRAMEWORKMIB
Thu Jan 26 14:40:01 2017: GET: SNMP-FRAMEWORK-MIB::snmpEngineTime (1.3.6.1.6.3.10.2.1.3.0) : 20835877
Thu Jan 26 14:40:01 2017: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::MIB2MIB
Thu Jan 26 14:40:01 2017: cache: 1.3.6.1.2.1.1.1.0
Thu Jan 26 14:40:01 2017: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::MIB2MIB
Thu Jan 26 14:40:01 2017: GET: MIB-2-MIB::sysDescr (1.3.6.1.2.1.1.1.0) : Juniper Networks, Inc. srx1400 internet router, kernel JUNOS 12.1X46-D40.2 #0: 2015-09-26 03:22:03 UTC [email protected]:/volume/build/junos/12.1/service/12.1X46-D40.2/obj-powerpc/junos/bsd/kernels/JUNIPER-SRX/kernel Build date: 2015-09-26
Thu Jan 26 14:40:01 2017: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::MIB2MIB
Thu Jan 26 14:40:01 2017: cache: 1.3.6.1.2.1.1.2.0
Thu Jan 26 14:40:01 2017: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::MIB2MIB
Thu Jan 26 14:40:01 2017: GET: MIB-2-MIB::sysObjectID (1.3.6.1.2.1.1.2.0) : 1.3.6.1.4.1.2636.1.1.1.2.49
Thu Jan 26 14:40:01 2017: uptime: 20835877
Thu Jan 26 14:40:01 2017: up since: Mon May 30 11:55:24 2016
Thu Jan 26 14:40:01 2017: whoami: Juniper Networks, Inc. srx1400 internet router, kernel JUNOS 12.1X46-D40.2 #0: 2015-09-26 03:22:03 UTC [email protected]:/volume/build/junos/12.1/service/12.1X46-D40.2/obj-powerpc/junos/bsd/kernels/JUNIPER-SRX/kernel Build date: 2015-09-26
Thu Jan 26 14:40:01 2017: AUTOLOAD Classes::Device::check_messages
I am a Juniper Networks, Inc. srx1400 internet router, kernel JUNOS 12.1X46-D40.2 #0: 2015-09-26 03:22:03 UTC [email protected]:/volume/build/junos/12.1/service/12.1X46-D40.2/obj-powerpc/junos/bsd/kernels/JUNIPER-SRX/kernel Build date: 2015-09-26
Thu Jan 26 14:40:01 2017: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::SYNOPTICSROOTMIB
Thu Jan 26 14:40:01 2017: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::MIB2MIB
Thu Jan 26 14:40:01 2017: cache: 1.3.6.1.2.1.1.2.0
Thu Jan 26 14:40:01 2017: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::MIB2MIB
Thu Jan 26 14:40:01 2017: GET: MIB-2-MIB::sysObjectID (1.3.6.1.2.1.1.2.0) : 1.3.6.1.4.1.2636.1.1.1.2.49
Thu Jan 26 14:40:01 2017: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::JUNIPERMIB
Thu Jan 26 14:40:01 2017: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::MIB2MIB
Thu Jan 26 14:40:01 2017: cache: 1.3.6.1.2.1.1.2.0
Thu Jan 26 14:40:01 2017: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::MIB2MIB
Thu Jan 26 14:40:01 2017: GET: MIB-2-MIB::sysObjectID (1.3.6.1.2.1.1.2.0) : 1.3.6.1.4.1.2636.1.1.1.2.49
Thu Jan 26 14:40:01 2017: implements JUNIPER-MIB (found traces)
Thu Jan 26 14:40:01 2017: using Classes::Juniper::SRX
Thu Jan 26 14:40:01 2017: AUTOLOAD Classes::Juniper::SRX::override_opt
Thu Jan 26 14:40:01 2017: AUTOLOAD Monitoring::GLPlugin::Commandline::override_opt
Thu Jan 26 14:40:01 2017: AUTOLOAD Classes::Juniper::SRX::check_messages
Thu Jan 26 14:40:01 2017: AUTOLOAD Classes::Juniper::SRX::analyze_and_check_cpu_subsystem
Thu Jan 26 14:40:01 2017: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::JUNIPERMIB
Thu Jan 26 14:40:01 2017: get_snmp_table_objects JUNIPER-MIB jnxOperatingTable
Thu Jan 26 14:40:01 2017: get_snmp_table_objects calls get_table 1.3.6.1.4.1.2636.3.1.13
Thu Jan 26 14:40:01 2017: cache: 1.3.6.1.4.1.2636.3.1.13
Thu Jan 26 14:40:01 2017: get_table $VAR1 = {
'-baseoid' => '1.3.6.1.4.1.2636.3.1.13'
};
Thu Jan 26 14:40:17 2017: get_table returned 0 oids
Thu Jan 26 14:40:17 2017: get_table error: No response from remote host "10.X.33"
Thu Jan 26 14:40:17 2017: get_table error: try fallback
Thu Jan 26 14:40:17 2017: get_table $VAR1 = {
'-maxrepetitions' => 1,
'-baseoid' => '1.3.6.1.4.1.2636.3.1.13'
};
Thu Jan 26 14:41:10 2017: get_table returned 720 oids
Thu Jan 26 14:41:10 2017: get_matching_oids $VAR1 = {
'-columns' => [
'1.3.6.1.4.1.2636.3.1.13'
]
};
Thu Jan 26 14:41:10 2017: get_matching_oids returns 720 from 724 oids
Thu Jan 26 14:41:10 2017: get_snmp_table_objects get_table returns 720 oids
Thu Jan 26 14:41:10 2017: get_snmp_table_objects get_table returns 30 indices
Thu Jan 26 14:41:10 2017: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::JUNIPERMIB
Thu Jan 26 14:41:10 2017: $self->{components}->{cpu_subsystem} = Classes::Juniper::SRX::Component::CpuSubsystem->new()
Thu Jan 26 14:41:10 2017: AUTOLOAD Classes::Juniper::SRX::check_cpu_subsystem
Thu Jan 26 14:41:10 2017: AUTOLOAD Classes::Juniper::SRX::Component::CpuSubsystem::OperatingItem::set_thresholds
Thu Jan 26 14:41:10 2017: AUTOLOAD Classes::Juniper::SRX::Component::CpuSubsystem::OperatingItem::check_thresholds
Thu Jan 26 14:41:10 2017: AUTOLOAD Classes::Juniper::SRX::Component::CpuSubsystem::OperatingItem::add_perfdata
Thu Jan 26 14:41:10 2017: AUTOLOAD Classes::Juniper::SRX::Component::CpuSubsystem::OperatingItem::set_thresholds
Thu Jan 26 14:41:10 2017: AUTOLOAD Classes::Juniper::SRX::Component::CpuSubsystem::OperatingItem::check_thresholds
Thu Jan 26 14:41:10 2017: AUTOLOAD Classes::Juniper::SRX::Component::CpuSubsystem::OperatingItem::add_perfdata
[CPUSUBSYSTEM]
info: checking operatins
[OPERATINGITEM_9.1.0.0]
jnxOperatingBuffer: 45
jnxOperatingCPU: 19
jnxOperatingChassisDescr: node0
jnxOperatingChassisId: node0
jnxOperatingContentsIndex: 9
jnxOperatingDRAMSize: 1072693248
jnxOperatingDescr: node0 Routing Engine 0
jnxOperatingHeap: 0
jnxOperatingISR: 0
jnxOperatingL1Index: 1
jnxOperatingL2Index: 0
jnxOperatingL3Index: 0
jnxOperatingLastRestart: 0
jnxOperatingMemory: 1023
jnxOperatingRestartTime: 5,+
jnxOperatingState: running
jnxOperatingStateOrdered: running
jnxOperatingTemp: 0
jnxOperatingUpTime: 2083608300
info: node0 Routing Engine 0 cpu usage is 19.00%
[OPERATINGITEM_9.3.0.0]
jnxOperatingBuffer: 42
jnxOperatingCPU: 15
jnxOperatingChassisDescr: node1
jnxOperatingChassisId: node1
jnxOperatingContentsIndex: 9
jnxOperatingDRAMSize: 1072693248
jnxOperatingDescr: node1 Routing Engine 0
jnxOperatingHeap: 0
jnxOperatingISR: 0
jnxOperatingL1Index: 3
jnxOperatingL2Index: 0
jnxOperatingL3Index: 0
jnxOperatingLastRestart: 0
jnxOperatingMemory: 1023
jnxOperatingRestartTime: #+
++|O-e_a+i+gS+a+e: _+++i+g
++|O-e_a+i+gS+a+eO_de_ed: _+++i+g
++|O-e_a+i+gTe+-: 0
++|O-e_a+i+gU-Ti+e: 2082993000
i+f-: +-de1 R-++i+g E+gi+e 0 c-+ +_age i_ 15.00%
Th+ Ja+ 26 14:41:10 2017: AUTOLOAD C+a__e_::J++i-e_::SRX::chec+_+e__age_
Th+ Ja+ 26 14:41:10 2017: AUTOLOAD C+a__e_::J++i-e_::SRX::chec+_+e__age_
Th+ Ja+ 26 14:41:10 2017: AUTOLOAD C+a__e_::J++i-e_::SRX::+agi-__e|i+
OK - +-de0 R-++i+g E+gi+e 0 c-+ +_age i_ 19.00%, +-de1 R-++i+g E+gi+e 0 c-+ +_age i_ 15.00%
chec+i+g --e_a+i+_
+-de0 R-++i+g E+gi+e 0 c-+ +_age i_ 19.00%
+-de1 R-++i+g E+gi+e 0 c-+ +_age i_ 15.00% | 'c-+_+-de0 R-++i+g E+gi+e 0_+_age'=19%;50;90;0;100 'c-+_+-de1 R-++i+g E+gi+e 0_+_age'=15%;50;90;0;100
X@QXJ:~$
time snmpwalk -v3 -u X -a SHA -l authPriv -A X -x AES -X X 10.X.33 1.3.6.1.4.1.2636.3.1.13
[...]
real 0m37.020s
user 0m0.176s
sys 0m0.040s
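For reference, the timestamps in the verbose log above explain why a 60 second timeout is not enough: the first bulk get_table gets no response, and only the fallback with -maxrepetitions 1 returns data. A small sketch that adds up the two phases (the labels are my own wording, the timestamps are copied from the log):

```python
from datetime import datetime

# Timestamps copied from the verbose log above; the labels are descriptive only.
LOG = [
    ("Thu Jan 26 14:40:01 2017", "bulk get_table started"),
    ("Thu Jan 26 14:40:17 2017", "bulk get_table failed, fallback started"),
    ("Thu Jan 26 14:41:10 2017", "fallback get_table returned 720 oids"),
]

def phase_durations(log):
    """Return (label, seconds) for each gap between consecutive log entries."""
    times = [datetime.strptime(ts, "%a %b %d %H:%M:%S %Y") for ts, _ in log]
    return [
        (log[i][1], int((times[i + 1] - times[i]).total_seconds()))
        for i in range(len(log) - 1)
    ]

durations = phase_durations(LOG)
for label, seconds in durations:
    print(f"{seconds:3d}s  after: {label}")
print(f"total: {sum(s for _, s in durations)}s")
```

That is 16 s lost on the failed bulk request plus 53 s for the fallback walk, 69 s in total, so the check can only finish with a timeout above 60.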
--mode uptime works fine
The same happens with hardware-health: after ~40 seconds there is only garbled output. It looks like the plugin retrieves all OIDs and then fails while handling the results.
For example:
[...]
[OPERATING_1.2.0.0]
jnxContentsChassisCleiCode:
jnxContentsChassisDescr: node1
jnxContentsChassisId: node1
jnxContentsContainerIndex: 1
jnxContentsDescr: node1 midplane
jnxContentsInstalled: 0
jnxContentsL1Index: 2
jnxContentsL2Index: 0
jnxContentsL3Index: 0
jnxContentsPartNo: 711-XXXXXXXXXXX12
jnxContentsRevision: REV 03
jnxContentsSerialNo: S/N ACXXXXXXXXXXX99
jnxContentsType: 1.3.6.1.4.1.2636.1.1.3.1.49
jnxFilledChassisDescr: node1
jnxFilledChassisId: node1
jnxFilledContainerIndex: 1
jnxFilledDescr: node1, chassis frame
jnxFilledL1Index: 2
jnxFilledL2Index: 0
jnxFilledL3Index: 0
jnxFilledState: filled
jnxOperatingBuffer: 0
jnxOperatingCPU: 0
jnxOperatingChassisDescr: node1
jnxOperatingChassisId: node1
jnxOperatingContentsIndex: 1
jnxOperatingDRAMSize: 0
jnxOperatingDescr: node1 midplane
jnxOperatingHeap: 0
jnxOperatingISR: 0
jnxOperatingL1Index: 2
jnxOperatingL2Index: 0
jnxOperatingL3Index: 0
jnxOperatingLastRestart: 0
jnxOperatingMemory: 0
jnxOperatingRestartTime: #
++|O-e_a+i+gS+a+e: _+++i+g
++|O-e_a+i+gS+a+eO_de_ed: _+++i+g
++|O-e_a+i+gTe+-: 0
++|O-e_a+i+gU-Ti+e: 2083079000
[OPERATING_10.1.1.0]
++|C-++e++_Cha__i_C+eiC-de:
++|C-++e++_Cha__i_De_c_: +-de0
++|C-++e++_Cha__i_Id: +-de0
++|C-++e++_C-++ai+e_I+de|: 10
++|C-++e++_De_c_: +-de0 FPM B-a_d
++|C-++e++_I+_+a++ed: 0
++|C-++e++_L1I+de|: 1
++|C-++e++_L2I+de|: 1
++|C-++e++_L3I+de|: 0
++|C-++e++_Pa_+N-:
++|C-++e++_Re+i_i-+:
++|C-++e++_Se_ia+N-:
++|C-++e++_Ty-e: 1.3.6.1.4.1.2636.1.1.3.2.12.6
++|Fi++edCha__i_De_c_: +-de0
++|Fi++edCha__i_Id: +-de0
++|Fi++edC-++ai+e_I+de|: 10
++|Fi++edDe_c_: +-de0, FPM B-a_d _+-+
++|Fi++edL1I+de|: 1
++|Fi++edL2I+de|: 1
++|Fi++edL3I+de|: 0
++|Fi++edS+a+e: fi++ed
++|F_+Cha__i_De_c_: +-de0
++|F_+Cha__i_Id: +-de0
++|F_+C-++e++_I+de|: 10
++|F_+L1I+de|: 1
++|F_+L2I+de|: 1
++|F_+L3I+de|: 0
++|F_+La_+P-+e_Off: 0
++|F_+La_+P-+e_O+: 442
++|F_+Na+e: +-de0 FPM B-a_d
++|F_+Off+i+eRea_-+: +-+e
++|F_+P-+e_U-Ti+e: 2083680145
++|F_+P_dA__ig++e++: 0
++|F_+S+-+: 0
++|F_+S+a+e: -++i+e
++|F_+Te+-: 0
++|F_+Ty-e: f_-++Pa+e+M-d++e
++|O-e_a+i+gB+ffe_: 0
++|O-e_a+i+gCPU: 0
++|O-e_a+i+gCha__i_De_c_: +-de0
++|O-e_a+i+gCha__i_Id: +-de0
++|O-e_a+i+gC-++e++_I+de|: 10
++|O-e_a+i+gDRAMSize: 0
++|O-e_a+i+gDe_c_: +-de0 FPM B-a_d
++|O-e_a+i+gHea-: 0
++|O-e_a+i+gISR: 0
++|O-e_a+i+gL1I+de|: 1
++|O-e_a+i+gL2I+de|: 1
++|O-e_a+i+gL3I+de|: 0
++|O-e_a+i+gLa_+Re_+a_+: 0
++|O-e_a+i+gMe+-_y: 0
++|O-e_a+i+gRe_+a_+Ti+e: 8!
++|O-e_a+i+gS+a+e: _+++i+g
++|O-e_a+i+gS+a+eO_de_ed: _+++i+g
++|O-e_a+i+gTe+-: 0
++|O-e_a+i+gU-Ti+e: 2083677963
[...]
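In both dumps the output turns to garbage directly after a jnxOperatingRestartTime line. My guess, and it is only a guess, is that the value is a raw octet string and that one of its bytes is a terminal control character (for example 0x0E, Shift Out, which switches many terminals to an alternate character set). A minimal sketch of escaping such a value before printing it; the function name and the sample bytes are hypothetical and not taken from the plugin or from this device:

```python
def printable(value: bytes) -> str:
    """Render bytes for a terminal, escaping everything outside printable ASCII."""
    return "".join(
        chr(b) if 0x20 <= b <= 0x7E else f"\\x{b:02x}"
        for b in value
    )

# Hypothetical sample value containing control bytes (0x07 bell, 0x0e shift out).
sample = bytes([0x07, 0xE0, 0x05, 0x1E, 0x0E, 0x23, 0x2B, 0x00, 0x2B, 0x02, 0x00])
print(printable(sample))
```

Printed this way the control bytes show up as escapes and cannot change the terminal state, so the rest of the output stays readable.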
Additional note: this is running in an SRX cluster.
Hello, Juniper says that the OID 1.3.6.1.4.1.2636.3.1.13.1.19 (jnxOperatingRestartTime) only returns hexadecimal output. Would it be possible to use the OID 1.3.6.1.4.1.2636.3.1.13.1.13 (jnxOperatingUpTime) instead? It provides the same information as an integer.
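As an alternative to switching OIDs, the hexadecimal value could be decoded. The sketch below assumes jnxOperatingRestartTime uses the standard DateAndTime textual convention from SNMPv2-TC (RFC 2579): 8 octets, or 11 with a UTC offset. I have not checked this against the plugin code; the function name and the sample value are hypothetical.

```python
import struct

def decode_date_and_time(raw: bytes) -> str:
    """Decode an SNMPv2-TC DateAndTime value (8 or 11 octets) into text."""
    if len(raw) not in (8, 11):
        # Not a DateAndTime value: fall back to a hex dump.
        return raw.hex(" ")
    # 2-byte big-endian year, then month, day, hour, minute, second, deci-seconds.
    year, month, day, hour, minute, second, deci = struct.unpack(">HBBBBBB", raw[:8])
    text = f"{year:04d}-{month:02d}-{day:02d} {hour:02d}:{minute:02d}:{second:02d}.{deci}"
    if len(raw) == 11:
        # Optional tail: direction from UTC ('+' or '-'), hours, minutes.
        text += f" {chr(raw[8])}{raw[9]:02d}:{raw[10]:02d}"
    return text

# Hypothetical sample value, not taken from the device in this thread.
sample = bytes([0x07, 0xE0, 5, 30, 14, 35, 43, 0, ord("+"), 2, 0])
print(decode_date_and_time(sample))  # 2016-05-30 14:35:43.0 +02:00
```

Anything that is not 8 or 11 octets long is returned as a hex dump, so an unexpected value still prints safely.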
Hi there
We also see intermittent responses from one of our EX switches. We monitor SRX and EX devices with the same plugin and the same settings; only one EX3300 switch stack (JUNOS 15.1R7.9, built 2018-09-11) times out, and only in hardware-health mode.
root@monitoringsrv ~ # /usr/lib64/nagios/plugins/check_nwc_health --community casvm --hostname 172.20.176.134 --mode hardware-health --timeout 420 -vvvvvvvvvv
UNKNOWN - check_nwc_health timed out after 420 seconds
root@monitoringsrv ~ # /usr/lib64/nagios/plugins/check_nwc_health --community casvm --hostname 172.20.176.134 --mode cpu-load --timeout 420
OK - Routing Engine 0 cpu usage is 60.00%, Routing Engine 1 cpu usage is 6.00% | 'cpu_Routing Engine 0_usage'=60%;85;95;0;100 'cpu_Routing Engine 1_usage'=6%;85;95;0;100
PS: the timeout did not occur with the version we used before, which was from last year.