
Some checks fail on SRX1400

Open · ClemensBW opened this issue 8 years ago · 4 comments

Hello,

We use check_nwc_health $Revision: 5.11.3.

--mode cpu-load fails with a strange error. (Please note the timeout of 120; if we use 60, we only get a timeout.)

X@XJ:~$ /usr/lib/nagios/plugins/check_nwc_health -vvvvvvvvvvv --mode cpu-load --hostname 10.X.33  --protocol 3 --username X --authpassword X --authprotocol SHA1 --privpassword X --privprotocol AES --timeout 120   
Thu Jan 26 14:40:01 2017: AUTOLOAD Classes::Device::check_messages

Thu Jan 26 14:40:01 2017: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::MIB2MIB
Thu Jan 26 14:40:01 2017: cache: 1.3.6.1.2.1.1.3.0
Thu Jan 26 14:40:01 2017: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::MIB2MIB
Thu Jan 26 14:40:01 2017: GET: MIB-2-MIB::sysUpTime (1.3.6.1.2.1.1.3.0) : 2083587759
Thu Jan 26 14:40:01 2017: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::SNMPFRAMEWORKMIB
Thu Jan 26 14:40:01 2017: cache: 1.3.6.1.6.3.10.2.1.3.0
Thu Jan 26 14:40:01 2017: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::SNMPFRAMEWORKMIB
Thu Jan 26 14:40:01 2017: GET: SNMP-FRAMEWORK-MIB::snmpEngineTime (1.3.6.1.6.3.10.2.1.3.0) : 20835877
Thu Jan 26 14:40:01 2017: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::MIB2MIB
Thu Jan 26 14:40:01 2017: cache: 1.3.6.1.2.1.1.1.0
Thu Jan 26 14:40:01 2017: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::MIB2MIB
Thu Jan 26 14:40:01 2017: GET: MIB-2-MIB::sysDescr (1.3.6.1.2.1.1.1.0) : Juniper Networks, Inc. srx1400 internet router, kernel JUNOS 12.1X46-D40.2 #0: 2015-09-26 03:22:03 UTC     [email protected]:/volume/build/junos/12.1/service/12.1X46-D40.2/obj-powerpc/junos/bsd/kernels/JUNIPER-SRX/kernel Build date: 2015-09-26 
Thu Jan 26 14:40:01 2017: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::MIB2MIB
Thu Jan 26 14:40:01 2017: cache: 1.3.6.1.2.1.1.2.0
Thu Jan 26 14:40:01 2017: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::MIB2MIB
Thu Jan 26 14:40:01 2017: GET: MIB-2-MIB::sysObjectID (1.3.6.1.2.1.1.2.0) : 1.3.6.1.4.1.2636.1.1.1.2.49
Thu Jan 26 14:40:01 2017: uptime: 20835877
Thu Jan 26 14:40:01 2017: up since: Mon May 30 11:55:24 2016
Thu Jan 26 14:40:01 2017: whoami: Juniper Networks, Inc. srx1400 internet router, kernel JUNOS 12.1X46-D40.2 #0: 2015-09-26 03:22:03 UTC     [email protected]:/volume/build/junos/12.1/service/12.1X46-D40.2/obj-powerpc/junos/bsd/kernels/JUNIPER-SRX/kernel Build date: 2015-09-26 
Thu Jan 26 14:40:01 2017: AUTOLOAD Classes::Device::check_messages

I am a Juniper Networks, Inc. srx1400 internet router, kernel JUNOS 12.1X46-D40.2 #0: 2015-09-26 03:22:03 UTC     [email protected]:/volume/build/junos/12.1/service/12.1X46-D40.2/obj-powerpc/junos/bsd/kernels/JUNIPER-SRX/kernel Build date: 2015-09-26 
Thu Jan 26 14:40:01 2017: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::SYNOPTICSROOTMIB
Thu Jan 26 14:40:01 2017: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::MIB2MIB
Thu Jan 26 14:40:01 2017: cache: 1.3.6.1.2.1.1.2.0
Thu Jan 26 14:40:01 2017: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::MIB2MIB
Thu Jan 26 14:40:01 2017: GET: MIB-2-MIB::sysObjectID (1.3.6.1.2.1.1.2.0) : 1.3.6.1.4.1.2636.1.1.1.2.49
Thu Jan 26 14:40:01 2017: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::JUNIPERMIB
Thu Jan 26 14:40:01 2017: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::MIB2MIB
Thu Jan 26 14:40:01 2017: cache: 1.3.6.1.2.1.1.2.0
Thu Jan 26 14:40:01 2017: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::MIB2MIB
Thu Jan 26 14:40:01 2017: GET: MIB-2-MIB::sysObjectID (1.3.6.1.2.1.1.2.0) : 1.3.6.1.4.1.2636.1.1.1.2.49
Thu Jan 26 14:40:01 2017: implements JUNIPER-MIB (found traces)
Thu Jan 26 14:40:01 2017: using Classes::Juniper::SRX
Thu Jan 26 14:40:01 2017: AUTOLOAD Classes::Juniper::SRX::override_opt

Thu Jan 26 14:40:01 2017: AUTOLOAD Monitoring::GLPlugin::Commandline::override_opt

Thu Jan 26 14:40:01 2017: AUTOLOAD Classes::Juniper::SRX::check_messages

Thu Jan 26 14:40:01 2017: AUTOLOAD Classes::Juniper::SRX::analyze_and_check_cpu_subsystem

Thu Jan 26 14:40:01 2017: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::JUNIPERMIB
Thu Jan 26 14:40:01 2017: get_snmp_table_objects JUNIPER-MIB jnxOperatingTable
Thu Jan 26 14:40:01 2017: get_snmp_table_objects calls get_table 1.3.6.1.4.1.2636.3.1.13
Thu Jan 26 14:40:01 2017: cache: 1.3.6.1.4.1.2636.3.1.13
Thu Jan 26 14:40:01 2017: get_table $VAR1 = {
          '-baseoid' => '1.3.6.1.4.1.2636.3.1.13'
        };

Thu Jan 26 14:40:17 2017: get_table returned 0 oids
Thu Jan 26 14:40:17 2017: get_table error: No response from remote host "10.X.33"
Thu Jan 26 14:40:17 2017: get_table error: try fallback
Thu Jan 26 14:40:17 2017: get_table $VAR1 = {
          '-maxrepetitions' => 1,
          '-baseoid' => '1.3.6.1.4.1.2636.3.1.13'
        };

Thu Jan 26 14:41:10 2017: get_table returned 720 oids
Thu Jan 26 14:41:10 2017: get_matching_oids $VAR1 = {
          '-columns' => [
                          '1.3.6.1.4.1.2636.3.1.13'
                        ]
        };

Thu Jan 26 14:41:10 2017: get_matching_oids returns 720 from 724 oids
Thu Jan 26 14:41:10 2017: get_snmp_table_objects get_table returns 720 oids
Thu Jan 26 14:41:10 2017: get_snmp_table_objects get_table returns 30 indices
Thu Jan 26 14:41:10 2017: i know package Monitoring::GLPlugin::SNMP::MibsAndOids::JUNIPERMIB
Thu Jan 26 14:41:10 2017: $self->{components}->{cpu_subsystem} = Classes::Juniper::SRX::Component::CpuSubsystem->new()
Thu Jan 26 14:41:10 2017: AUTOLOAD Classes::Juniper::SRX::check_cpu_subsystem

Thu Jan 26 14:41:10 2017: AUTOLOAD Classes::Juniper::SRX::Component::CpuSubsystem::OperatingItem::set_thresholds

Thu Jan 26 14:41:10 2017: AUTOLOAD Classes::Juniper::SRX::Component::CpuSubsystem::OperatingItem::check_thresholds

Thu Jan 26 14:41:10 2017: AUTOLOAD Classes::Juniper::SRX::Component::CpuSubsystem::OperatingItem::add_perfdata

Thu Jan 26 14:41:10 2017: AUTOLOAD Classes::Juniper::SRX::Component::CpuSubsystem::OperatingItem::set_thresholds

Thu Jan 26 14:41:10 2017: AUTOLOAD Classes::Juniper::SRX::Component::CpuSubsystem::OperatingItem::check_thresholds

Thu Jan 26 14:41:10 2017: AUTOLOAD Classes::Juniper::SRX::Component::CpuSubsystem::OperatingItem::add_perfdata

[CPUSUBSYSTEM]
info: checking operatins

[OPERATINGITEM_9.1.0.0]
jnxOperatingBuffer: 45
jnxOperatingCPU: 19
jnxOperatingChassisDescr: node0
jnxOperatingChassisId: node0
jnxOperatingContentsIndex: 9
jnxOperatingDRAMSize: 1072693248
jnxOperatingDescr: node0 Routing Engine 0
jnxOperatingHeap: 0
jnxOperatingISR: 0
jnxOperatingL1Index: 1
jnxOperatingL2Index: 0
jnxOperatingL3Index: 0
jnxOperatingLastRestart: 0
jnxOperatingMemory: 1023
jnxOperatingRestartTime: 5,+
jnxOperatingState: running
jnxOperatingStateOrdered: running
jnxOperatingTemp: 0
jnxOperatingUpTime: 2083608300
info: node0 Routing Engine 0 cpu usage is 19.00%

[OPERATINGITEM_9.3.0.0]
jnxOperatingBuffer: 42
jnxOperatingCPU: 15
jnxOperatingChassisDescr: node1
jnxOperatingChassisId: node1
jnxOperatingContentsIndex: 9
jnxOperatingDRAMSize: 1072693248
jnxOperatingDescr: node1 Routing Engine 0
jnxOperatingHeap: 0
jnxOperatingISR: 0
jnxOperatingL1Index: 3
jnxOperatingL2Index: 0
jnxOperatingL3Index: 0
jnxOperatingLastRestart: 0
jnxOperatingMemory: 1023
jnxOperatingRestartTime: #+
jnxOperatingState: running
jnxOperatingStateOrdered: running
jnxOperatingTemp: 0
jnxOperatingUpTime: 2082993000
info: node1 Routing Engine 0 cpu usage is 15.00%

Thu Jan 26 14:41:10 2017: AUTOLOAD Classes::Juniper::SRX::check_messages

Thu Jan 26 14:41:10 2017: AUTOLOAD Classes::Juniper::SRX::check_messages

Thu Jan 26 14:41:10 2017: AUTOLOAD Classes::Juniper::SRX::nagios_exit

OK - node0 Routing Engine 0 cpu usage is 19.00%, node1 Routing Engine 0 cpu usage is 15.00%
checking operatins
node0 Routing Engine 0 cpu usage is 19.00%
node1 Routing Engine 0 cpu usage is 15.00% | 'cpu_node0 Routing Engine 0_usage'=19%;50;90;0;100 'cpu_node1 Routing Engine 0_usage'=15%;50;90;0;100
X@QXJ:~$ 
time snmpwalk -v3 -u X -a SHA -l authPriv -A X -x AES -X X 10.X.33 1.3.6.1.4.1.2636.3.1.13
[...]
real    0m37.020s
user    0m0.176s
sys     0m0.040s
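The timestamps in the verbose cpu-load log account for the timeout behaviour: the first get_table attempt on jnxOperatingTable gives up after 16 seconds with "No response from remote host", and the fallback with -maxrepetitions 1 then needs another 53 seconds, 69 seconds in total. That is consistent with --timeout 60 not being enough. The snippet below only redoes this arithmetic from the logged timestamps; the event labels are paraphrased from the log.

```python
from datetime import datetime

# strptime format matching the plugin's log prefix, e.g. "Thu Jan 26 14:40:01 2017".
LOG_FORMAT = "%a %b %d %H:%M:%S %Y"

# (event, timestamp) pairs copied from the cpu-load verbose log in this issue.
EVENTS = [
    ("get_table on jnxOperatingTable starts", "Thu Jan 26 14:40:01 2017"),
    ("first attempt fails, fallback with -maxrepetitions 1", "Thu Jan 26 14:40:17 2017"),
    ("fallback returns 720 oids", "Thu Jan 26 14:41:10 2017"),
]


def step_durations(events):
    """Seconds elapsed between consecutive (label, timestamp) events."""
    times = [datetime.strptime(stamp, LOG_FORMAT) for _, stamp in events]
    return [int((later - earlier).total_seconds()) for earlier, later in zip(times, times[1:])]


steps = step_durations(EVENTS)
for (label, _), seconds in zip(EVENTS[1:], steps):
    print(f"{seconds:3d} s until: {label}")
print(f"{sum(steps):3d} s in total (compare --timeout 60)")
```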

--mode uptime works fine.

The same happens with hardware-health and [...]: after about 40 seconds we only get garbled output. It looks like the plugin retrieves all OIDs and then fails while handling the results.
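In both runs the output goes wrong right after a raw jnxOperatingRestartTime value is printed (5,+ and #+ in the cpu-load run, # and 8! in the hardware-health run). Assuming, per JUNIPER-MIB, that this column is a binary DateAndTime octet string, printing its bytes unescaped can send control characters to the terminal. One plausible reading, which is an assumption and not proven by the log, is that a Shift Out byte (0x0E) switches the terminal to the DEC line-drawing character set, so later lowercase letters are drawn as box characters. A minimal Python sketch of a safer display rule follows; the plugin itself is Perl, and render_octet_string is a hypothetical name.

```python
import string

# Printable ASCII minus the whitespace control characters (\t \n \r \x0b \x0c);
# the plain space stays allowed.
_SAFE_BYTES = set(string.printable.encode("ascii")) - set(b"\t\n\r\x0b\x0c")


def render_octet_string(raw: bytes) -> str:
    """Return raw as text if every byte is safe to print, else as a hex string.

    Hypothetical display helper: binary values such as a DateAndTime are shown
    as 0x07E0... instead of writing control bytes (e.g. 0x0E) to the terminal.
    """
    if all(byte in _SAFE_BYTES for byte in raw):
        return raw.decode("ascii")
    return "0x" + raw.hex().upper()
```

Text values such as "node0 Routing Engine 0" pass through unchanged, while any value containing a non-printable byte is shown in hex, which matches what Juniper describes for jnxOperatingRestartTime.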

For example:

[...]
[OPERATING_1.2.0.0]
jnxContentsChassisCleiCode: 
jnxContentsChassisDescr: node1
jnxContentsChassisId: node1
jnxContentsContainerIndex: 1
jnxContentsDescr: node1 midplane
jnxContentsInstalled: 0
jnxContentsL1Index: 2
jnxContentsL2Index: 0
jnxContentsL3Index: 0
jnxContentsPartNo: 711-XXXXXXXXXXX12
jnxContentsRevision: REV 03
jnxContentsSerialNo: S/N ACXXXXXXXXXXX99
jnxContentsType: 1.3.6.1.4.1.2636.1.1.3.1.49
jnxFilledChassisDescr: node1
jnxFilledChassisId: node1
jnxFilledContainerIndex: 1
jnxFilledDescr: node1, chassis frame
jnxFilledL1Index: 2
jnxFilledL2Index: 0
jnxFilledL3Index: 0
jnxFilledState: filled
jnxOperatingBuffer: 0
jnxOperatingCPU: 0
jnxOperatingChassisDescr: node1
jnxOperatingChassisId: node1
jnxOperatingContentsIndex: 1
jnxOperatingDRAMSize: 0
jnxOperatingDescr: node1 midplane
jnxOperatingHeap: 0
jnxOperatingISR: 0
jnxOperatingL1Index: 2
jnxOperatingL2Index: 0
jnxOperatingL3Index: 0
jnxOperatingLastRestart: 0
jnxOperatingMemory: 0
jnxOperatingRestartTime: #
jnxOperatingState: running
jnxOperatingStateOrdered: running
jnxOperatingTemp: 0
jnxOperatingUpTime: 2083079000

[OPERATING_10.1.1.0]
jnxContentsChassisCleiCode: 
jnxContentsChassisDescr: node0
jnxContentsChassisId: node0
jnxContentsContainerIndex: 10
jnxContentsDescr: node0 FPM Board
jnxContentsInstalled: 0
jnxContentsL1Index: 1
jnxContentsL2Index: 1
jnxContentsL3Index: 0
jnxContentsPartNo: 
jnxContentsRevision: 
jnxContentsSerialNo: 
jnxContentsType: 1.3.6.1.4.1.2636.1.1.3.2.12.6
jnxFilledChassisDescr: node0
jnxFilledChassisId: node0
jnxFilledContainerIndex: 10
jnxFilledDescr: node0, FPM Board slot
jnxFilledL1Index: 1
jnxFilledL2Index: 1
jnxFilledL3Index: 0
jnxFilledState: filled
jnxFruChassisDescr: node0
jnxFruChassisId: node0
jnxFruContentsIndex: 10
jnxFruL1Index: 1
jnxFruL2Index: 1
jnxFruL3Index: 0
jnxFruLastPowerOff: 0
jnxFruLastPowerOn: 442
jnxFruName: node0 FPM Board
jnxFruOfflineReason: none
jnxFruPowerUpTime: 2083680145
jnxFruPsdAssignment: 0
jnxFruSlot: 0
jnxFruState: online
jnxFruTemp: 0
jnxFruType: frontPanelModule
jnxOperatingBuffer: 0
jnxOperatingCPU: 0
jnxOperatingChassisDescr: node0
jnxOperatingChassisId: node0
jnxOperatingContentsIndex: 10
jnxOperatingDRAMSize: 0
jnxOperatingDescr: node0 FPM Board
jnxOperatingHeap: 0
jnxOperatingISR: 0
jnxOperatingL1Index: 1
jnxOperatingL2Index: 1
jnxOperatingL3Index: 0
jnxOperatingLastRestart: 0
jnxOperatingMemory: 0
jnxOperatingRestartTime: 8!
jnxOperatingState: running
jnxOperatingStateOrdered: running
jnxOperatingTemp: 0
jnxOperatingUpTime: 2083677963
[...]

ClemensBW, Jan 26 '17 14:01

Additionally: this is running on an SRX cluster.

ClemensBW, Jan 26 '17 14:01

Hello, Juniper says that the OID 1.3.6.1.4.1.2636.3.1.13.1.19 (jnxOperatingRestartTime) only returns hexadecimal output. Could you use the OID 1.3.6.1.4.1.2636.3.1.13.1.13 (jnxOperatingUpTime) instead? It provides the same information as an integer.
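Both options can be illustrated with a small Python sketch. It assumes that jnxOperatingRestartTime is an SNMPv2-TC DateAndTime (RFC 2579 layout) and that jnxOperatingUpTime is a TimeInterval in hundredths of a second. decode_date_and_time turns the binary value into a timestamp; restart_time_from_uptime derives the same instant from the integer uptime. Both function names are hypothetical and not part of check_nwc_health.

```python
import struct
from datetime import datetime, timedelta, timezone


def decode_date_and_time(raw: bytes) -> datetime:
    """Decode an SNMPv2-TC DateAndTime OCTET STRING (RFC 2579).

    Octets: year (2, big-endian), month, day, hour, minutes, seconds,
    deci-seconds, then optionally direction ('+' or '-'), hours and
    minutes from UTC.
    """
    if len(raw) not in (8, 11):
        raise ValueError(f"DateAndTime must be 8 or 11 octets, got {len(raw)}")
    year, month, day, hour, minute, second, deci = struct.unpack(">HBBBBBB", raw[:8])
    second = min(second, 59)  # RFC 2579 allows 60 (leap second); datetime does not
    tz = None
    if len(raw) == 11:
        direction, off_hours, off_minutes = chr(raw[8]), raw[9], raw[10]
        offset = timedelta(hours=off_hours, minutes=off_minutes)
        tz = timezone(-offset if direction == "-" else offset)
    return datetime(year, month, day, hour, minute, second, deci * 100_000, tzinfo=tz)


def restart_time_from_uptime(now: datetime, uptime_centiseconds: int) -> datetime:
    """Restart time derived from a TimeInterval uptime (units of 1/100 s)."""
    return now - timedelta(seconds=uptime_centiseconds / 100)
```

For example, the invented 11-octet value 07 E0 05 1E 0B 37 18 00 2B 02 00 decodes to 2016-05-30 11:55:24+02:00. The integer route avoids binary parsing entirely, which is presumably why it is the easier value for a plugin to print.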

ClemensBW, Mar 16 '17 11:03

Hi there,

We also get intermittent responses from one of our EX switches. We monitor SRX and EX devices with the same plugin and the same settings, and only one switch stack, an EX3300 running JUNOS 15.1R7.9 (built 2018-09-11), times out, and only in hardware-health mode.

root@monitoringsrv ~ # /usr/lib64/nagios/plugins/check_nwc_health --community casvm --hostname 172.20.176.134 --mode hardware-health --timeout 420 -vvvvvvvvvv
UNKNOWN - check_nwc_health timed out after 420 seconds
root@monitoringsrv ~ # /usr/lib64/nagios/plugins/check_nwc_health --community casvm --hostname 172.20.176.134 --mode cpu-load --timeout 420
OK - Routing Engine 0 cpu usage is 60.00%, Routing Engine 1 cpu usage is 6.00% | 'cpu_Routing Engine 0_usage'=60%;85;95;0;100 'cpu_Routing Engine 1_usage'=6%;85;95;0;100
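To narrow down which part of hardware-health is slow on that EX3300 stack, one option is to time a plain snmpwalk of each table separately, as was done for jnxOperatingTable earlier in this thread. The sketch below assumes Net-SNMP's snmpwalk is installed and JUNIPER-MIB is in its MIB path so the table names resolve. The table list is taken from the SRX hardware-health output in this issue and is assumed, not verified, to match what the plugin walks on EX; build_walk_command and time_walk are hypothetical helper names.

```python
import subprocess
import time
from typing import List, Optional

# Tables that appear in the SRX hardware-health output in this issue; assumed
# (not verified against the plugin source) to be the ones walked on EX as well.
TABLES = [
    "JUNIPER-MIB::jnxContentsTable",
    "JUNIPER-MIB::jnxFilledTable",
    "JUNIPER-MIB::jnxFruTable",
    "JUNIPER-MIB::jnxOperatingTable",
]


def build_walk_command(host: str, community: str, table: str) -> List[str]:
    """Net-SNMP snmpwalk invocation for one table over SNMPv2c."""
    return ["snmpwalk", "-v2c", "-c", community, host, table]


def time_walk(cmd: List[str], limit: float) -> Optional[float]:
    """Wall-clock seconds the walk took, or None if it ran longer than limit."""
    start = time.monotonic()
    try:
        subprocess.run(cmd, capture_output=True, check=False, timeout=limit)
    except subprocess.TimeoutExpired:
        return None
    return time.monotonic() - start


# Usage (sends real SNMP queries, so it is left commented out here):
# for table in TABLES:
#     secs = time_walk(build_walk_command("172.20.176.134", "casvm", table), limit=420)
#     print(table, "timed out" if secs is None else f"{secs:.1f} s")
```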

ant0nwax, Aug 12 '19 14:08

PS: the timeout did not happen with the version we used before, which was from last year.

ant0nwax, Aug 12 '19 14:08