check_cisco_nexus_hardware.pl: workaround for buggy transceivers

Open bluikko opened this issue 6 years ago • 1 comments

I have encountered one "buggy" transceiver that reports the following low-side limits for Rx Power:

  • low warning: -30 dBm (-30000)
  • low alarm: 0 dBm (0)

As seen on the switch:

           SFP Detail Diagnostics Information (internal calibration)
  ----------------------------------------------------------------------------
                Current              Alarms                  Warnings
                Measurement     High        Low         High          Low
  ----------------------------------------------------------------------------
  Temperature   29.24 C       100.00 C    -10.00 C     95.00 C       -5.00 C
  Voltage        3.29 V         3.63 V      2.90 V      3.50 V        3.00 V
  Current       19.68 mA      100.00 mA     0.00 mA    80.00 mA       1.00 mA
  Tx Power      -5.63 dBm       6.99 dBm  -15.08 dBm    4.99 dBm    -13.01 dBm
  Rx Power     -10.91 dBm       1.99 dBm    0.00 dBm    0.00 dBm    -30.00 dBm
  Transmit Fault Count = 0
  ----------------------------------------------------------------------------

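Obviously, the limits are wrong: the low alarm (0 dBm) is above the low warning (-30 dBm), so the current reading of -10.91 dBm is below the low alarm limit. Sensible low limits would be -12 dBm and -15 dBm or something similar.

Maybe check_cisco_nexus_hardware.pl could work around such buggy transceivers with something like: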

  • Maybe only for value type dBm?
  • Test if the thresh_value for warning is lower than thresh_value for alarm
  • If true, then set the alarm thresh_value to same as warning thresh_value
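
The steps above could be sketched roughly as follows. This is only an illustration in Python, not the plugin's Perl code: the `Threshold` class and the `clamp_inverted_low_alarm` function are hypothetical names. Restricting the check to "less than" thresholds (relation 1 in the debug output) is an assumption, since for "greater than" thresholds a warning value below the alarm value is the normal case.

```python
from dataclasses import dataclass, replace

LESS_THAN = 1           # relation code logged as "lessthan compare"
WARNING, ALARM = 10, 20  # thresh_severity values seen in the debug output


@dataclass(frozen=True)
class Threshold:
    """Hypothetical container for one threshold row."""
    value: int      # thresh_value, in thousandths of the unit
    relation: int   # tresh_relation as logged by the plugin
    severity: int   # thresh_severity


def clamp_inverted_low_alarm(thresholds, sensor_type):
    """Return a copy where a low alarm above the low warning is lowered to it."""
    if sensor_type != "dBm":  # "Maybe only for value type dBm?"
        return list(thresholds)
    low_warnings = [t.value for t in thresholds
                    if t.relation == LESS_THAN and t.severity == WARNING]
    if not low_warnings:
        return list(thresholds)
    warn_value = min(low_warnings)
    fixed = []
    for t in thresholds:
        # The warning is lower than the alarm: the limits are inverted.
        if t.relation == LESS_THAN and t.severity == ALARM and t.value > warn_value:
            t = replace(t, value=warn_value)
        fixed.append(t)
    return fixed


# Rx Power thresholds taken from the debug output in this issue.
rx_power = [Threshold(1997, 3, 20), Threshold(-30000, 1, 10),
            Threshold(0, 1, 20), Threshold(0, 3, 10)]
fixed = clamp_inverted_low_alarm(rx_power, "dBm")
print([t.value for t in fixed])  # [1997, -30000, -30000, 0]
```

With the low alarm lowered to -30000, a reading of -10915 no longer falls below it. The high-side limits are left untouched.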

Debug output showing the transceiver:

threshold data: thresh_value=1997 tresh_relation=3 thresh_severity=20 sensor_value=-10915
compare -10915 to 1997 and will return 20 if operator 3 is met
greaterthan compare
comparison result: 1
threshold data: thresh_value=-30000 tresh_relation=1 thresh_severity=10 sensor_value=-10915
compare -10915 to -30000 and will return 10 if operator 1 is met
lessthan compare
comparison result: 1
threshold data: thresh_value=0 tresh_relation=1 thresh_severity=20 sensor_value=-10915
compare -10915 to 0 and will return 20 if operator 1 is met
lessthan compare
comparison result: 20
threshold data: thresh_value=0 tresh_relation=3 thresh_severity=10 sensor_value=-10915
compare -10915 to 0 and will return 10 if operator 3 is met
greaterthan compare
comparison result: 1
sensor_alarm = 20 (nagios_rc=2)
add new sensor status for sensor_id = 300014093 (Ethernet1/9 Lane 1 Transceiver Receive Power Sensor->Transceiver(slot:1-port:9)->Linecard-1 Port-9->x + x Supervisor->LinecardSlot-1->nexus->Fabric Stack Root) rc=OK. type is =dBm
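
For reference, the evaluation shown in the debug output can be replayed with a short Python sketch. It is not the plugin's code: `threshold_met` and `sensor_alarm` are hypothetical helpers, only the two relation codes that appear above (1 and 3) are handled, and keeping the highest matching severity is an assumption about how the final `sensor_alarm` value is chosen.

```python
NO_ALARM = 1  # logged as "comparison result: 1" when a threshold is not met


def threshold_met(sensor_value, thresh_value, relation):
    """Apply the relation codes seen in the debug output."""
    if relation == 1:   # "lessthan compare"
        return sensor_value < thresh_value
    if relation == 3:   # "greaterthan compare"
        return sensor_value > thresh_value
    raise ValueError(f"relation {relation} is not covered by this sketch")


def sensor_alarm(sensor_value, thresholds):
    """Return the highest severity among the thresholds that are met."""
    worst = NO_ALARM
    for thresh_value, relation, severity in thresholds:
        if threshold_met(sensor_value, thresh_value, relation):
            worst = max(worst, severity)
    return worst


# (thresh_value, tresh_relation, thresh_severity) as reported by the transceiver
reported = [(1997, 3, 20), (-30000, 1, 10), (0, 1, 20), (0, 3, 10)]
print(sensor_alarm(-10915, reported))  # 20

# Same data with the low alarm lowered to the low warning value
clamped = [(1997, 3, 20), (-30000, 1, 10), (-30000, 1, 20), (0, 3, 10)]
print(sensor_alarm(-10915, clamped))  # 1
```

The first call reproduces `sensor_alarm = 20` from the log: the only threshold that is met is the low alarm at 0. The second call shows that the proposed adjustment would clear it for this reading.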

bluikko, Jun 12 '18 10:06

Hello,

As mentioned in other issues, I am unable to test any changes to the plugin. However, I would accept a PR.

David.

david-barbion, Aug 08 '18 20:08