check_cisco_nexus_hardware.pl: workaround for buggy transceivers
I have encountered one "buggy" transceiver that reports the following low limits for Rx Power:
- warning: -30 dBm (-30000)
- alarm: 0 dBm (0)
As seen on the switch:
SFP Detail Diagnostics Information (internal calibration)
----------------------------------------------------------------------------
              Current              Alarms                  Warnings
              Measurement     High        Low         High          Low
----------------------------------------------------------------------------
Temperature   29.24 C        100.00 C    -10.00 C     95.00 C       -5.00 C
Voltage        3.29 V          3.63 V      2.90 V      3.50 V        3.00 V
Current       19.68 mA       100.00 mA     0.00 mA    80.00 mA       1.00 mA
Tx Power      -5.63 dBm        6.99 dBm  -15.08 dBm    4.99 dBm    -13.01 dBm
Rx Power     -10.91 dBm        1.99 dBm    0.00 dBm    0.00 dBm    -30.00 dBm
Transmit Fault Count = 0
----------------------------------------------------------------------------
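Obviously, these limits are wrong: the low alarm (0 dBm) sits above the low warning (-30 dBm), so the current reading of -10.91 dBm is already below the alarm limit. Sensible limits would be around -12 dBm and -15 dBm, or something similar.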
Maybe check_cisco_nexus_hardware.pl could work around such buggy transceivers with something like:
- Maybe only for value type dBm?
- Test if the thresh_value for warning is lower than the thresh_value for alarm
- If true, then set the alarm thresh_value to the same value as the warning thresh_value
Debug output showing the transceiver:
threshold data: thresh_value=1997 tresh_relation=3 thresh_severity=20 sensor_value=-10915
compare -10915 to 1997 and will return 20 if operator 3 is met
greaterthan compare
comparison result: 1
threshold data: thresh_value=-30000 tresh_relation=1 thresh_severity=10 sensor_value=-10915
compare -10915 to -30000 and will return 10 if operator 1 is met
lessthan compare
comparison result: 1
threshold data: thresh_value=0 tresh_relation=1 thresh_severity=20 sensor_value=-10915
compare -10915 to 0 and will return 20 if operator 1 is met
lessthan compare
comparison result: 20
threshold data: thresh_value=0 tresh_relation=3 thresh_severity=10 sensor_value=-10915
compare -10915 to 0 and will return 10 if operator 3 is met
greaterthan compare
comparison result: 1
sensor_alarm = 20 (nagios_rc=2)
add new sensor status for sensor_id = 300014093 (Ethernet1/9 Lane 1 Transceiver Receive Power Sensor->Transceiver(slot:1-port:9)->Linecard-1 Port-9->x + x Supervisor->LinecardSlot-1->nexus->Fabric Stack Root) rc=OK. type is =dBm
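The workaround proposed above can be sketched against the threshold values from this debug output. This is only an illustration in Python, not plugin code: the plugin itself is Perl, and the names `Threshold`, `evaluate`, and `clamp_inverted_low_alarm` are hypothetical. It assumes that relation 1 means "less than" and relation 3 means "greater than" (as the debug lines suggest), that severity 10 is the warning and 20 the alarm, and that a comparison result of 1 means the threshold was not crossed.

```python
# Illustrative sketch only: all names here are hypothetical and do not
# come from check_cisco_nexus_hardware.pl.
from dataclasses import dataclass, replace

LESS_THAN = 1      # assumed from "lessthan compare" in the debug output
GREATER_THAN = 3   # assumed from "greaterthan compare" in the debug output
NOT_CROSSED = 1    # assumed meaning of "comparison result: 1"
WARNING = 10       # thresh_severity=10
ALARM = 20         # thresh_severity=20


@dataclass(frozen=True)
class Threshold:
    value: int      # raw value in milli-dBm, e.g. -30000 for -30 dBm
    relation: int
    severity: int


def evaluate(sensor_value, thresholds):
    """Return the highest severity among crossed thresholds, else NOT_CROSSED."""
    result = NOT_CROSSED
    for t in thresholds:
        if t.relation == LESS_THAN:
            crossed = sensor_value < t.value
        elif t.relation == GREATER_THAN:
            crossed = sensor_value > t.value
        else:
            crossed = False  # other relation codes are ignored in this sketch
        if crossed:
            result = max(result, t.severity)
    return result


def clamp_inverted_low_alarm(thresholds, sensor_type):
    """For dBm sensors, lower the low alarm to the low warning if they are inverted."""
    if sensor_type != "dBm":
        return list(thresholds)
    low_warnings = [t.value for t in thresholds
                    if t.relation == LESS_THAN and t.severity == WARNING]
    if not low_warnings:
        return list(thresholds)
    low_warning = min(low_warnings)
    fixed = []
    for t in thresholds:
        inverted = (t.relation == LESS_THAN and t.severity == ALARM
                    and low_warning < t.value)
        fixed.append(replace(t, value=low_warning) if inverted else t)
    return fixed


# Rx Power thresholds exactly as they appear in the debug output.
rx_power = [
    Threshold(1997, GREATER_THAN, ALARM),
    Threshold(-30000, LESS_THAN, WARNING),
    Threshold(0, LESS_THAN, ALARM),
    Threshold(0, GREATER_THAN, WARNING),
]

print(evaluate(-10915, rx_power))                                   # 20
print(evaluate(-10915, clamp_inverted_low_alarm(rx_power, "dBm")))  # 1
```

With the raw limits the reading of -10915 crosses the low alarm at 0, which matches `sensor_alarm = 20` in the debug output. After clamping, both low limits sit at -30000, so the check only fires on a very weak signal; that is far less sensitive than limits around -12 dBm and -15 dBm, but it avoids the false alarm.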
Hello,
As mentioned in other issues, I am unable to test any changes to the plugin. However, I would accept a PR.
David.