public icon indicating copy to clipboard operation
public copied to clipboard

Update component temperature thresholds

Open dplore opened this issue 2 years ago • 5 comments

Change Scope

  • Deprecate existing temperature threshold because it uses units (uint32) which are not compatible with temperature (decimal)
  • Add high and low temperature thresholds using decimal64 celsius units
  • This change is backwards compatible

Tree view

*** /Users/dloher/old-tree.txt  Wed Mar 20 13:05:59 2024
--- /Users/dloher/newthres-tree.txt     Wed Mar 20 13:00:27 2024
***************
*** 8588,8603 ****
          |  +--ro switchover-ready?               boolean
          |  +--ro base-mac-address?               oc-yang:mac-address
          |  +--ro temperature
          |  |  +--ro instant?           decimal64
          |  |  +--ro avg?               decimal64
          |  |  +--ro min?               decimal64
          |  |  +--ro max?               decimal64
          |  |  +--ro interval?          oc-types:stat-interval
          |  |  +--ro min-time?          oc-types:timeticks64
          |  |  +--ro max-time?          oc-types:timeticks64
          |  |  +--ro alarm-status?      boolean
!         |  |  +--ro alarm-threshold?   uint32
!         |  |  +--ro alarm-severity?    identityref
          |  +--ro memory
          |  |  +--ro available?   uint64
          |  |  +--ro utilized?    uint64
--- 8588,8607 ----
          |  +--ro switchover-ready?               boolean
          |  +--ro base-mac-address?               oc-yang:mac-address
          |  +--ro temperature
          |  |  +--ro instant?                 decimal64
          |  |  +--ro avg?                     decimal64
          |  |  +--ro min?                     decimal64
          |  |  +--ro max?                     decimal64
          |  |  +--ro interval?                oc-types:stat-interval
          |  |  +--ro min-time?                oc-types:timeticks64
          |  |  +--ro max-time?                oc-types:timeticks64
          |  |  +--ro alarm-status?            boolean
+         |  |  +--ro alarm-threshold-lower?   decimal64
+         |  |  +--ro alarm-severity-lower?    identityref
+         |  |  +--ro alarm-threshold-upper?   decimal64
+         |  |  +--ro alarm-severity-upper?    identityref
+         |  |  x--ro alarm-threshold?         uint32
+         |  |  x--ro alarm-severity?          identityref
          |  +--ro memory
          |  |  +--ro available?   uint64
          |  |  +--ro utilized?    uint64

Platform Implementations

  • Reference Cisco IOS XR
=============================================================================================================
Location  TEMPERATURE                          Value     Crit    Major    Minor    Minor    Major    Crit
         Sensor                             (deg C)     (Lo)     (Lo)     (Lo)     (Hi)     (Hi)    (Hi)
-------------------------------------------------------------------------------------------------------------
0/RP0/CPU0 
         MB_JM01_L1_TEMP                        37      -10       -5        0      130      135      140
         MB_JM01_L2_TEMP                        36      -10       -5        0      130      135      140
         MB_JM11_L1_TEMP                        33      -10       -5        0      130      135      140
         MB_JM21_L1_TEMP                        36      -10       -5        0      130      135      140 
user@host> show chassis temperature-thresholds
                           Fan speed      Yellow alarm      Red alarm      Fire Shutdown
                          (degrees C)      (degrees C)     (degrees C)      (degrees C)
Item                     Normal  High   Normal  Bad fan   Normal  Bad fan     Normal
Routing Engine 0             48    54       85       85      100      100      102
Routing Engine 1             48    54       85       85      100      100      102
CB 0 Intake Temp Sensor      30    35       80       80       85       85       95
CB 0 Exhaust Temp Sensor     30    35       80       80       85       85       95
CB 0 CPU Die Temp Sensor     40    45       95       95      100      100      110
CB 1 Intake Temp Sensor      30    35       80       80       85       85       95
CB 1 Exhaust Temp Sensor     30    35       80       80       85       85       95
CB 1 CPU Die Temp Sensor     40    45       95       95      100      100      110
FPC 0 Intake-A Temp Sensor   30    35       80       80       85       85       95

dplore avatar Mar 05 '24 20:03 dplore

No major YANG version changes in commit 4e2fd5bd226652d9c9e10dc49d563ba3ac31b92f

OpenConfigBot avatar Mar 05 '24 20:03 OpenConfigBot

Reviewed in Mar 12, 2024 OC operators meeting. @s19nal noted that the examples show 3 thresholds for high and low and we should allow all these instead of single high and low thresholds, otherwise the vendor has to choose which one to use. I will update this PR to reflect all 6 thresholds

dplore avatar Mar 12 '24 16:03 dplore

Added severity for upper and lower. Added tree diff view to the PR description.

dplore avatar Mar 20 '24 20:03 dplore

Reviewed in Mar 12, 2024 OC operators meeting. @s19nal noted that the examples show 3 thresholds for high and low and we should allow all these instead of single high and low thresholds, otherwise the vendor has to choose which one to use. I will update this PR to reflect all 6 thresholds

How does the newest version of this PR allow 3 thresholds to be expressed? It looks to me like it can express 2 thresholds, with corresponding severities.

Another thing I find confusing: The description of alarm-severity-lower is "The severity of the current low temperature alarm" -- what is the "current low temperature alarm", in contrast with "the current high temperature alarm" from the description of alarm-severity-upper? We only have one leaf alarm-status, so I would expect that there is only a single current alarm.

To me, the model used by the transceivers makes more sense: There is a list of thresholds, keyed by severity, and each entry in the list can have a lower and/or upper temperature threshold.

nkitchen avatar Apr 01 '24 23:04 nkitchen

Looking at this again, I have another concern, in addition to the one that I expressed on April 1: I think the meaning of alarm-threshold-lower is not clear.

On a device that only raises alarms for rising temperatures, I could imagine that alarm-threshold-lower could be interpreted as the threshold for a lower-severity alarm, like "Yellow alarm" in the JunOS output above.

But it seems that some devices also have thresholds for temperatures going too low -- that's what the IOS XR output looks like to me: alarm-threshold-lower could mean one of "Minor (Lo)", "Major (Lo)", or "Crit (Lo)" [not sure which one].

I think the description for alarm-threshold-lower needs to identify one of these interpretations clearly (and likewise alarm-severity-lower).

(But I don't really think that's sufficient -- we still need further changes to address the requirement for more levels of severity.)

nkitchen-arista avatar Oct 03 '24 16:10 nkitchen-arista