Update component temperature thresholds
Change Scope
- Deprecate the existing alarm-threshold and alarm-severity leaves, because alarm-threshold uses a type (uint32) that is not compatible with temperature values (decimal)
- Add high and low temperature thresholds as decimal64 values in degrees Celsius
- This change is backwards compatible
Tree view
*** /Users/dloher/old-tree.txt Wed Mar 20 13:05:59 2024
--- /Users/dloher/newthres-tree.txt Wed Mar 20 13:00:27 2024
***************
*** 8588,8603 ****
| +--ro switchover-ready? boolean
| +--ro base-mac-address? oc-yang:mac-address
| +--ro temperature
| | +--ro instant? decimal64
| | +--ro avg? decimal64
| | +--ro min? decimal64
| | +--ro max? decimal64
| | +--ro interval? oc-types:stat-interval
| | +--ro min-time? oc-types:timeticks64
| | +--ro max-time? oc-types:timeticks64
| | +--ro alarm-status? boolean
! | | +--ro alarm-threshold? uint32
! | | +--ro alarm-severity? identityref
| +--ro memory
| | +--ro available? uint64
| | +--ro utilized? uint64
--- 8588,8607 ----
| +--ro switchover-ready? boolean
| +--ro base-mac-address? oc-yang:mac-address
| +--ro temperature
| | +--ro instant? decimal64
| | +--ro avg? decimal64
| | +--ro min? decimal64
| | +--ro max? decimal64
| | +--ro interval? oc-types:stat-interval
| | +--ro min-time? oc-types:timeticks64
| | +--ro max-time? oc-types:timeticks64
| | +--ro alarm-status? boolean
+ | | +--ro alarm-threshold-lower? decimal64
+ | | +--ro alarm-severity-lower? identityref
+ | | +--ro alarm-threshold-upper? decimal64
+ | | +--ro alarm-severity-upper? identityref
+ | | x--ro alarm-threshold? uint32
+ | | x--ro alarm-severity? identityref
| +--ro memory
| | +--ro available? uint64
| | +--ro utilized? uint64
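Because the deprecated leaves stay in the tree, a consumer can adopt the new leaves while still handling devices that only populate the old one. A minimal Python sketch of that fallback, assuming the temperature state has already been decoded into a dict keyed by the leaf names shown above. The function name `upper_threshold` is hypothetical, and treating the deprecated alarm-threshold as an upper threshold is an assumption, not something this PR defines:

```python
from decimal import Decimal
from typing import Mapping, Optional, Union

Scalar = Union[int, str, Decimal]


def upper_threshold(temperature: Mapping[str, Scalar]) -> Optional[Decimal]:
    """Return the upper alarm threshold in degrees Celsius, or None if unset.

    Prefers the new decimal64 leaf and falls back to the deprecated uint32
    leaf. ASSUMPTION: the deprecated leaf is read as an upper threshold.
    """
    for leaf in ("alarm-threshold-upper", "alarm-threshold"):
        if leaf in temperature:
            # Go through str() so ints and decimal strings convert exactly.
            return Decimal(str(temperature[leaf]))
    return None
```

With both leaves present the new one wins; with only the deprecated leaf present its integer value is returned as a Decimal.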
Platform Implementations
- Reference Cisco IOS XR
=============================================================================================================
Location  TEMPERATURE              Value    Crit   Major   Minor   Minor   Major    Crit
          Sensor                 (deg C)    (Lo)    (Lo)    (Lo)    (Hi)    (Hi)    (Hi)
-------------------------------------------------------------------------------------------------------------
0/RP0/CPU0
          MB_JM01_L1_TEMP             37     -10      -5       0     130     135     140
          MB_JM01_L2_TEMP             36     -10      -5       0     130     135     140
          MB_JM11_L1_TEMP             33     -10      -5       0     130     135     140
          MB_JM21_L1_TEMP             36     -10      -5       0     130     135     140
- Reference JunOS - temperature-thresholds
user@host> show chassis temperature-thresholds
                                    Fan speed      Yellow alarm         Red alarm  Fire Shutdown
                                  (degrees C)       (degrees C)       (degrees C)    (degrees C)
Item                          Normal     High   Normal  Bad fan   Normal  Bad fan         Normal
Routing Engine 0                  48       54       85       85      100      100            102
Routing Engine 1                  48       54       85       85      100      100            102
CB 0 Intake Temp Sensor           30       35       80       80       85       85             95
CB 0 Exhaust Temp Sensor          30       35       80       80       85       85             95
CB 0 CPU Die Temp Sensor          40       45       95       95      100      100            110
CB 1 Intake Temp Sensor           30       35       80       80       85       85             95
CB 1 Exhaust Temp Sensor          30       35       80       80       85       85             95
CB 1 CPU Die Temp Sensor          40       45       95       95      100      100            110
FPC 0 Intake-A Temp Sensor        30       35       80       80       85       85             95
No major YANG version changes in commit 4e2fd5bd226652d9c9e10dc49d563ba3ac31b92f
Reviewed in the Mar 12, 2024 OC operators meeting. @s19nal noted that the examples show 3 thresholds each for high and low, and that we should allow all of these instead of a single high and a single low threshold; otherwise the vendor has to choose which one to use. I will update this PR to reflect all 6 thresholds.
Added severity for upper and lower. Added tree diff view to the PR description.
> Reviewed in the Mar 12, 2024 OC operators meeting. @s19nal noted that the examples show 3 thresholds each for high and low, and that we should allow all of these instead of a single high and a single low threshold; otherwise the vendor has to choose which one to use. I will update this PR to reflect all 6 thresholds.
How does the newest version of this PR allow 3 thresholds to be expressed? It looks to me like it can express 2 thresholds, with corresponding severities.
Another thing I find confusing: The description of alarm-severity-lower is "The severity of the current low temperature alarm" -- what is the "current low temperature alarm", in contrast with "the current high temperature alarm" from the description of alarm-severity-upper? We only have one leaf alarm-status, so I would expect that there is only a single current alarm.
To me, the model used by the transceivers makes more sense: There is a list of thresholds, keyed by severity, and each entry in the list can have a lower and/or upper temperature threshold.
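As a rough illustration of that shape, here is a Python sketch of a threshold list keyed by severity, populated with the six IOS XR values for MB_JM01_L1_TEMP from the PR description. The names `Threshold`, `SEVERITY_ORDER` and `current_severity` are hypothetical, the severity names simply mirror the IOS XR columns, and the inclusive comparison is an assumption about when a threshold counts as crossed:

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, Optional


@dataclass(frozen=True)
class Threshold:
    """One list entry: either bound may be absent on a given platform."""
    lower: Optional[Decimal] = None
    upper: Optional[Decimal] = None


# Most severe first, so the first match is the highest active severity.
SEVERITY_ORDER = ("CRITICAL", "MAJOR", "MINOR")

# Values taken from the IOS XR row for MB_JM01_L1_TEMP (degrees Celsius).
thresholds: Dict[str, Threshold] = {
    "CRITICAL": Threshold(lower=Decimal("-10"), upper=Decimal("140")),
    "MAJOR": Threshold(lower=Decimal("-5"), upper=Decimal("135")),
    "MINOR": Threshold(lower=Decimal("0"), upper=Decimal("130")),
}


def current_severity(
    instant: Decimal, table: Dict[str, Threshold]
) -> Optional[str]:
    """Return the most severe threshold crossed by `instant`, or None.

    ASSUMPTION: a threshold is crossed when the reading is at or beyond it.
    """
    for severity in SEVERITY_ORDER:
        entry = table.get(severity)
        if entry is None:
            continue
        if entry.lower is not None and instant <= entry.lower:
            return severity
        if entry.upper is not None and instant >= entry.upper:
            return severity
    return None
```

With this shape a single current alarm falls out naturally, which matches having only one alarm-status leaf: a reading of 37 crosses nothing, while 132 crosses only the minor upper threshold. A platform that only alarms on rising temperatures would simply leave `lower` unset in each entry.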
Looking at this again, I have another concern, in addition to the one that I expressed on April 1: I think the meaning of alarm-threshold-lower is not clear.
On a device that only raises alarms for rising temperatures, I could imagine that alarm-threshold-lower could be interpreted as the threshold for a lower-severity alarm, like "Yellow alarm" in the JunOS output above.
But it seems that some devices also have thresholds for temperatures going too low -- that's what the IOS XR output looks like to me: alarm-threshold-lower could mean one of "Minor (Lo)", "Major (Lo)", or "Crit (Lo)" [not sure which one].
I think the description for alarm-threshold-lower needs to identify one of these interpretations clearly (and likewise alarm-severity-lower).
(But I don't really think that's sufficient -- we still need further changes to address the requirement for more levels of severity.)