Spike-08 has potentially faulty PSUs
The 500W PSUs in spike-08 fail under extreme load spikes.
Bay | Present | Status | PDS | Hotplug | Model | Spare | Serial Number | Capacity | Firmware |
---|---|---|---|---|---|---|---|---|---|
1 | OK | Good, In Use | No | Yes | 720478-B21 | 754377-001 | 5DLUT0C8J7R4JT | 500 Watts | 1.02 |
2 | OK | Good, In Use | No | Yes | 720478-B21 | 754377-001 | 5DLUT0C8J7Q4T6 | 500 Watts | 1.02 |
0139 Critical 18:50 07/19/2022 18:50 07/19/2022 0001
LOG: Server Critical Fault (Service Information: Runtime Fault, System Board, P12V Main/AUX Regulator 1 (04h))
New 865408-B21 PSUs are known-good replacements.
Disabling High Efficiency Mode reduces the occurrence of the issue.
PSUs with an 8J serial and firmware 2.00 are known good, but I think a G10 is required to update the firmware; a G9 is unable to update it.
From experience, PSUs with an 8J serial and firmware 1.02 are bad: running prime95 triggers the P12V Main/AUX Regulator fault within a few minutes.
Related: https://github.com/openstreetmap/operations/issues/688
The closest HPE document on the issue: https://support.hpe.com/hpesc/public/docDisplay?docId=a00050474en_us
Note that, from experience, PSUs with an 8J serial and firmware 1.02 are still bad; firmware 2.00 is OK.
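As a rough triage aid, the rules of thumb in this issue (8J serial with firmware 1.02 is bad, firmware 2.00 is OK, 865408-B21 is a known-good replacement) could be encoded in a small helper that reads the iLO power supply table pasted above. This is a hypothetical sketch, not an HPE tool: `PSU`, `parse_table` and `assess` are illustrative names, and treating "8J" as characters 8 and 9 of the serial number is an assumption based only on the two serials shown here.

```python
from dataclasses import dataclass


@dataclass
class PSU:
    bay: int
    model: str
    serial: str
    firmware: str


def parse_table(text: str) -> list[PSU]:
    """Parse the pipe-separated iLO power supply table (header + separator + rows)."""
    rows = []
    for line in text.strip().splitlines()[2:]:  # skip header and '---' separator
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Columns: Bay, Present, Status, PDS, Hotplug, Model, Spare, Serial, Capacity, Firmware
        rows.append(PSU(bay=int(cells[0]), model=cells[5], serial=cells[7], firmware=cells[9]))
    return rows


def assess(psu: PSU) -> str:
    """Return a verdict for one PSU, based only on the experience recorded in this issue."""
    if psu.model == "865408-B21":
        return "known good (replacement model)"
    # ASSUMPTION: the "8J" batch marker sits at characters 8-9 of the serial number.
    if psu.serial[7:9] != "8J":
        return "unknown batch"
    if psu.firmware == "1.02":
        return "bad: replace"
    if psu.firmware == "2.00":
        return "known good"
    return "unknown firmware"


TABLE = """
Bay | Present | Status | PDS | Hotplug | Model | Spare | Serial Number | Capacity | Firmware |
---|---|---|---|---|---|---|---|---|---|
1 | OK | Good, In Use | No | Yes | 720478-B21 | 754377-001 | 5DLUT0C8J7R4JT | 500 Watts | 1.02 |
2 | OK | Good, In Use | No | Yes | 720478-B21 | 754377-001 | 5DLUT0C8J7Q4T6 | 500 Watts | 1.02 |
"""

if __name__ == "__main__":
    for psu in parse_table(TABLE):
        print(psu.bay, psu.serial, assess(psu))
```

Run against the spike-08 table, both bays come out as "bad: replace", which matches the decision to order replacements.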
I have switched the PSUs from High Efficiency mode to Balanced mode, which reduces the likelihood of a PSU 12V trip.
spike-08 appears to have tripped only twice due to the PSU 12V issue, so thankfully it is quite rare.
Replacement PSUs ordered.
Swapped out.