Failed memory in ysera
The hardware site identifies ysera as having 220GB RAM in the following configuration:
- 7 x 32GB 2666 MT/s DDR4 DIMM
- 1 x 2666 MT/s DDR4 DIMM
- 4 x Spare DIMM slot
It should have 256GB, identical to odin. Running `free` over SSH also shows only 220GB. Based on rerenders, this is causing about a 10% reduction in performance.
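As a quick sanity check, the shortfall can be estimated from `/proc/meminfo`, which is where `free` reads its totals. The following is a minimal Python sketch, not an existing tool: the function names are hypothetical, and the expected 256GB, the 32GB DIMM size and the 95% tolerance (MemTotal always sits a few GB below installed RAM) are assumptions for illustration. With 220GB visible against 256GB expected, the gap rounds to one missing 32GB DIMM, which fits 7 x 32GB = 224GB being the only sized DIMMs in the listing above.

```python
from pathlib import Path

# Assumptions (hypothetical values for illustration, not from any monitoring config):
EXPECTED_GIB = 256   # target capacity, matching odin
DIMM_GIB = 32        # size of each populated DIMM
TOLERANCE = 0.95     # MemTotal is a few GiB below installed RAM (firmware/kernel reservations)


def mem_total_gib(meminfo_text: str) -> float:
    """Return MemTotal from /proc/meminfo text, converted from kB (KiB) to GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) / 1024**2
    raise ValueError("MemTotal line not found")


def looks_short(total_gib: float, expected_gib: float = EXPECTED_GIB) -> bool:
    """True if visible memory is clearly below what should be installed."""
    return total_gib < expected_gib * TOLERANCE


def dimms_missing(total_gib: float, expected_gib: float = EXPECTED_GIB,
                  dimm_gib: float = DIMM_GIB) -> int:
    """Rough estimate of how many DIMMs' worth of memory is not visible."""
    return max(0, round((expected_gib - total_gib) / dimm_gib))


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():
        total = mem_total_gib(meminfo.read_text())
        print(f"MemTotal: {total:.1f} GiB, short: {looks_short(total)}, "
              f"missing DIMMs (approx.): {dimms_missing(total)}")
```

The tolerance check avoids false alarms on a healthy 256GB machine, whose MemTotal typically reads a few GB lower than the installed total.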
Looking at munin, it's been this way since at least July 2022.
The server was purchased by Sentral in March 2019, or at least odin was, and I think the two were purchased at the same time. Either way, this puts it well out of warranty.
I have ordered replacement memory.
ysera has started crashing due to uncorrectable ECC errors. I will try to visit in the next week.
Uncorrectable ECC / other uncorrectable memory error @P2-DIMMA1(CPU2) - Assertion
Uncorrectable ECC / other uncorrectable memory error @P2-DIMMB1(CPU2) - Assertion
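To see at a glance which slots the BMC is flagging, event lines like the two above can be grouped by slot. This is a minimal Python sketch, assuming the events have already been exported as text lines in exactly the format shown (how to export them from the BMC is not covered here); the regex and the `failing_slots` name are hypothetical. For the two events above it reports P2-DIMMA1 and P2-DIMMB1, both attached to CPU2.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed format: matches the slot tag in the BMC event text, e.g. "@P2-DIMMA1(CPU2)".
SLOT_RE = re.compile(r"@P(?P<socket>\d+)-DIMM(?P<slot>[A-Z]\d+)\((?P<cpu>CPU\d+)\)")


def failing_slots(events: Iterable[str]) -> Counter:
    """Count uncorrectable-ECC events per DIMM slot."""
    counts: Counter = Counter()
    for line in events:
        if "Uncorrectable ECC" not in line:
            continue  # ignore unrelated events
        match = SLOT_RE.search(line)
        if match:
            counts[f"P{match['socket']}-DIMM{match['slot']} ({match['cpu']})"] += 1
    return counts


if __name__ == "__main__":
    log = [
        "Uncorrectable ECC / other uncorrectable memory error @P2-DIMMA1(CPU2) - Assertion",
        "Uncorrectable ECC / other uncorrectable memory error @P2-DIMMB1(CPU2) - Assertion",
    ]
    for slot, n in failing_slots(log).most_common():
        print(slot, n)
```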
Waiting for access to UCL - Slough to be restored; I currently do not have access.
Blocked by #1060
Two weeks ago I enabled Adaptive Double DRAM Device Correction (ADDDC) in the BIOS and downclocked the RAM. The machine has been stable since then, which is an improvement.
It probably needs a BIOS update too.
I have updated the BIOS and BMC. The BIOS update requires a reboot to complete, which I will do shortly once on-site. I have also updated snap-02 and eddie in the same way.
All upgraded. Faulty RAM replaced and extra RAM installed.