rasdaemon
rasdaemon: erst: decode panic mce through erst
ERST records the MCE information that caused the kernel panic, which helps us determine the cause of the last crash. With this change, rasdaemon checks for ERST records at startup and parses them. The decoded output looks like the following:
<...>-0 [-01] .... 0.000000 mce_erst_record: 2025-03-26 14:52:42 +0800 bank=1, status= bd80000000100134, Data CACHE Level-0 Data-Read Error Data CACHE Level-0 Data-Read Error Data CACHE Level-0 Data-Read Error Data CACHE Level-0 Data-Read Error Data CACHE Level-0 Data-Read Error Data CACHE Level-0 Data-Read Error, mci=Uncorrected_error Error_enabled SRAR Uncorrected_error Error_enabled SRAR Uncorrected_error Error_enabled SRAR, mca=Data CACHE Level-0 Data-Read Error Data CACHE Level-0 Data-Read Error Data CACHE Level-0 Data-Read Error K, cpu_type= Sapphirerapids server, cpu= 159, socketid= 1, ip= ffffffff914a6476, cs= 10, misc= 86, addr= 8158f58400, mcgstatus=15 RIPV EIPV MCIP LMCE mcgstatus=15 RIPV EIPV MCIP LMCE mcgstatus=15 RIPV EIPV MCIP LMCE, mcgcap= f000c15, apicid= 9f, ppin= fc6b80e0ba9d616, microcode= 2b000571
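As a cross-check of the decoded fields above, the `status` value can be unpacked by hand. The following is a minimal sketch, not rasdaemon's decoder: the helper names are hypothetical, and the bit positions are taken from the architectural IA32_MCi_STATUS layout in the Intel SDM. For `bd80000000100134` it yields an SRAR error with a cache-hierarchy code of Data, Level-0, Data-Read, which matches the `mci=` and `mca=` strings in the sample.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural IA32_MCi_STATUS flag bits (Intel SDM, machine-check chapter). */
#define MCI_STATUS_VAL (1ULL << 63) /* register holds a valid error */
#define MCI_STATUS_UC  (1ULL << 61) /* uncorrected error */
#define MCI_STATUS_EN  (1ULL << 60) /* error reporting enabled */
#define MCI_STATUS_PCC (1ULL << 57) /* processor context corrupt */
#define MCI_STATUS_S   (1ULL << 56) /* signaled through a machine check */
#define MCI_STATUS_AR  (1ULL << 55) /* software recovery action required */

/* Request encodings (RRRR) used in cache-hierarchy error codes. */
#define MCA_REQ_DRD 0x3 /* data read */

/* Hypothetical helper: SRAR means UC=1, EN=1, PCC=0, S=1, AR=1. */
static bool mci_is_srar(uint64_t status)
{
	uint64_t need = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN |
			MCI_STATUS_S | MCI_STATUS_AR;

	return (status & need) == need && !(status & MCI_STATUS_PCC);
}

/* Cache-hierarchy compound code: 000F 0001 RRRR TTLL (F is ignored). */
static bool mca_is_cache_error(uint64_t status)
{
	return (status & 0xef00) == 0x0100;
}

/* LL field: 0..2 are cache levels 0..2, 3 is generic. */
static unsigned int mca_cache_level(uint64_t status)
{
	assert(mca_is_cache_error(status));
	return status & 0x3;
}

/* TT field: 0 = instruction, 1 = data, 2 = generic. */
static unsigned int mca_cache_type(uint64_t status)
{
	assert(mca_is_cache_error(status));
	return (status >> 2) & 0x3;
}

/* RRRR field: request type, for example MCA_REQ_DRD. */
static unsigned int mca_cache_request(uint64_t status)
{
	assert(mca_is_cache_error(status));
	return (status >> 4) & 0xf;
}
```

The field accessors assert that the code is a cache-hierarchy code first, because the RRRR/TT/LL layout only applies to that class of MCA error codes.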
A new environment variable, ERST_DELETE, is also introduced: when it is set, rasdaemon deletes the original ERST file.
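The ERST_DELETE behavior can be sketched as below. This is an illustration under stated assumptions rather than the code from this patch: `erst_delete_requested()` and `erst_cleanup_record()` are hypothetical names, the record path is passed in by the caller, and the sketch assumes that the variable only needs to be present, with its value ignored.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Assumption: ERST_DELETE counts as "set" whenever it exists in the
 * environment, regardless of its value.
 */
static bool erst_delete_requested(void)
{
	return getenv("ERST_DELETE") != NULL;
}

/*
 * Hypothetical helper: remove one record file when deletion was requested.
 * Returns 0 if the file was deleted or deletion was not requested,
 * and -1 if unlink() failed.
 */
static int erst_cleanup_record(const char *path)
{
	assert(path != NULL);

	if (!erst_delete_requested())
		return 0; /* keep the record for later inspection */

	if (unlink(path) != 0) {
		fprintf(stderr, "cannot delete %s: %s\n", path, strerror(errno));
		return -1;
	}
	return 0;
}
```

Keeping the records by default is the conservative choice: they are only removed when the administrator opts in by setting the variable in rasdaemon's environment.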
Merged, thanks!