rasdaemon: erst: decode panic mce through erst

Open • winterddd opened this issue 9 months ago • 1 comment

ERST (the ACPI Error Record Serialization Table) stores the MCE information that caused a kernel panic, which helps determine the cause of the last crash. With this change, rasdaemon checks and parses the ERST records at startup. The decoded output looks like this:

       <...>-0          [-01] .... 0.000000           mce_erst_record: 2025-03-26 14:52:42 +0800 bank=1, status= bd80000000100134, Data CACHE Level-0 Data-Read Error Data CACHE Level-0 Data-Read Error Data CACHE Level-0 Data-Read Error Data CACHE Level-0 Data-Read Error Data CACHE Level-0 Data-Read Error Data CACHE Level-0 Data-Read Error, mci=Uncorrected_error Error_enabled SRAR Uncorrected_error Error_enabled SRAR Uncorrected_error Error_enabled SRAR, mca=Data CACHE Level-0 Data-Read Error Data CACHE Level-0 Data-Read Error Data CACHE Level-0 Data-Read Error K, cpu_type= Sapphirerapids server, cpu= 159, socketid= 1, ip= ffffffff914a6476, cs= 10, misc= 86, addr= 8158f58400, mcgstatus=15 RIPV EIPV MCIP LMCE mcgstatus=15 RIPV EIPV MCIP LMCE mcgstatus=15 RIPV EIPV MCIP LMCE, mcgcap= f000c15, apicid= 9f, ppin= fc6b80e0ba9d616, microcode= 2b000571
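The `status=` field in the log line is the raw IA32_MCi_STATUS value. As a rough cross-check of the decoded text, the sketch below splits that value using the architectural bit layout and the cache-hierarchy compound error code format from the Intel SDM. It is not rasdaemon's decoder: the function names (`status_flags`, `ucr_class`, `cache_error`) are hypothetical, model-specific fields are ignored, and the output wording only imitates the log line above.

```python
from typing import List, Optional

# Architectural IA32_MCi_STATUS flag bits (Intel SDM Vol. 3B, Machine-Check
# Architecture). Model-specific fields are ignored in this sketch.
STATUS_FLAGS = [
    (63, "VAL"),    # register contents are valid
    (62, "OVER"),   # another error arrived while this bank already held one
    (61, "UC"),     # uncorrected error
    (60, "EN"),     # error reporting was enabled
    (59, "MISCV"),  # MCi_MISC holds valid data
    (58, "ADDRV"),  # MCi_ADDR holds valid data
    (57, "PCC"),    # processor context corrupt
    (56, "S"),      # signaled via machine-check exception (UCR)
    (55, "AR"),     # software recovery action required (UCR)
]

# Fields of the cache-hierarchy compound error code 000F 0001 RRRR TTLL.
REQUEST = {0x0: "Generic", 0x1: "Read", 0x2: "Write", 0x3: "Data-Read",
           0x4: "Data-Write", 0x5: "Instruction-Fetch", 0x6: "Prefetch",
           0x7: "Eviction", 0x8: "Snoop"}
TRANSACTION = {0: "Instruction", 1: "Data", 2: "Generic"}


def status_flags(status: int) -> List[str]:
    """Names of the flag bits set in status, from bit 63 downwards."""
    return [name for bit, name in STATUS_FLAGS if (status >> bit) & 1]


def ucr_class(status: int) -> str:
    """Classify per the UCR scheme (only meaningful if MCG_CAP.SER_P is set)."""
    flags = set(status_flags(status))
    if "UC" not in flags:
        return "corrected"
    if "PCC" in flags:
        return "fatal"
    if "S" in flags and "AR" in flags:
        return "SRAR"  # software recoverable, action required
    if "S" in flags:
        return "SRAO"  # software recoverable, action optional
    return "UCNA"      # uncorrected, no action required


def cache_error(status: int) -> Optional[str]:
    """Describe the MCA error code (bits 15:0) if it is a cache error."""
    code = status & 0xFFFF
    if (code & 0xEF00) != 0x0100:  # mask out the filter bit F (bit 12)
        return None
    level = code & 0x3
    ttype = (code >> 2) & 0x3
    request = (code >> 4) & 0xF
    level_name = "Generic-Level" if level == 3 else f"Level-{level}"
    return (f"{TRANSACTION.get(ttype, 'Reserved')} CACHE {level_name} "
            f"{REQUEST.get(request, 'Reserved')} Error")


if __name__ == "__main__":
    status = 0xBD80000000100134  # "status=" value from the log line above
    print(status_flags(status))  # ['VAL', 'UC', 'EN', 'MISCV', 'ADDRV', 'S', 'AR']
    print(ucr_class(status))     # SRAR
    print(cache_error(status))   # Data CACHE Level-0 Data-Read Error
```

The S and AR bits together with UC (and PCC clear) are what make this record an SRAR error, matching the `SRAR` in the `mci=` field, and the low 16 bits (0x0134) decode to the data-read, level-0 data cache error shown in `mca=`.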

This change also introduces the ERST_DELETE environment variable: if ERST_DELETE is set, rasdaemon deletes the original ERST file.
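rasdaemon itself is written in C, and the description does not say where the ERST files live or which values of ERST_DELETE count as "set". The following is therefore only a hypothetical Python sketch of the decode-then-delete behaviour described above; the helper name `process_records` and the rule "present in the environment means enabled" are assumptions.

```python
import os
from pathlib import Path
from typing import Callable, Iterable, List


def process_records(paths: Iterable[Path],
                    decode: Callable[[bytes], str]) -> List[str]:
    """Decode each saved record, then delete it if ERST_DELETE is set.

    Hypothetical helper: "set" is taken to mean "present in the
    environment", whatever its value.
    """
    delete = os.environ.get("ERST_DELETE") is not None
    decoded = []
    for path in paths:
        # Decode first: if decode() raises, the record is left on disk.
        decoded.append(decode(path.read_bytes()))
        if delete:
            path.unlink()
    return decoded
```

Deleting only after a record has been decoded means a record that fails to parse is kept for later inspection rather than lost.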

winterddd, Mar 26 '25

LGTM now. Thanks.

Reviewed-by: Shuai Xue [email protected]

axiqia, Apr 28 '25

Merged, thanks!

mchehab, Nov 14 '25