rasdaemon: add event level for event record

Open winterddd opened this issue 9 months ago • 1 comments

To help users distinguish among a growing number of events, this patch introduces event levels that indicate how severe each event is for the system. Three main levels are currently used: ALERT, CRIT, and ERROR. Fatal events would be marked as "emerg", but in practice the kernel panics when it receives a fatal event, so rasdaemon never receives it.

ALERT: An uncorrected hardware error has been fixed, but with side effects.
CRIT: An uncorrected hardware error has been detected.
ERROR: A corrected hardware error has been detected.
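As a rough illustration of how these levels relate to each other, here is a minimal Python sketch. It is not taken from the patch: the `EventLevel` enum, the `classify` helper, and its parameters are hypothetical names, and the numeric ordering is borrowed from the conventional syslog severities (emerg=0, alert=1, crit=2, err=3), which is an assumption about how the levels rank.

```python
from enum import IntEnum


class EventLevel(IntEnum):
    """Hypothetical levels; a lower value means a more severe event (syslog-style)."""
    EMERG = 0  # fatal: the kernel panics, so rasdaemon never records it
    ALERT = 1  # uncorrected error was fixed, but with side effects
    CRIT = 2   # uncorrected error was detected
    ERROR = 3  # corrected error was detected


def classify(corrected: bool,
             fixed_with_side_effects: bool = False,
             fatal: bool = False) -> EventLevel:
    """Map a coarse description of an event to a level (illustrative only)."""
    if fatal:
        return EventLevel.EMERG
    if fixed_with_side_effects:
        return EventLevel.ALERT
    if not corrected:
        return EventLevel.CRIT
    return EventLevel.ERROR
```

With this ordering, a comparison such as `level <= EventLevel.CRIT` would select every event at CRIT severity or worse.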

The log looks like the following:

<...>-1367638 [026] d.H. 0.024825 mc_event [CRIT] 2025-03-28 13:28:48 +0800 1 Uncorrected error: multi-bit ECC on unknown memory (mc: 0 address: 0xe53218400 grain: 0 APEI location: node:0 card:2 module:0 rank:0 bank_group:4 bank_address:0 row:25906 column:64 chip_id:0 status(0x0000000000000400): Storage error in DRAM memory)
<...>-1367638 [022] .... 0.024825 memory_failure_event [ALERT] 2025-03-28 13:28:48 +0800 pfn=0xe53218 page_type=already truncated LRU page action_result=Recovered
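For anyone post-processing such logs, the bracketed level tag can be pulled out with a small script. The following Python sketch assumes the layout seen in the sample output, where the event name is immediately followed by `[LEVEL]`; that layout is inferred only from these two lines, and `LEVEL_RE` and `parse_level` are hypothetical names, not part of rasdaemon.

```python
import re
from typing import Optional, Tuple

# Assumed layout: "<event_name> [LEVEL] <timestamp> ...".
# Restricting the tag to known level names avoids matching the CPU field, e.g. "[026]".
LEVEL_RE = re.compile(r"\b(?P<event>\w+)\s+\[(?P<level>EMERG|ALERT|CRIT|ERROR)\]")


def parse_level(line: str) -> Optional[Tuple[str, str]]:
    """Return (event_name, level) for a log line, or None if no level tag is found."""
    match = LEVEL_RE.search(line)
    if match is None:
        return None
    return match.group("event"), match.group("level")
```

A filter for uncorrected errors could then keep only the lines where `parse_level` returns a level of `CRIT` or `ALERT`.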

winterddd, Mar 28 '25 05:03

LGTM, now. Thanks.

Reviewed-by: Shuai Xue [email protected]

axiqia, Apr 28 '25 08:04

merged, thanks!

mchehab, Nov 14 '25 13:11