pcm

Consider adding pcm-memory option to display ECC Correctable Errors

Open · Chester-Gillon opened this issue 3 years ago · 1 comment

I have an Intel® Xeon® Processor E5 v3 Family server-class system on which mcelog was reporting ECC Correctable Errors, so in https://github.com/Chester-Gillon/pcm I hacked a quick change that adds an option to report the ECC Correctable Errors counter. On the problematic server this showed that one memory channel was having continuous ECC Correctable Errors, with an error rate that varied according to the workload.

It might be worth submitting this as a pull request so it becomes a permanent feature. However, I am not sure how to test it on all the server-class processors that support ECC Correctable Errors in ServerUncoreMemoryMetrics.

Chester-Gillon, Jun 26 '21 20:06

Thanks for sharing your change. I am not sure this is the right path, as I don't see this event in the documentation for more recent processors. Maybe using pcm-raw is a better way?

pcm-raw -e imc/config=0x09,name=ECC_CORRECTABLE_ERRORS/

pcm-raw documentation: https://github.com/opcm/pcm/blob/master/PCM_RAW_README.md

opcm, Jun 27 '21 17:06