/mce/tests/page test failure
Hi, I tested mcelog but got this error: page-soft-then-hard.conf: triggers did not trigger as expected: 4 != 6. My mcelog version is: mcelog mcelog-144-10.94d853b2ea81.el7. I see that it injects 6 errors, and page-soft-then-hard.conf has memory-ce-threshold = 1 / 1h, so I think mcelog should run the trigger 6 times. Why does page-soft-then-hard.conf have trigger: 4?
@andikleen Hoping for your reply. Thanks very much!
On Mon, May 17, 2021 at 07:19:27PM -0700, lumanyu180 wrote:
Do you mean tests/page/page-soft-then-hard.conf?
The trigger comment is commented out with '#'; the only thing that matters is the number in memory-ce-threshold, which is 1.
It injects more errors to make sure there is only one trigger in the hour.
-Andi
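As a rough illustration of what a "count / interval" threshold such as 1 / 1h expresses, here is a minimal Python sketch. It is a simplified sliding-window counter, not mcelog's actual implementation: the class name WindowThreshold, the method account, and the reset-after-firing behaviour are all assumptions made for the example.

```python
from collections import deque


class WindowThreshold:
    """Hypothetical 'N / interval' threshold (a sketch, not mcelog code).

    Fires when N events fall inside one interval, then starts counting
    again from zero (an assumed reset policy).
    """

    def __init__(self, count: int, interval_s: int) -> None:
        self.count = count            # N in "N / interval"
        self.interval_s = interval_s  # window length in seconds (1h = 3600)
        self.events = deque()         # timestamps still inside the window

    def account(self, t: int) -> bool:
        """Record one event at time t (seconds); return True if it fires."""
        # Drop events that have aged out of the window.
        while self.events and t - self.events[0] >= self.interval_s:
            self.events.popleft()
        self.events.append(t)
        if len(self.events) >= self.count:
            self.events.clear()  # assumed: reset after firing
            return True
        return False


# A threshold of 1 per hour: every single event reaches the threshold.
one_per_hour = WindowThreshold(1, 3600)
print(sum(one_per_hour.account(t) for t in [0, 0, 0, 5, 9]))  # 5

# A threshold of 3 per hour: three events inside the hour per firing.
three_per_hour = WindowThreshold(3, 3600)
print([three_per_hour.account(t) for t in [0, 10, 20, 30, 4000]])
# [False, False, True, False, False]
```

In this simplified model a threshold of 1 fires on every event, so it does not by itself reproduce the "only one trigger in the hour" behaviour described above; mcelog's real accounting may keep additional state that the sketch leaves out.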
Hi,
Sorry, I don't understand what you mean. I just want to say that the test suite failed, and I don't know why. Other test suites, for example cache, succeed: when I run the cache test suite, the result is "cache.conf: triggers trigger as expected !". But when I run the page test suite, the result is "page-soft-then-hard.conf: triggers did not trigger as expected: 4 != 6". Why?
Hoping for your reply. Thanks very much! @andikleen
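The two result messages suggest a check that compares an expected trigger count with a count taken from the log. A minimal Python sketch of such a check, assuming the expected number comes from a "# trigger: N" comment in the .conf file and the actual number is the count of "Running trigger" messages in the log, could look like this. The function names are hypothetical, and this is not the real mcelog test script, which may parse differently and print the two numbers in another order; the sketch labels them explicitly instead.

```python
import re
from typing import Optional


def expected_triggers(conf_text: str) -> Optional[int]:
    """Read N from a '# trigger: N' comment line (assumed format)."""
    m = re.search(r"^#\s*trigger:\s*(\d+)", conf_text, re.MULTILINE)
    return int(m.group(1)) if m else None


def counted_triggers(log_text: str) -> int:
    """Count how many times the log reports running the trigger."""
    return log_text.count("Running trigger")


def check(conf_text: str, log_text: str, name: str) -> str:
    """Compare expected and counted triggers and build a result message."""
    expected = expected_triggers(conf_text)
    counted = counted_triggers(log_text)
    if expected == counted:
        return f"{name}: triggers trigger as expected !"
    return (f"{name}: triggers did not trigger as expected: "
            f"expected {expected}, counted {counted}")


print(check("# trigger: 1\n", "Running trigger x\n", "example.conf"))
# example.conf: triggers trigger as expected !
```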
I don't know. It could be either the kernel or mcelog. Do you use a standard kernel? What does the system log say?
Hi,
I use a standard 4.9.29 kernel. This is the system log:
[root@H3C page]# cat page-soft-then-hard.log
mcelog: failed to prefill DIMM database from DMI data
Kernel does not support page offline interface
Running trigger ../trigger'
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 2
MISC 0 ADDR 1e9f000
TIME 946951762 Tue Jan 4 02:09:22 2000
MCG status:
MCi status:
Corrected error
MCi_MISC register valid
MCi_ADDR register valid
MCA: MEMORY CONTROLLER AC_CHANNEL0_ERR
Transaction: Address/Command error
STATUS 8c000000000000b0 MCGSTATUS 0
MCGCAP 1000c16 APICID 0 SOCKETID 0
PPIN 8c000140000000b1
CPUID Vendor Intel Family 6 Model 86
Running trigger ../trigger'
Hardware event. This is not a software error.
MCE 1
CPU 0 BANK 2
MISC 50000 ADDR 1e9f000
TIME 946951762 Tue Jan 4 02:09:22 2000
MCG status:
MCi status:
Corrected error
MCi_MISC register valid
MCi_ADDR register valid
MCA: MEMORY CONTROLLER AC_CHANNEL1_ERR
Transaction: Address/Command error
STATUS 8c000140000000b1 MCGSTATUS 0
MCGCAP 1000c16 APICID 0 SOCKETID 0
PPIN 8c000000000000b0
CPUID Vendor Intel Family 6 Model 86
Running trigger ../trigger'
Hardware event. This is not a software error.
MCE 2
CPU 0 BANK 2
MISC 0 ADDR 1e9f000
TIME 946951762 Tue Jan 4 02:09:22 2000
MCG status:
MCi status:
Corrected error
MCi_MISC register valid
MCi_ADDR register valid
MCA: MEMORY CONTROLLER AC_CHANNEL0_ERR
Transaction: Address/Command error
STATUS 8c000000000000b0 MCGSTATUS 0
MCGCAP 1000c16 APICID 0 SOCKETID 0
PPIN 8c000000000000b0
CPUID Vendor Intel Family 6 Model 86
Running trigger ../trigger'
Hardware event. This is not a software error.
MCE 3
CPU 0 BANK 2
MISC 0 ADDR 760f000
TIME 946951762 Tue Jan 4 02:09:22 2000
MCG status:
MCi status:
Corrected error
MCi_MISC register valid
MCi_ADDR register valid
MCA: MEMORY CONTROLLER AC_CHANNEL0_ERR
Transaction: Address/Command error
STATUS 8c000000000000b0 MCGSTATUS 0
MCGCAP 1000c16 APICID 0 SOCKETID 0
PPIN 8c000000000000b0
CPUID Vendor Intel Family 6 Model 86
Running trigger ../trigger'
mcelog: Too many trigger children running already
Hardware event. This is not a software error.
MCE 4
CPU 0 BANK 2
MISC 0 ADDR 7d7d000
TIME 946951762 Tue Jan 4 02:09:22 2000
MCG status:
MCi status:
Corrected error
MCi_MISC register valid
MCi_ADDR register valid
MCA: MEMORY CONTROLLER AC_CHANNEL0_ERR
Transaction: Address/Command error
STATUS 8c000000000000b0 MCGSTATUS 0
MCGCAP 1000c16 APICID 0 SOCKETID 0
PPIN 8c000000000000b0
CPUID Vendor Intel Family 6 Model 86
Running trigger ../trigger'
mcelog: Too many trigger children running already
Hardware event. This is not a software error.
MCE 5
CPU 0 BANK 2
MISC 0 ADDR 5381000
TIME 946951762 Tue Jan 4 02:09:22 2000
MCG status:
MCi status:
Corrected error
MCi_MISC register valid
MCi_ADDR register valid
MCA: MEMORY CONTROLLER AC_CHANNEL0_ERR
Transaction: Address/Command error
STATUS 8c000000000000b0 MCGSTATUS 0
MCGCAP 1000c16 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 86
[root@H3C page]#
Can you tell me why page-soft-then-hard.conf has "# trigger: 4"? I think it injects 6 errors, so why is the trigger count 4? The page-soft-then-hard.log shows 6 errors. Hoping for your reply. Thanks very much! @andikleen
@lumanyu180 From your log ("cat page-soft-then-hard.log"), I can only see that 3 errors were injected and that the script was triggered 3 times. The number of injected errors matches the number of triggers, so I don't see the mismatch. Did you paste the full log? -Qiuxu
Hi, thanks for your reply. I pasted the full log. I count 6 occurrences of "Running trigger ../trigger" and 6 records, MCE 0 through MCE 5. So I think it injects 6 errors and the trigger also runs 6 times.
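To double-check the counts, a small Python sketch can tally "Running trigger" occurrences, MCE record numbers, and distinct ADDR values in the log text. Counted by hand, the full log above has six "Running trigger" occurrences, six records (MCE 0 to MCE 5), and four distinct addresses, because 1e9f000 appears in MCE 0, 1 and 2. The function name and regular expressions are illustrative assumptions, not part of mcelog or its test suite.

```python
import re


def tally_log(log_text: str) -> dict:
    """Rough text scan of an mcelog-style log (not an mcelog API)."""
    # "MCE <n>" record headers; \b keeps this from matching inside other words.
    records = re.findall(r"\bMCE (\d+)\b", log_text)
    # "ADDR <hex>" values; \b skips "MCi_ADDR register valid", since "_"
    # is a word character and leaves no word boundary before "ADDR".
    addrs = re.findall(r"\bADDR ([0-9a-fA-F]+)\b", log_text)
    return {
        # Occurrences, not lines, so records flattened onto one line still count.
        "running_trigger": log_text.count("Running trigger"),
        "mce_records": len(set(records)),
        "distinct_addrs": len(set(addrs)),
    }


# Abbreviated excerpt of the log above (only the fields of interest).
SAMPLE = """\
Running trigger ../trigger'
MCE 0
MISC 0 ADDR 1e9f000
Running trigger ../trigger'
MCE 1
MISC 50000 ADDR 1e9f000
Running trigger ../trigger'
MCE 3
MISC 0 ADDR 760f000
"""

print(tally_log(SAMPLE))
# {'running_trigger': 3, 'mce_records': 3, 'distinct_addrs': 2}
```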