
On Witherspoon system, seeing failures when setting special wakeup on some cores

[Open] pridhiviraj opened this issue 7 years ago • 10 comments

This is with the latest upstream op-build, with stop11 enabled.

sensors
ibmpowernv-isa-0000
Adapter: ISA adapter
Chip 0 Vdd Remote Sense: +10.40 V  (lowest =  +7.68 V, highest = +10.41 V)
Chip 0 Vdn Remote Sense:  +9.02 V  (lowest =  +9.02 V, highest =  +9.02 V)
Chip 8 Vdd Remote Sense: +10.14 V  (lowest =  +6.56 V, highest = +10.15 V)
Chip 8 Vdn Remote Sense:  +9.02 V  (lowest =  +9.02 V, highest =  +9.02 V)
Chip 0 Vdd:              +10.43 V  (lowest =  +7.98 V, highest = +10.43 V)
Chip 0 Vdn:               +9.03 V  (lowest =  +9.03 V, highest =  +9.03 V)
Chip 8 Vdd:              +10.16 V  (lowest =  +6.59 V, highest = +10.16 V)
Chip 8 Vdn:               +9.03 V  (lowest =  +9.03 V, highest =  +9.03 V)
Core 0:                    +0.0°C  
Core 4:                    +0.0°C  
Core 8:                    +0.0°C  
Core 12:                   +0.0°C  
Core 16:                  +41.0°C  
Core 20:                   +0.0°C  
Core 24:                   +0.0°C  
Core 28:                   +0.0°C  
Core 32:                   +0.0°C  
Core 36:                   +0.0°C  
[  509.391455887,3] Could not set special wakeup on 0:16: timeout waiting for SPECIAL_WKUP_DONE.
[  509.391546781,3] Failed to set special wakeup on 64 (-6)
[   63.164807] opal: opal_error_code: unexpected OPAL error -14
ERROR: Can't get value of subfeature temp11_input: I/O error
Core 40:                   +0.0°C  
[  509.443317815,3] Could not set special wakeup on 0:17: timeout waiting for SPECIAL_WKUP_DONE.
[  509.446118428,3] Failed to set special wakeup on 68 (-6)
[   63.271434] opal: opal_error_code: unexpected OPAL error -14
ERROR: Can't get value of subfeature temp12_input: I/O error
Core 44:                      N/A  
[  509.497917963,3] Could not set special wakeup on 0:18: timeout waiting for SPECIAL_WKUP_DONE.
[  509.499181905,3] Failed to set special wakeup on 72 (-6)
[   63.382006] opal: opal_error_code: unexpected OPAL error -14
ERROR: Can't get value of subfeature temp13_input: I/O error
Core 48:                      N/A  
[  510.042539230,3] Could not set special wakeup on 0:19: timeout waiting for SPECIAL_WKUP_DONE.
[  510.044259811,3] Failed to set special wakeup on 76 (-6)
[   63.488888] opal: opal_error_code: unexpected OPAL error -14
ERROR: Can't get value of subfeature temp14_input: I/O error
Core 52:                      N/A  
[  510.097536370,3] Could not set special wakeup on 0:20: timeout waiting for SPECIAL_WKUP_DONE.
[  510.098196905,3] Failed to set special wakeup on 80 (-6)
[   63.596496] opal: opal_error_code: unexpected OPAL error -14
ERROR: Can't get value of subfeature temp15_input: I/O error
Core 56:                   +0.0°C  
[  510.152311731,3] Could not set special wakeup on 0:21: timeout waiting for SPECIAL_WKUP_DONE.
[  510.152397002,3] Failed to set special wakeup on 84 (-6)
[   63.697695] opal: opal_error_code: unexpected OPAL error -14
ERROR: Can't get value of subfeature temp16_input: I/O error
Core 60:                      N/A  
[  510.204162752,3] Could not set special wakeup on 0:22: timeout waiting for SPECIAL_WKUP_DONE.
[  510.204250849,3] Failed to set special wakeup on 88 (-6)
[   63.798991] opal: opal_error_code: unexpected OPAL error -14
ERROR: Can't get value of subfeature temp17_input: I/O error
Core 64:                      N/A  
[  510.256048019,3] Could not set special wakeup on 0:23: timeout waiting for SPECIAL_WKUP_DONE.
[  510.256152845,3] Failed to set special wakeup on 92 (-6)
[   63.900347] opal: opal_error_code: unexpected OPAL error -14
ERROR: Can't get value of subfeature temp18_input: I/O error
Core 68:                      N/A  

[  510.311098231,3] Could not set special wakeup on 8:0: timeout waiting for SPECIAL_WKUP_DONE.
[  510.311214156,3] Failed to set special wakeup on 2048 (-6)
[   64.007924] opal: opal_error_code: unexpected OPAL error -14
ERROR: Can't get value of subfeature temp19_input: I/O error
Core 72:                   +0.0°C  
[  510.363058719,3] Could not set special wakeup on 8:1: timeout waiting for SPECIAL_WKUP_DONE.
[  510.363165908,3] Failed to set special wakeup on 2052 (-6)
[   64.109363] opal: opal_error_code: unexpected OPAL error -14
ERROR: Can't get value of subfeature temp20_input: I/O error
Core 76:                      N/A  
[  510.415001148,3] Could not set special wakeup on 8:2: timeout waiting for SPECIAL_WKUP_DONE.
[  510.415902035,3] Failed to set special wakeup on 2056 (-6)
[   64.213710] opal: opal_error_code: unexpected OPAL error -14
ERROR: Can't get value of subfeature temp21_input: I/O error
Core 80:                      N/A  
[  510.468410188,3] Could not set special wakeup on 8:3: timeout waiting for SPECIAL_WKUP_DONE.
[  510.468503404,3] Failed to set special wakeup on 2060 (-6)
[   64.315085] opal: opal_error_code: unexpected OPAL error -14
ERROR: Can't get value of subfeature temp22_input: I/O error
Core 84:                      N/A  
[  511.008321447,3] Could not set special wakeup on 8:4: timeout waiting for SPECIAL_WKUP_DONE.
[  511.009559247,3] Failed to set special wakeup on 2064 (-6)
[   64.420073] opal: opal_error_code: unexpected OPAL error -14
ERROR: Can't get value of subfeature temp23_input: I/O error
Core 88:                      N/A  
[  511.062038415,3] Could not set special wakeup on 8:5: timeout waiting for SPECIAL_WKUP_DONE.
[  511.062137428,3] Failed to set special wakeup on 2068 (-6)
[   64.521411] opal: opal_error_code: unexpected OPAL error -14
ERROR: Can't get value of subfeature temp24_input: I/O error
Core 92:                      N/A  
[  511.113963228,3] Could not set special wakeup on 8:8: timeout waiting for SPECIAL_WKUP_DONE.
[  511.114645263,3] Failed to set special wakeup on 2080 (-6)
[   64.625324] opal: opal_error_code: unexpected OPAL error -14
ERROR: Can't get value of subfeature temp25_input: I/O error
Core 96:                      N/A  
[  511.167170362,3] Could not set special wakeup on 8:9: timeout waiting for SPECIAL_WKUP_DONE.
[  511.167263396,3] Failed to set special wakeup on 2084 (-6)
[   64.726724] opal: opal_error_code: unexpected OPAL error -14
ERROR: Can't get value of subfeature temp26_input: I/O error
Core 100:                     N/A  
[  511.218941943,3] Could not set special wakeup on 8:12: timeout waiting for SPECIAL_WKUP_DONE.
[  511.219033195,3] Failed to set special wakeup on 2096 (-6)
[   64.827851] opal: opal_error_code: unexpected OPAL error -14
ERROR: Can't get value of subfeature temp27_input: I/O error
Core 104:                     N/A  
[  511.270864234,3] Could not set special wakeup on 8:13: timeout waiting for SPECIAL_WKUP_DONE.
[  511.270944090,3] Failed to set special wakeup on 2100 (-6)
[   64.931555] opal: opal_error_code: unexpected OPAL error -14
ERROR: Can't get value of subfeature temp28_input: I/O error
Core 108:                     N/A  
[  511.323922142,3] Could not set special wakeup on 8:14: timeout waiting for SPECIAL_WKUP_DONE.
[  511.325586573,3] Failed to set special wakeup on 2104 (-6)
[   65.038277] opal: opal_error_code: unexpected OPAL error -14
ERROR: Can't get value of subfeature temp29_input: I/O error
Core 112:                     N/A  
[  511.378599389,3] Could not set special wakeup on 8:15: timeout waiting for SPECIAL_WKUP_DONE.
[  511.380030909,3] Failed to set special wakeup on 2108 (-6)
[   65.144619] opal: opal_error_code: unexpected OPAL error -14
ERROR: Can't get value of subfeature temp30_input: I/O error
Core 116:                     N/A  
[  511.433029256,3] Could not set special wakeup on 8:16: timeout waiting for SPECIAL_WKUP_DONE.
[  511.435461948,3] Failed to set special wakeup on 2112 (-6)
[   65.252906] opal: opal_error_code: unexpected OPAL error -14
ERROR: Can't get value of subfeature temp31_input: I/O error
Core 120:                     N/A  
[  511.488460724,3] Could not set special wakeup on 8:17: timeout waiting for SPECIAL_WKUP_DONE.
[  511.488562870,3] Failed to set special wakeup on 2116 (-6)
[   65.354289] opal: opal_error_code: unexpected OPAL error -14
ERROR: Can't get value of subfeature temp32_input: I/O error
Core 124:                     N/A  
[  512.028252436,3] Could not set special wakeup on 8:18: timeout waiting for SPECIAL_WKUP_DONE.
[  512.028344705,3] Failed to set special wakeup on 2120 (-6)
[   65.455418] opal: opal_error_code: unexpected OPAL error -14
ERROR: Can't get value of subfeature temp33_input: I/O error
Core 128:                     N/A  
[  512.080212413,3] Could not set special wakeup on 8:19: timeout waiting for SPECIAL_WKUP_DONE.
[  512.080317398,3] Failed to set special wakeup on 2124 (-6)
[   65.556920] opal: opal_error_code: unexpected OPAL error -14
ERROR: Can't get value of subfeature temp34_input: I/O error
Core 132:                     N/A  
Core 136:                  +0.0°C  
[  512.132480766,3] Could not set special wakeup on 8:23: timeout waiting for SPECIAL_WKUP_DONE.
[  512.132580127,3] Failed to set special wakeup on 2140 (-6)
[   65.659009] opal: opal_error_code: unexpected OPAL error -14
ERROR: Can't get value of subfeature temp36_input: I/O error
Core 140:                     N/A  
Chip 0 Core 0:            +39.0°C  (lowest = +32.0°C, highest = +54.0°C)
Chip 0 Core 4:            +39.0°C  (lowest = +33.0°C, highest = +46.0°C)
Chip 0 Core 8:            +39.0°C  (lowest = +33.0°C, highest = +48.0°C)
Chip 0 Core 12:           +39.0°C  (lowest = +33.0°C, highest = +48.0°C)
Chip 0 Core 16:           +37.0°C  (lowest = +33.0°C, highest = +48.0°C)
Chip 0 Core 20:           +37.0°C  (lowest = +33.0°C, highest = +46.0°C)
Chip 0 Core 24:           +39.0°C  (lowest = +34.0°C, highest = +49.0°C)
Chip 0 Core 28:           +39.0°C  (lowest = +34.0°C, highest = +50.0°C)
Chip 0 Core 32:           +39.0°C  (lowest = +34.0°C, highest = +48.0°C)
Chip 0 Core 36:           +39.0°C  (lowest = +34.0°C, highest = +49.0°C)
Chip 0 Core 40:           +39.0°C  (lowest = +33.0°C, highest = +45.0°C)
Chip 0 Core 44:           +39.0°C  (lowest = +31.0°C, highest = +44.0°C)
Chip 0 Core 48:           +39.0°C  (lowest = +34.0°C, highest = +47.0°C)
Chip 0 Core 52:           +39.0°C  (lowest = +34.0°C, highest = +47.0°C)
Chip 0 Core 56:           +39.0°C  (lowest = +34.0°C, highest = +49.0°C)
Chip 0 Core 60:           +39.0°C  (lowest = +34.0°C, highest = +51.0°C)
Chip 0 Core 64:           +39.0°C  (lowest = +34.0°C, highest = +48.0°C)
Chip 0 Core 68:           +39.0°C  (lowest = +34.0°C, highest = +50.0°C)
Chip 8 Core 72:           +38.0°C  (lowest = +32.0°C, highest = +48.0°C)
Chip 8 Core 76:           +38.0°C  (lowest = +32.0°C, highest = +49.0°C)
Chip 8 Core 80:           +38.0°C  (lowest = +32.0°C, highest = +49.0°C)
Chip 8 Core 84:           +38.0°C  (lowest = +32.0°C, highest = +48.0°C)
Chip 8 Core 88:           +38.0°C  (lowest = +32.0°C, highest = +53.0°C)
Chip 8 Core 92:           +38.0°C  (lowest = +32.0°C, highest = +50.0°C)
Chip 8 Core 96:           +38.0°C  (lowest = +32.0°C, highest = +48.0°C)
Chip 8 Core 100:          +38.0°C  (lowest = +32.0°C, highest = +48.0°C)
Chip 8 Core 104:          +38.0°C  (lowest = +32.0°C, highest = +48.0°C)
Chip 8 Core 108:          +38.0°C  (lowest = +32.0°C, highest = +49.0°C)
Chip 8 Core 112:          +38.0°C  (lowest = +32.0°C, highest = +49.0°C)
Chip 8 Core 116:          +38.0°C  (lowest = +32.0°C, highest = +48.0°C)
Chip 8 Core 120:          +38.0°C  (lowest = +32.0°C, highest = +48.0°C)
Chip 8 Core 124:          +38.0°C  (lowest = +32.0°C, highest = +49.0°C)
Chip 8 Core 128:          +38.0°C  (lowest = +32.0°C, highest = +49.0°C)
Chip 8 Core 132:          +38.0°C  (lowest = +32.0°C, highest = +48.0°C)
Chip 8 Core 136:          +38.0°C  (lowest = +32.0°C, highest = +53.0°C)
Chip 8 Core 140:          +38.0°C  (lowest = +32.0°C, highest = +52.0°C)
Chip 0 GPU 0 :             +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 0 GPU 1 :             +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 0 GPU 2 :             +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 0 GPU 0 MEM:          +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 0 GPU 1 MEM:          +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 0 GPU 2 MEM:          +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 8 GPU 0 :             +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 8 GPU 1 :             +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 8 GPU 2 :             +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 8 GPU 0 MEM:          +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 8 GPU 1 MEM:          +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 8 GPU 2 MEM:          +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 0 DIMM 0 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 0 DIMM 1 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 0 DIMM 2 :           +34.0°C  (lowest = +29.0°C, highest = +34.0°C)
Chip 0 DIMM 3 :           +34.0°C  (lowest = +30.0°C, highest = +34.0°C)
Chip 0 DIMM 4 :           +35.0°C  (lowest = +30.0°C, highest = +35.0°C)
Chip 0 DIMM 5 :           +34.0°C  (lowest = +30.0°C, highest = +34.0°C)
Chip 0 DIMM 6 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 0 DIMM 7 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 0 DIMM 8 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 0 DIMM 9 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 0 DIMM 10 :          +34.0°C  (lowest = +29.0°C, highest = +34.0°C)
Chip 0 DIMM 11 :          +33.0°C  (lowest = +29.0°C, highest = +33.0°C)
Chip 0 DIMM 12 :          +33.0°C  (lowest = +29.0°C, highest = +33.0°C)
Chip 0 DIMM 13 :          +33.0°C  (lowest = +29.0°C, highest = +33.0°C)
Chip 0 DIMM 14 :           +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 0 DIMM 15 :           +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 8 DIMM 0 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 8 DIMM 1 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 8 DIMM 2 :           +33.0°C  (lowest = +30.0°C, highest = +33.0°C)
Chip 8 DIMM 3 :           +35.0°C  (lowest = +31.0°C, highest = +35.0°C)
Chip 8 DIMM 4 :           +35.0°C  (lowest = +31.0°C, highest = +35.0°C)
Chip 8 DIMM 5 :           +34.0°C  (lowest = +30.0°C, highest = +34.0°C)
Chip 8 DIMM 6 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 8 DIMM 7 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 8 DIMM 8 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 8 DIMM 9 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 8 DIMM 10 :          +35.0°C  (lowest = +32.0°C, highest = +35.0°C)
Chip 8 DIMM 11 :          +35.0°C  (lowest = +32.0°C, highest = +35.0°C)
Chip 8 DIMM 12 :          +33.0°C  (lowest = +30.0°C, highest = +33.0°C)
Chip 8 DIMM 13 :          +36.0°C  (lowest = +33.0°C, highest = +36.0°C)
Chip 8 DIMM 14 :           +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 8 DIMM 15 :           +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 0 Nest:              +39.0°C  (lowest = +34.0°C, highest = +46.0°C)
Chip 8 Nest:              +37.0°C  (lowest = +32.0°C, highest = +44.0°C)
Chip 0 TEMPVDD:           +34.0°C  (lowest = +32.0°C, highest = +37.0°C)
Chip 8 TEMPVDD:           +32.0°C  (lowest = +30.0°C, highest = +35.0°C)
Chip 0 GPU:                4.00 W  (lowest =   0.00 W, highest =   5.00 W)
Chip 8 GPU:                3.00 W  (lowest =   0.00 W, highest =   4.00 W)
Chip 0 Memory:            31.00 W  (lowest =   0.00 W, highest =  52.00 W)
Chip 8 Memory:            30.00 W  (lowest =   0.00 W, highest =  41.00 W)
Chip 0 :                  41.00 W  (lowest =   0.00 W, highest = 249.00 W)
Chip 0 Vdd:               10.00 W  (lowest =   8.00 W, highest = 177.00 W)
Chip 0 Vdn:               18.00 W  (lowest =  17.00 W, highest =  21.00 W)
Chip 8 :                  35.00 W  (lowest =   0.00 W, highest = 210.00 W)
Chip 8 Vdd:                7.00 W  (lowest =   3.00 W, highest = 151.00 W)
Chip 8 Vdn:               19.00 W  (lowest =  18.00 W, highest =  20.00 W)
System:                  211.00 W  (lowest =   0.00 W, highest = 738.00 W)
APSS 0 :                 211.00 W  (lowest =   0.00 W, highest = 738.00 W)
APSS 1 :                   0.00 W  (lowest =   0.00 W, highest =   0.00 W)
APSS 2 :                   3.00 W  (lowest =   0.00 W, highest = 203.00 W)
APSS 3 :                 1000.00 mW (lowest =   0.00 W, highest = 171.00 W)
APSS 4 :                  39.00 W  (lowest =   0.00 W, highest =  47.00 W)
APSS 5 :                  35.00 W  (lowest =   0.00 W, highest =  42.00 W)
APSS 6 :                  31.00 W  (lowest =   0.00 W, highest =  52.00 W)
APSS 7 :                  30.00 W  (lowest =   0.00 W, highest =  41.00 W)
APSS 8 :                   2.00 W  (lowest =   0.00 W, highest =   2.00 W)
APSS 9 :                 1000.00 mW (lowest =   0.00 W, highest = 1000.00 mW)
APSS 10 :                1000.00 mW (lowest =   0.00 W, highest =   2.00 W)
APSS 11 :                1000.00 mW (lowest =   0.00 W, highest = 1000.00 mW)
APSS 12 :                1000.00 mW (lowest =   0.00 W, highest = 1000.00 mW)
APSS 13 :                1000.00 mW (lowest =   0.00 W, highest = 1000.00 mW)
APSS 14 :                 27.00 W  (lowest =   0.00 W, highest = 245.00 W)
APSS 15 :                 34.00 W  (lowest =   0.00 W, highest =  40.00 W)
Chip 0 Vdd:               +1.29 A  (lowest =  +0.79 A, highest = +17.77 A)
Chip 0 Vdn:               +2.02 A  (lowest =  +1.90 A, highest =  +2.35 A)
Chip 8 Vdd:               +0.79 A  (lowest =  +0.49 A, highest = +15.56 A)
Chip 8 Vdn:               +2.09 A  (lowest =  +2.02 A, highest =  +2.25 A)
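The skiboot console lines above come in pairs, and the number in "Failed to set special wakeup on N" tracks the chip:core pair in the preceding "Could not set special wakeup" line: 0:16 goes with 64, 8:0 with 2048, 8:23 with 2140. That fits N = (chip << 8) | (core << 2). The sketch below checks that relationship across a log, assuming it is the POWER9 PIR layout (chip in bits 8 and up, core in bits 2-7, thread in bits 0-1). It also names the return codes seen here, -6 as OPAL_HARDWARE and -14 as OPAL_WRONG_STATE; those names are our reading of skiboot's opal-api.h and should be double-checked. `decode_pir` and `check_log` are hypothetical helpers, not part of skiboot or the kernel.

```python
import re

# Line formats copied from the skiboot console output above, e.g.
#   Could not set special wakeup on 0:16: timeout waiting for SPECIAL_WKUP_DONE.
#   Failed to set special wakeup on 64 (-6)
COULD_NOT_RE = re.compile(r"Could not set special wakeup on (\d+):(\d+):")
FAILED_RE = re.compile(r"Failed to set special wakeup on (\d+) \((-?\d+)\)")

# Assumed names for the two return codes seen in the log (check opal-api.h).
OPAL_RC_NAMES = {-6: "OPAL_HARDWARE", -14: "OPAL_WRONG_STATE"}


def decode_pir(pir: int) -> tuple[int, int, int]:
    """Split a PIR into (chip, core, thread), assuming chip<<8 | core<<2 | thread."""
    return (pir >> 8) & 0x7F, (pir >> 2) & 0x3F, pir & 0x3


def check_log(text: str) -> list[tuple[int, int, int, str]]:
    """Pair each 'Could not ... chip:core' line with the 'Failed ... N' line
    that follows it, and check that N decodes to the same chip:core."""
    results = []
    pending = None  # (chip, core) from the most recent "Could not" line
    for line in text.splitlines():
        m = COULD_NOT_RE.search(line)
        if m:
            pending = (int(m.group(1)), int(m.group(2)))
            continue
        m = FAILED_RE.search(line)
        if m and pending is not None:
            pir, rc = int(m.group(1)), int(m.group(2))
            chip, core, _thread = decode_pir(pir)
            if (chip, core) != pending:
                raise ValueError(f"{pending} does not match PIR {pir}")
            results.append((chip, core, pir, OPAL_RC_NAMES.get(rc, str(rc))))
            pending = None
    return results
```

Every chip:core pair in the excerpt above, from 0:16/64 through 8:23/2140, decodes consistently under this layout. The interleaved kernel lines report -14 as an "unexpected OPAL error", which is consistent with sensors then printing an I/O error for the matching tempN_input.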

pridhiviraj • Feb 13 '18 05:02

@stewart-ibm This is the failure you mentioned in one of your earlier test runs. I raised it here to track its status.

pridhiviraj • Feb 13 '18 05:02

@Over-enthusiastic Can you give an update on the status of this failure?

pridhiviraj • Feb 13 '18 05:02

@pridhiviraj Thanks for raising the bug. Affected systems: P9 DD2.x with stop11 enabled. The problem has been root-caused, and an OPAL/kernel patch is under internal discussion and review.

Over-enthusiastic • Feb 13 '18 08:02

A patch to temporarily disable stop11 while we figure it out: https://github.com/open-power/op-build/pull/1874

ghost • Feb 15 '18 09:02

A fix for the Linux kernel is posted here: https://patchwork.ozlabs.org/patch/875157/. Since the fix will be delivered through Linux, we can probably close this issue.

Over-enthusiastic • Feb 19 '18 15:02

@Over-enthusiastic I tested with your patch on a Witherspoon system running the latest PNOR, which has stop11 disabled. We are still able to reproduce the issue.

/ # cat /sys//firmware/opal/msglog | grep -i stop*
[   65.459926521,7] XSTOP: XSCOM addr = 0x5012000, FIR bit = 31
[   70.106626618,5] SLW: Configuring self-restore for HRMOR
[   70.106715228,5] SLW: Configuring self-restore for HRMOR
[   70.106757656,5] SLW: Configuring self-restore for HRMOR
[   70.106799008,5] SLW: Configuring self-restore for NCU_SPEC_BAR
[   70.106859962,5] SLW: Configuring self-restore for P9X_EX_NCU_DARN_BAR
[   70.106912611,5] SLW: Enabling: stop0_lite
[   70.106945997,5] SLW: Enabling: stop0
[   70.106964009,5] SLW: Enabling: stop1_lite
[   70.106986801,5] SLW: Enabling: stop1
[   70.107006769,5] SLW: Enabling: stop2_lite
[   70.107027317,5] SLW: Enabling: stop2
[   70.107047406,5] SLW: Enabling: stop4
[   70.107066731,5] SLW: Enabling: stop5



watchdog: CPU 59 detected hard LOCKUP on other CPUs 0,32-33,64
[  203.134804127,3] Could not set special wakeup on 0:0: timeout waiting for SPECIAL_WKUP_DONE.
[  204.212137495,3] Could not set special wakeup on 0:8: timeout waiting for SPECIAL_WKUP_DONE.
[  205.296858020,3] Could not set special wakeup on 0:8: timeout waiting for SPECIAL_WKUP_DONE.
[  206.381755750,3] Could not set special wakeup on 0:20: timeout waiting for SPECIAL_WKUP_DONE.
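To confirm which stop states skiboot actually enabled on a given boot, as the msglog excerpt above does, one can collect the `SLW: Enabling:` lines rather than grepping loosely for "stop". A minimal sketch, assuming the line format shown in that excerpt; `read_msglog`, `enabled_stop_states`, and `check_states` are hypothetical helpers, and reading `/sys/firmware/opal/msglog` may require root.

```python
import re
from typing import Iterable

# Format assumed from the msglog excerpt: "[   70.107047406,5] SLW: Enabling: stop4"
ENABLING_RE = re.compile(r"SLW: Enabling: (\S+)")


def read_msglog(path: str = "/sys/firmware/opal/msglog") -> str:
    """Read the OPAL message log; undecodable bytes are replaced, not fatal."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return f.read()


def enabled_stop_states(msglog: str) -> list[str]:
    """Stop states that the SLW code reported enabling, in log order."""
    return [m.group(1) for m in ENABLING_RE.finditer(msglog)]


def check_states(msglog: str,
                 wanted: Iterable[str] = ("stop4", "stop5", "stop11")) -> dict[str, bool]:
    """Map each wanted state name to whether it was enabled on this boot."""
    enabled = set(enabled_stop_states(msglog))
    return {state: state in enabled for state in wanted}
```

On the target, `check_states(read_msglog())` gives the answer directly. Against the excerpt above it reports stop4 and stop5 enabled and stop11 absent, matching the stop11-disabled PNOR under test.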

The PNOR is at the level below, built from the latest op-build.

cat /var/lib//phosphor-software-manager/pnor/ro/VERSION 
open-power-witherspoon-v1.21-rc2-8-gcf83faa-dirty
	buildroot-2017.11-5-g65679be
	skiboot-v5.10-rc3
	hostboot-28927a7
	linux-4.14.20-openpower1-pf1c13fc
	petitboot-v1.6.6-p7cfd0fc
	machine-xml-58554bf-p1f16afe
	occ-f72f857
	hostboot-binaries-6924d6b
	capp-ucode-p9-dd2-v3
	sbe-0aae9a8
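When comparing reports like this one, it can help to pull individual component levels (for example skiboot v5.10-rc3 here) out of the PNOR VERSION file. A minimal sketch, assuming the layout shown above: the build string on the first line, then one indented `<component>-<version>` line per component. Because some component names contain hyphens, the sketch matches against the names listed in this particular file, longest first; `parse_pnor_version` is a hypothetical helper and the name list is taken from this file only.

```python
# Component names as they appear in the VERSION file above; longer names
# are tried first so "hostboot-binaries" is not mistaken for "hostboot".
KNOWN_COMPONENTS = sorted(
    ("buildroot", "skiboot", "hostboot", "hostboot-binaries", "linux",
     "petitboot", "machine-xml", "occ", "capp-ucode", "sbe"),
    key=len, reverse=True)


def parse_pnor_version(text: str) -> tuple[str, dict[str, str]]:
    """Return (build string, {component: version}) from a PNOR VERSION file."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    build, components = lines[0], {}
    for line in lines[1:]:
        for name in KNOWN_COMPONENTS:
            if line.startswith(name + "-"):
                components[name] = line[len(name) + 1:]
                break
        else:
            # Unknown component: fall back to splitting at the first hyphen.
            name, _, version = line.partition("-")
            components[name] = version
    return build, components
```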

pridhiviraj • Feb 20 '18 13:02

Could https://patchwork.ozlabs.org/patch/875157/ fix this particular issue?

That's a kernel patch, so we'll need to work out a way to get stop11 enabled without breaking things... but perhaps we're okay at this stage if we get things into distros?

ghost • Feb 28 '18 09:02

Currently, as of op-build v1.21-28-gcd1f3d0c7c0f, we have stop4 and stop5 disabled, but we do have the new hcode hw022318a.911. The stop4/stop5 states are disabled because of lockups around them and possibly special wakeup (I never really narrowed it down); https://github.com/open-power/op-build/pull/1916 records yesterday's mad day of trying to narrow it down.

ghost • Feb 28 '18 09:02

@stewart-ibm Commit https://github.com/open-power/op-build/commit/036e1bef0c626dce5b8a13990a9e722ecee1c4b7 adds the HCODE fix for the special wakeup issue, so we should be good to enable stop4 and stop5 again.

Over-enthusiastic • Mar 20 '18 09:03

Yes, I have tested upstream op-build (including that hcode fix commit) with stop4 and stop5 enabled on the Witherspoon platform. Test runs look stable, and no special-wakeup-related problems occur.

pridhiviraj • Mar 20 '18 14:03