kernel
kernel copied to clipboard
SB3: system total power limit at 25 watts, including display backlight
This is observed by using s-tui
to continuously monitor the power consumption, core frequency and utilization while compiling the kernel.
Running a "cold start", the system power peaks at 40 watts, and then drops to 25 watts (could be a thermal trip point, or a "long time power limit"?), where the CPU package draws 16 watts, resulting in all-core frequency at 1.6GHz. Core temperature stabilizes at 71C.
Curiously, I lowered the screen brightness to 5%, and the CPU package starts to draw more power (20+ watts) and all-core frequency goes up to 2.0GHz. Core temperature stabilizes at 78C.
So this does not solely look like a thermal throttle -- but where does that 25W limit come from?
Could be some total system limit for sustained power imposed by firmware. Did you try changing the performance mode?
I don't have "performance mode" -- there's no /sys/bus/surface_aggregator/devices/01:03:01:00:01/perf_mode
.
I did try all the profiles -- no difference it seems.
Yeah sorry, meant platform profiles (those and performance modes are the same).
Edit: see below.
~~After some more experiment with a stress tool I find it's thermal-related and not bound to a sustained loaded time. The peak power could sustain for 2 seconds, 5 seconds and so on.~~
~~And the power limit kicks in when the core goes over 90C degrees.~~ I'll see if I can work out a thermald configuration for this model.
Just feel this is funny that I should share the story with you guys :D
Have to reopen this because thermald/rapl is not effective against this problem. It's like a power budget, not thermal throttle -- the same 25 watts can well push the core to 80C (dimmed screen) but a thermal limit at 70C (like ~20 watts in total) already triggers the power limit!
btw, the fan is not as fast (loud) as in Windows. A profile configuration issue?
So, is there a way I can intercept SAM messages under windows?
btw, the fan is not as fast (loud) as in Windows. A profile configuration issue?
AFAIK the fan is entirely controlled by SAM, and the only influence we or Windows can have on that is the platform profile/performance mode. Try setting that to "best performance".
So, is there a way I can intercept SAM messages under windows?
Yes, although it requires a bit of configuration and the dumps are not easily readable (you're of course welcome to write a better parser for that). There's an overview on how to do that here: https://github.com/linux-surface/surface-aggregator-module/wiki/Development. A script for transforming the output into json files is here. Documentation on the commands we know so far is here. An overall documentation on the protocol can be found at https://www.kernel.org/doc/html/latest/driver-api/surface_aggregator/ssh.html.
Here are the two related entries when I toggle between Better Performance vs. Best Performance:
{"ctrl": {"type": 128, "len": 12, "pad": 0, "seq": 42}, "cmd": {"type": 128, "tc": 3, "outgoing": 1, "incoming": 0, "iid": 0, "rqid_lo": 83, "rqid_hi": 3, "cid": 3}, "payload": [3, 0, 0, 0]}
{"ctrl": {"type": 128, "len": 12, "pad": 0, "seq": 46}, "cmd": {"type": 128, "tc": 3, "outgoing": 1, "incoming": 0, "iid": 0, "rqid_lo": 87, "rqid_hi": 3, "cid": 3}, "payload": [4, 0, 0, 0]}
...which matches the documented values. le32 though?
Just checked the module. le32 indeed no problem.
Yeah, le32 is correct.
All multi-byte values in the protocol are little-endian.
Let me check if profile get can return the correct answer. If so, the problem may be somewhere else?
I'm fairly sure that getting/setting profiles is working correctly. There may be other commands for the EC that I've missed as I've mostly developed this on the SB2 and through reverse engineering of the Windows drivers. But it's also possible that there are other factors in play besides the EC.
Yeah, I just checked. Performance profile readback is correct. The capture log is a huge stream of category 21(0x15) plus these two 0x03, and a few 0x01. Maybe I should also check the 0x01 ones. (Rebooting)
oh and this: warning: expected SYN, skipping data until next SYN
Could be a bug in the parser or missing/corrupted data. Essentially this says that it expected the start of a message but got something else, so it's looking for the next start of a message.
With only cat 21 filtered:
{"ctrl": {"type": 128, "len": 12, "pad": 0, "seq": 42}, "cmd": {"type": 128, "tc": 3, "outgoing": 1, "incoming": 0, "iid": 0, "rqid_lo": 83, "rqid_hi": 3, "cid": 3}, "payload": [3, 0, 0, 0]}
{"ctrl": {"type": 64, "len": 0, "pad": 0, "seq": 42}}
{"ctrl": {"type": 64, "len": 0, "pad": 0, "seq": 43}}
{"ctrl": {"type": 128, "len": 8, "pad": 0, "seq": 44}, "cmd": {"type": 128, "tc": 1, "outgoing": 1, "incoming": 0, "iid": 0, "rqid_lo": 85, "rqid_hi": 3, "cid": 51}, "payload": []}
{"ctrl": {"type": 64, "len": 0, "pad": 0, "seq": 44}}
{"ctrl": {"type": 128, "len": 8, "pad": 0, "seq": 45}, "cmd": {"type": 128, "tc": 1, "outgoing": 1, "incoming": 0, "iid": 0, "rqid_lo": 86, "rqid_hi": 3, "cid": 52}, "payload": []}
{"ctrl": {"type": 64, "len": 0, "pad": 0, "seq": 45}}
{"ctrl": {"type": 128, "len": 9, "pad": 0, "seq": 226}, "cmd": {"type": 128, "tc": 1, "outgoing": 0, "incoming": 1, "iid": 0, "rqid_lo": 86, "rqid_hi": 3, "cid": 52}, "payload": [0]}
{"ctrl": {"type": 128, "len": 12, "pad": 0, "seq": 46}, "cmd": {"type": 128, "tc": 3, "outgoing": 1, "incoming": 0, "iid": 0, "rqid_lo": 87, "rqid_hi": 3, "cid": 3}, "payload": [4, 0, 0, 0]}
{"ctrl": {"type": 64, "len": 0, "pad": 0, "seq": 46}}
{"ctrl": {"type": 64, "len": 0, "pad": 0, "seq": 47}}
Message 43 and 47 are missing. Intriguing!
The bug is caused probably because IRPComp piggyback'ed another HID message
This is message 42:
ID = 103109
Time = 2/18/2022 7:02:29 PM
Type = IRP
Device object = 0xFFFF9002959EBB70
Device name = \Device\00000040
Driver object = 0xFFFF900298294DF0
Driver name = \Driver\iaLPSS2_UART2
Result value = 0x103
Result constant = STATUS_PENDING
Major function = Write
Minor function = Normal
IRP address = 0xFFFF9002959B4AA0
File object = 0xFFFF9002B1EF5DF0
IRP flags = 0x0
Length = L: 22
Key = K: 0x0
Offset = O: 0x0
Thread ID = 10608
Process ID = 4
IRQL = Passive
Previous mode = KernelMode
Requestor mode = KernelMode
IOSB.Status value = 0x0
IOSB.Status constant = STATUS_SUCCESS
IOSB.Information = 0 (0x0000000000000000)
Requestor PID = 0
Emulated = False
Associated data = True
Data stripped = False
Data size = 22
Data (Hexer)
00: aa 55 80 0c 00 2a b1 a9 80 03 01 00 00 53 03 03 ªU?..*±©?....S..
10: 03 00 00 00 1c b0 .....°
And a few lines down, there's 0x2b, message 43:
ID = 103113
Time = 2/18/2022 7:02:29 PM
Type = IRP
Device object = 0xFFFF9002959EBB70
Device name = \Device\00000040
Driver object = 0xFFFF900298294DF0
Driver name = \Driver\iaLPSS2_UART2
Result value = 0x103
Result constant = STATUS_PENDING
Major function = Write
Minor function = Normal
IRP address = 0xFFFF9002B85D5A20
File object = 0xFFFF9002B1EF5DF0
IRP flags = 0x0
Length = L: 19
Key = K: 0x0
Offset = O: 0x0
Thread ID = 10608
Process ID = 4
IRQL = Passive
Previous mode = KernelMode
Requestor mode = KernelMode
IOSB.Status value = 0x0
IOSB.Status constant = STATUS_SUCCESS
IOSB.Information = 0 (0x0000000000000000)
Requestor PID = 0
Emulated = False
Associated data = True
Data stripped = False
Data size = 19
Data (Hexer)
00: aa 55 80 09 00 2b 60 52 80 17 02 00 01 54 03 01 ªU?..+`R?....T..
10: 00 75 42 .uB
Also, message 47 pairs with 46:
ID = 103481
Time = 2/18/2022 7:02:31 PM
Type = IRP
Device object = 0xFFFF9002959EBB70
Device name = \Device\00000040
Driver object = 0xFFFF900298294DF0
Driver name = \Driver\iaLPSS2_UART2
Result value = 0x103
Result constant = STATUS_PENDING
Major function = Write
Minor function = Normal
IRP address = 0xFFFF9002B12C24A0
File object = 0xFFFF9002B93E05F0
IRP flags = 0x0
Length = L: 19
Key = K: 0x0
Offset = O: 0x0
Thread ID = 13296
Process ID = 4
IRQL = Passive
Previous mode = KernelMode
Requestor mode = KernelMode
IOSB.Status value = 0x0
IOSB.Status constant = STATUS_SUCCESS
IOSB.Information = 0 (0x0000000000000000)
Requestor PID = 0
Emulated = False
Associated data = True
Data stripped = False
Data size = 19
Data (Hexer)
00: aa 55 80 09 00 2f e4 12 80 17 02 00 01 58 03 01 ªU?../ä.?....X..
10: 00 47 0d .G.
0x17 BKL
🤔
Yeah, unfortunately the json converter script isn't very robust. I had the plan to write a better decoder at some point but never found the time for it...
Any ideas on the BKL message? It could be a random coincidence but intriguing :p
BKL could mean backlight maybe? Just a guess though.
Never seen that one before
Makes sense, could be backlight related stuff since I'm adjusting power profiles. I'm wondering if the module handles the piggybacks and incomplete frames?
Not sure what you mean by piggybacks. It's an exclusive UART channel on the driver side (IRPMon might pick up other stuff that's not sent via that channel). If we receive an incomplete or invalid frame. the driver rejects it (by sending the appropriate packet with type 0x04
) and skips to the next SYN bytes (aa 55
). The EC does the same on its side.
The communication above is from host to EC though, so it's host initiated. We probably won't see any of the TC=0x17 messages unless we either send them ourselves or we set up the appropriate events (in case that subsystem can actually send events).
If you want to play around, you can load the debug module on linux (modprobe surface_aggregator_cdev
) and use the ctrl.py
script to talk to the EC. So you could send the messages from your log.
aha. pretend we are 0x17 experts. 😎
By piggyback I mean, for example in this reply:
00: aa 55 40 00 00 2a 74 6f ff ff aa 55 00 26 00 df ªU@..*toÿÿªU.&.ß
10: 34 8a 80 15 00 02 03 15 00 00 04 01 21 05 39 02 4??.........!.9.
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
It first ack'ed 0x2a, and then started another message..
aha. pretend we are 0x17 experts.
Pretty much xD
Should be ./ctrl.py request 0x17 0x02 0x01 0x01 0x01 0x00
, or TC=0x17, TID=0x02, CID=0x01, IID=0x01, flags=sync (aka. 0x01) and payload = 0x00.