kernel icon indicating copy to clipboard operation
kernel copied to clipboard

SB3: system total power limit at 25 watts, including display backlight

Open yatli opened this issue 3 years ago • 55 comments

This is observed by using s-tui to continuously monitor the power consumption, core frequency and utilization while compiling the kernel.

Running a "cold start", the system power peaks at 40 watts, and then drops to 25 watts (could be a thermal trip point, or a "long time power limit"?), where the CPU package draws 16 watts, resulting in all-core frequency at 1.6GHz. Core temperature stabilizes at 71C.

Curiously, I lowered the screen brightness to 5%, and the CPU package starts to draw more power (20+ watts) and all-core frequency goes up to 2.0GHz. Core temperature stabilizes at 78C.

So this does not solely look like a thermal throttle -- but where does that 25W limit come from?

yatli avatar Feb 18 '22 14:02 yatli

Could be some total system limit for sustained power imposed by firmware. Did you try changing the performance mode?

qzed avatar Feb 18 '22 14:02 qzed

I don't have "performance mode" -- there's no /sys/bus/surface_aggregator/devices/01:03:01:00:01/perf_mode. I did try all the profiles -- no difference it seems.

yatli avatar Feb 18 '22 14:02 yatli

Yeah sorry, meant platform profiles (those and performance modes are the same).

qzed avatar Feb 18 '22 14:02 qzed

Edit: see below.

~~After some more experiment with a stress tool I find it's thermal-related and not bound to a sustained loaded time. The peak power could sustain for 2 seconds, 5 seconds and so on.~~

~~And the power limit kicks in when the core goes over 90C degrees.~~ I'll see if I can work out a thermald configuration for this model.

yatli avatar Feb 18 '22 14:02 yatli

Just feel this is funny that I should share the story with you guys :D

yatli avatar Feb 18 '22 15:02 yatli

Have to reopen this because thermald/rapl is not effective against this problem. It's like a power budget, not thermal throttle -- the same 25 watts can well push the core to 80C (dimmed screen) but a thermal limit at 70C (like ~20 watts in total) already triggers the power limit!

yatli avatar Feb 18 '22 16:02 yatli

btw, the fan is not as fast (loud) as in Windows. A profile configuration issue?

yatli avatar Feb 18 '22 16:02 yatli

So, is there a way I can intercept SAM messages under windows?

yatli avatar Feb 18 '22 17:02 yatli

btw, the fan is not as fast (loud) as in Windows. A profile configuration issue?

AFAIK the fan is entirely controlled by SAM, and the only influence we or Windows can have on that is the platform profile/performance mode. Try setting that to "best performance".

So, is there a way I can intercept SAM messages under windows?

Yes, although it requires a bit of configuration and the dumps are not easily readable (you're of course welcome to write a better parser for that). There's an overview on how to do that here: https://github.com/linux-surface/surface-aggregator-module/wiki/Development. A script for transforming the output into json files is here. Documentation on the commands we know so far is here. An overall documentation on the protocol can be found at https://www.kernel.org/doc/html/latest/driver-api/surface_aggregator/ssh.html.

qzed avatar Feb 18 '22 18:02 qzed

Here are the two related entries when I toggle between Better Performance vs. Best Performance:

{"ctrl": {"type": 128, "len": 12, "pad": 0, "seq": 42}, "cmd": {"type": 128, "tc": 3, "outgoing": 1, "incoming": 0, "iid": 0, "rqid_lo": 83, "rqid_hi": 3, "cid": 3}, "payload": [3, 0, 0, 0]}
{"ctrl": {"type": 128, "len": 12, "pad": 0, "seq": 46}, "cmd": {"type": 128, "tc": 3, "outgoing": 1, "incoming": 0, "iid": 0, "rqid_lo": 87, "rqid_hi": 3, "cid": 3}, "payload": [4, 0, 0, 0]}

...which matches the documented values. le32 though?

yatli avatar Feb 18 '22 19:02 yatli

Just checked the module. le32 indeed no problem.

yatli avatar Feb 18 '22 19:02 yatli

Yeah, le32 is correct.

qzed avatar Feb 18 '22 19:02 qzed

All multi-byte values in the protocol are little-endian.

qzed avatar Feb 18 '22 19:02 qzed

Let me check if profile get can return the correct answer. If so, the problem may be somewhere else?

yatli avatar Feb 18 '22 19:02 yatli

I'm fairly sure that getting/setting profiles is working correctly. There may be other commands for the EC that I've missed as I've mostly developed this on the SB2 and through reverse engineering of the Windows drivers. But it's also possible that there are other factors in play besides the EC.

qzed avatar Feb 18 '22 19:02 qzed

Yeah, I just checked. Performance profile readback is correct. The capture log is a huge stream of category 21(0x15) plus these two 0x03, and a few 0x01. Maybe I should also check the 0x01 ones. (Rebooting)

yatli avatar Feb 18 '22 19:02 yatli

oh and this: warning: expected SYN, skipping data until next SYN

yatli avatar Feb 18 '22 19:02 yatli

Could be a bug in the parser or missing/corrupted data. Essentially this says that it expected the start of a message but got something else, so it's looking for the next start of a message.

qzed avatar Feb 18 '22 19:02 qzed

With only cat 21 filtered:

{"ctrl": {"type": 128, "len": 12, "pad": 0, "seq": 42}, "cmd": {"type": 128, "tc": 3, "outgoing": 1, "incoming": 0, "iid": 0, "rqid_lo": 83, "rqid_hi": 3, "cid": 3}, "payload": [3, 0, 0, 0]}
{"ctrl": {"type": 64, "len": 0, "pad": 0, "seq": 42}}
{"ctrl": {"type": 64, "len": 0, "pad": 0, "seq": 43}}
{"ctrl": {"type": 128, "len": 8, "pad": 0, "seq": 44}, "cmd": {"type": 128, "tc": 1, "outgoing": 1, "incoming": 0, "iid": 0, "rqid_lo": 85, "rqid_hi": 3, "cid": 51}, "payload": []}
{"ctrl": {"type": 64, "len": 0, "pad": 0, "seq": 44}}
{"ctrl": {"type": 128, "len": 8, "pad": 0, "seq": 45}, "cmd": {"type": 128, "tc": 1, "outgoing": 1, "incoming": 0, "iid": 0, "rqid_lo": 86, "rqid_hi": 3, "cid": 52}, "payload": []}
{"ctrl": {"type": 64, "len": 0, "pad": 0, "seq": 45}}
{"ctrl": {"type": 128, "len": 9, "pad": 0, "seq": 226}, "cmd": {"type": 128, "tc": 1, "outgoing": 0, "incoming": 1, "iid": 0, "rqid_lo": 86, "rqid_hi": 3, "cid": 52}, "payload": [0]}
{"ctrl": {"type": 128, "len": 12, "pad": 0, "seq": 46}, "cmd": {"type": 128, "tc": 3, "outgoing": 1, "incoming": 0, "iid": 0, "rqid_lo": 87, "rqid_hi": 3, "cid": 3}, "payload": [4, 0, 0, 0]}
{"ctrl": {"type": 64, "len": 0, "pad": 0, "seq": 46}}
{"ctrl": {"type": 64, "len": 0, "pad": 0, "seq": 47}}

Message 43 and 47 are missing. Intriguing!

yatli avatar Feb 18 '22 20:02 yatli

The bug is caused probably because IRPComp piggyback'ed another HID message

This is message 42:

ID = 103109
Time = 2/18/2022 7:02:29 PM
Type = IRP
Device object = 0xFFFF9002959EBB70
Device name = \Device\00000040
Driver object = 0xFFFF900298294DF0
Driver name = \Driver\iaLPSS2_UART2
Result value = 0x103
Result constant = STATUS_PENDING
Major function = Write
Minor function = Normal
IRP address = 0xFFFF9002959B4AA0
File object = 0xFFFF9002B1EF5DF0
IRP flags = 0x0
Length = L: 22
Key = K: 0x0
Offset = O: 0x0
Thread ID = 10608
Process ID = 4
IRQL = Passive
Previous mode = KernelMode
Requestor mode = KernelMode
IOSB.Status value = 0x0
IOSB.Status constant = STATUS_SUCCESS
IOSB.Information = 0 (0x0000000000000000)
Requestor PID = 0
Emulated = False
Associated data = True
Data stripped = False
Data size = 22
Data (Hexer)
  00:	aa 55 80 0c 00 2a b1 a9 80 03 01 00 00 53 03 03	ªU?..*±©?....S..
  10:	03 00 00 00 1c b0                              	.....°          

And a few lines down, there's 0x2b, message 43:

ID = 103113
Time = 2/18/2022 7:02:29 PM
Type = IRP
Device object = 0xFFFF9002959EBB70
Device name = \Device\00000040
Driver object = 0xFFFF900298294DF0
Driver name = \Driver\iaLPSS2_UART2
Result value = 0x103
Result constant = STATUS_PENDING
Major function = Write
Minor function = Normal
IRP address = 0xFFFF9002B85D5A20
File object = 0xFFFF9002B1EF5DF0
IRP flags = 0x0
Length = L: 19
Key = K: 0x0
Offset = O: 0x0
Thread ID = 10608
Process ID = 4
IRQL = Passive
Previous mode = KernelMode
Requestor mode = KernelMode
IOSB.Status value = 0x0
IOSB.Status constant = STATUS_SUCCESS
IOSB.Information = 0 (0x0000000000000000)
Requestor PID = 0
Emulated = False
Associated data = True
Data stripped = False
Data size = 19
Data (Hexer)
  00:	aa 55 80 09 00 2b 60 52 80 17 02 00 01 54 03 01	ªU?..+`R?....T..
  10:	00 75 42                                       	.uB             

Also, message 47 pairs with 46:

ID = 103481
Time = 2/18/2022 7:02:31 PM
Type = IRP
Device object = 0xFFFF9002959EBB70
Device name = \Device\00000040
Driver object = 0xFFFF900298294DF0
Driver name = \Driver\iaLPSS2_UART2
Result value = 0x103
Result constant = STATUS_PENDING
Major function = Write
Minor function = Normal
IRP address = 0xFFFF9002B12C24A0
File object = 0xFFFF9002B93E05F0
IRP flags = 0x0
Length = L: 19
Key = K: 0x0
Offset = O: 0x0
Thread ID = 13296
Process ID = 4
IRQL = Passive
Previous mode = KernelMode
Requestor mode = KernelMode
IOSB.Status value = 0x0
IOSB.Status constant = STATUS_SUCCESS
IOSB.Information = 0 (0x0000000000000000)
Requestor PID = 0
Emulated = False
Associated data = True
Data stripped = False
Data size = 19
Data (Hexer)
  00:	aa 55 80 09 00 2f e4 12 80 17 02 00 01 58 03 01	ªU?../ä.?....X..
  10:	00 47 0d                                       	.G.             

0x17 BKL

🤔

yatli avatar Feb 18 '22 20:02 yatli

Yeah, unfortunately the json converter script isn't very robust. I had the plan to write a better decoder at some point but never found the time for it...

qzed avatar Feb 18 '22 20:02 qzed

Any ideas on the BKL message? It could be a random coincidence but intriguing :p

yatli avatar Feb 18 '22 20:02 yatli

BKL could mean backlight maybe? Just a guess though.

qzed avatar Feb 18 '22 20:02 qzed

Never seen that one before

qzed avatar Feb 18 '22 20:02 qzed

Makes sense, could be backlight related stuff since I'm adjusting power profiles. I'm wondering if the module handles the piggybacks and incomplete frames?

yatli avatar Feb 18 '22 20:02 yatli

Not sure what you mean by piggybacks. It's an exclusive UART channel on the driver side (IRPMon might pick up other stuff that's not sent via that channel). If we receive an incomplete or invalid frame. the driver rejects it (by sending the appropriate packet with type 0x04) and skips to the next SYN bytes (aa 55). The EC does the same on its side.

The communication above is from host to EC though, so it's host initiated. We probably won't see any of the TC=0x17 messages unless we either send them ourselves or we set up the appropriate events (in case that subsystem can actually send events).

qzed avatar Feb 18 '22 20:02 qzed

If you want to play around, you can load the debug module on linux (modprobe surface_aggregator_cdev) and use the ctrl.py script to talk to the EC. So you could send the messages from your log.

qzed avatar Feb 18 '22 20:02 qzed

aha. pretend we are 0x17 experts. 😎

yatli avatar Feb 18 '22 20:02 yatli

By piggyback I mean, for example in this reply:

  00:	aa 55 40 00 00 2a 74 6f ff ff aa 55 00 26 00 df	ªU@..*toÿÿªU.&.ß
  10:	34 8a 80 15 00 02 03 15 00 00 04 01 21 05 39 02	4??.........!.9.
  20:	00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00	................

It first ack'ed 0x2a, and then started another message..

yatli avatar Feb 18 '22 20:02 yatli

aha. pretend we are 0x17 experts.

Pretty much xD

Should be ./ctrl.py request 0x17 0x02 0x01 0x01 0x01 0x00, or TC=0x17, TID=0x02, CID=0x01, IID=0x01, flags=sync (aka. 0x01) and payload = 0x00.

qzed avatar Feb 18 '22 20:02 qzed