SP3 - i2c_designware devices spam interrupts, reducing performance and battery life

Open jkatzmewing opened this issue 4 years ago • 11 comments

(Currently Xubuntu 21.04, with latest linux-surface kernel)

My Surface Pro 3 performs really badly out of the box with any Linux distro, with lots of micro-stutter, applications taking a long time to start up, lots of heat and short battery life... Looking at powertop, I found that interrupts for the device INT33C2:00 were eating a huge amount of power, often 4 or 5 watts.

Blacklisting i2c_hid helped performance and got rid of the INT33C2 interrupts, but performance was still not great. Trying the linux-surface kernel instead of the Ubuntu generic or lowlatency ones, things were even worse - when i2c_hid wasn't blacklisted, INT33C2 interrupts ate up 16 watts, keeping the SP3's fans constantly on blast.

Looking at dmesg I noted that INT33C2 was actually powered by the (built in) i2c_designware driver, and disabled that as detailed here:

https://unix.stackexchange.com/questions/423797/how-do-i-disable-i2c-designware-support-when-its-not-built-as-a-module

On reboot, the touchscreen did not work (as expected, and as with blacklisting i2c_hid), but performance was much better, and estimated battery life increased by almost an hour.

jkatzmewing avatar Jun 04 '21 15:06 jkatzmewing

Is it possible that the touchscreen is misbehaving and causing those interrupts? This kinda looks like a device misbehaving to me, I'm not entirely sure how to debug this.

qzed avatar Jun 05 '21 19:06 qzed

Yes, it could definitely be. I'll try some further investigation and see if I can narrow this down a bit.

jkatzmewing avatar Jun 05 '21 19:06 jkatzmewing

I can confirm this is also occurring on my SP3 - looks like the ambient light sensor INT33C2:00 is at fault.

Top 10 Power Consumers

| Usage | Events/s | Category | Description | PW Estimate |
|---|---|---|---|---|
| 0.2% | 168.0 | kWork | dbs_work_handler | 667 mW |
| 0.2% | 149.3 | Timer | tick_sched_timer | 593 mW |
| 0.2% | 104.5 | Interrupt | [7] INT33C2:00 | 416 mW |
| 0.3% | 77.1 | Process | [PID 6167] /opt/brave.com/brave/brave --high-dpi-support=1 --force-device-scale-factor=1.6 | 348 mW |
| 0.8% | 28.2 | Process | [PID 12463] baloo_file_extr | 133 mW |
| 0.3% | 24.4 | Timer | hrtimer_wakeup | 101 mW |
| 0.1% | 24.2 | Process | [PID 12465] QDBusConnection | 97.4 mW |
| 0.0% | 23.8 | Interrupt | [4] block(softirq) | 94.9 mW |
| 0.6% | 20.4 | Process | [PID 1979] /usr/bin/latte-dock -session 101751bb1ba17d000162324977800000017790013_1623406482_240294 | 89.4 mW |
| 0.3% | 20.4 | kWork | mwifiex_main_work_queue | 84.8 mW |

RussH avatar Jun 14 '21 11:06 RussH

INT33C2 is an I2C controller that's used for multiple I2C clients. Those seem to be (according to the DSDT, but you can check that in /sys/bus/i2c/devices/... as well):

  • MSHW0028 (VGBI) volume and power buttons (?)
  • INT33CA (ACD0) Intel SPB Peripheral (something related to audio)
  • INT33C9 (ACD1) Wolfson Microelectronics Audio WM5102 (something else related to audio)
  • INT33CB (ACD2) Intel Smart Sound Technology Audio Codec (yet another audio thing)
  • INT33D1 (SHUB) Intel GPIO Buttons (more buttons?)
  • INT33D7 (DFUD) no clue
  • MSFT1111 (TPD4) some HID-over-I2C device (touchscreen maybe?)
  • MSHW0030 (SAM) SAM v1 as HID-over-I2C device

So it's either the controller that's at fault or some of those client devices constantly want to talk to that controller and don't shut up (which is why I suspected the touchscreen).
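For anyone who wants to cross-check that list against their own machine, here is a rough Python sketch that walks the I2C device directory and reads each entry's ACPI HID. The function name `list_i2c_hids` is made up for illustration, and it assumes each device exposes its HID under `firmware_node/hid` in sysfs; entries without that file are reported as `None`.

```python
from pathlib import Path


def list_i2c_hids(root="/sys/bus/i2c/devices"):
    """Return {device name: ACPI HID or None} for every entry under root.

    `root` is a parameter only so the function can be pointed at a test
    directory; on a real system the default sysfs path is assumed.
    """
    result = {}
    base = Path(root)
    if not base.is_dir():
        return result
    for dev in sorted(base.iterdir()):
        hid_file = dev / "firmware_node" / "hid"
        try:
            result[dev.name] = hid_file.read_text().strip()
        except OSError:
            # No firmware node (or no hid file) for this entry.
            result[dev.name] = None
    return result


# Example usage (read-only, no root needed):
# for name, hid in list_i2c_hids().items():
#     print(name, hid or "(no hid file)")
```

This only reads from sysfs, so it should be safe to run without root.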

qzed avatar Jun 14 '21 12:06 qzed

So, maybe interesting update - unlike with Ubuntu, Fedora 34's stock kernels seem much less affected by this. INT33C2 interrupts are still numerous, but take much less CPU time; the tablet runs cooler and the fans do not run on full blast when the touchscreen is enabled. Also less mouse lag, and wakeups/second in powertop stays under 1000 during normal desktop use.

The same unfortunately can't be said for the linux-surface kernel for Fedora, which shows the same issue as it does on Ubuntu, and as the Ubuntu stock kernel does. Mouse lags visibly, high wakeups/second, INT33C2 consistently using more like 40 ms/s instead of 2.5 ms/s.

So I'm guessing this is down to some kernel config option(s). Not sure what, though.

Edit: also to be clear we're still not talking "completely unaffected". Powertop still shows INT33C2 interrupts spiking at times, usually when launching Electron applications - sometimes spiking up to 8000/s or so, vs. 180/s or so normally.
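If someone wants numbers that don't depend on powertop, a small Python sketch like the following could estimate the interrupt rate from two snapshots of `/proc/interrupts`. The function names (`parse_interrupts`, `irq_rates`, `sample`) are hypothetical, and the parser assumes the usual layout: a header row of CPU names, then one row per IRQ with per-CPU counts followed by a description.

```python
import time


def parse_interrupts(text):
    """Map IRQ label -> (total count across CPUs, description)."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        counts = []
        for field in fields[:ncpu]:
            if not field.isdigit():
                break  # rows like ERR/MIS have fewer count columns
            counts.append(int(field))
        desc = " ".join(fields[len(counts):])
        table[label.strip()] = (sum(counts), desc)
    return table


def irq_rates(before, after, seconds, match):
    """Interrupts per second for rows whose description contains `match`."""
    rates = {}
    for label, (count, desc) in after.items():
        if match in desc and label in before:
            rates[label] = (count - before[label][0]) / seconds
    return rates


def sample(match="INT33C2", seconds=5.0, path="/proc/interrupts"):
    """Take two snapshots `seconds` apart and return the matching rates."""
    with open(path) as f:
        before = parse_interrupts(f.read())
    time.sleep(seconds)
    with open(path) as f:
        after = parse_interrupts(f.read())
    return irq_rates(before, after, seconds, match)
```

Calling `sample()` while idle and again while launching an application would give two rates to compare between kernels.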

jkatzmewing avatar Jun 29 '21 15:06 jkatzmewing

Interesting, most config options should be the same as on Fedora. Can you try unbinding the drivers for the devices I mentioned above (specifically HID ones since you mentioned that blacklisting i2c_hid influences the behavior) and check if that makes a difference?

That should work e.g. via `echo <device-name> | sudo tee /sys/bus/i2c/devices/<device-name>/driver/unbind`, where `<device-name>` is the name of the device in `/sys/bus/i2c/devices/`. You might need to read the HID of the device to match it to the table above (if the name doesn't give that away), which you can do via `cat /sys/bus/i2c/devices/<device-name>/firmware_node/hid`.
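In case a script is handier than the one-liner, here is a minimal Python sketch of the same unbind step. The helper name `unbind` and its `dry_run` flag are invented for illustration; the sysfs layout is the one described above. It returns `None` when the device has no `driver/unbind` file, which simply means there is nothing to unbind for that device. Actually writing to sysfs needs root.

```python
from pathlib import Path


def unbind(device, root="/sys/bus/i2c/devices", dry_run=True):
    """Detach the driver bound to `device`.

    Mirrors `echo <device-name> | sudo tee .../<device-name>/driver/unbind`.
    Returns the path of the unbind file, or None if the device has no
    driver bound. With dry_run=True (the default) nothing is written.
    """
    unbind_file = Path(root) / device / "driver" / "unbind"
    if not unbind_file.exists():
        return None
    if not dry_run:
        # The kernel expects the device name written to the unbind file.
        unbind_file.write_text(device)
    return unbind_file


# Example (hypothetical device name, run as root):
# unbind("i2c-MSHW0030:00", dry_run=False)
```

The dry-run default is a deliberate choice so that a typo in the device name cannot detach anything by accident.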

qzed avatar Jun 30 '21 21:06 qzed

@qzed I'll give that a try this evening thanks!

jkatzmewing avatar Jun 30 '21 21:06 jkatzmewing

@qzed

/sys/bus/i2c/devices/<device-name>/firmware_node/hid is LNXVIDEO for i2c-2 through i2c-7. i2c-8 and i2c-9 have no hid file. i2c-1 is INT33C3, and i2c-0 is INT33C2.

i2c-MSHW0028:00 doesn't have the necessary file for unbinding. Unbinding i2c-MSHW0030:00 definitely does not have any effect on the interrupts.

However, unbinding INT33C2 via /sys/bus/i2c/devices/i2c-0 gets rid of the interrupt spam and allows my touchscreen to work! So congrats, if nothing else you've at least helped me find a workaround. :)

jkatzmewing avatar Jun 30 '21 22:06 jkatzmewing

Interesting, IIRC INT33C2 is an i2c controller. So this means that one controller constantly sends interrupts whereas the others work fine. It might be possible that a device connected to this specific controller causes the interrupts. If the controller has any client devices, they should be specified in the directory of the controller, e.g. something like /sys/bus/i2c/devices/i2c-0/i2c-INT33BE:00 for an INT33BE client.

You could try unbinding drivers for those client devices individually next (if there are any). Keep in mind though that you first have to re-bind the controller driver or reboot (that's probably easier). After boot the directory name might be different, so you might have to search for the HID again (this should be unique according to the SP3 ACPI).
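As a rough aid for finding those client devices, the Python sketch below lists the child entries of one controller directory. The function name `acpi_clients` is hypothetical, and the filter assumes clients follow the `i2c-<HID>:<NN>` naming seen in the example path above; clients named differently would be missed.

```python
from pathlib import Path


def acpi_clients(adapter, root="/sys/bus/i2c/devices"):
    """List client device names found inside an adapter directory.

    Assumption: clients are subdirectories named like `i2c-INT33BE:00`
    (prefix `i2c-`, with a colon before the instance number). Other
    subdirectories such as `power` or `i2c-dev` are skipped.
    """
    adapter_dir = Path(root) / adapter
    if not adapter_dir.is_dir():
        return []
    return sorted(
        child.name
        for child in adapter_dir.iterdir()
        if child.is_dir()
        and child.name.startswith("i2c-")
        and ":" in child.name
    )


# Example (adapter name as it appears after boot; it may change):
# print(acpi_clients("i2c-0"))
```

Since the adapter directory name might differ after a reboot, it is worth re-checking the controller's HID before trusting the output.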

qzed avatar Jun 30 '21 22:06 qzed

@qzed MSHW0028 and MSHW0030 were the only devices attached to that controller (both disappeared after it was unbound). The former couldn't be unbound, and the latter being unbound did nothing, so perhaps the controller itself is the issue? IDK why that particular controller and no other though.

jkatzmewing avatar Jun 30 '21 22:06 jkatzmewing

> (both disappeared after it was unbound)

Yeah, that's the expected behavior. The client devices are essentially the children of the controller. So if that goes, the clients go as well.

The two devices are volume/power buttons and SAM. IIRC the volume/power button driver isn't actually an i2c driver, so makes sense that there's nothing to unbind (and that then shouldn't cause the issues, hopefully).

SAM is another thing though. That's the integrated EC. I think it might be possible that SAM tries to send something to the host or somehow messes with interrupts in other ways even when the HID-over-I2C driver normally attached to it has been unbound. If it's truly caused by the EC, I'm afraid that we very likely won't be able to fix it without a SAM-over-HID/SAM-gen4 driver.

The controller misbehaving might be another possibility, but I think that's less likely (although we probably can't be sure, no idea how to really test that). So I kinda think that SAM/the EC is at fault here.

qzed avatar Jun 30 '21 23:06 qzed