intel-precise-touch
Avoid polling and other inefficiencies
I found that the userspace drivers waste a lot of CPU (~3%) while waiting, even when the touchscreen is completely idle. The only way userspace learns of new data is by polling get_device_ready and get_doorbell every millisecond, which wastes power. Decreasing the polling frequency isn't a solution because that would only introduce latency.
There are many better alternatives to polling:
- Have get_device_ready and get_doorbell block until new data is ready
- Support actual nonblocking IO such as select, poll, and epoll
- Have those be blocking character devices
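To illustrate the second option from the consumer's side, here is a minimal sketch of how userspace could wait for data if the device file supported poll-style readiness notification. This is not how the driver currently works; `wait_for_data` is a hypothetical helper, and the file descriptor could be any readable descriptor (a pipe stands in for the device in the usage note).

```python
import selectors


def wait_for_data(fd, timeout=None):
    """Block until fd is readable or the timeout (in seconds) expires.

    Hypothetical helper: it assumes the device file reports readiness,
    which the current driver does not do. Returns True if fd is readable.
    """
    with selectors.DefaultSelector() as sel:
        sel.register(fd, selectors.EVENT_READ)
        # select() sleeps in the kernel, so an idle wait costs no CPU.
        return bool(sel.select(timeout))
```

With a pipe as a stand-in, `wait_for_data(read_end, timeout=0)` returns False until something is written to the other end, after which it returns True without any busy loop in userspace.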
Additionally, I'm not sure how efficient copying the buffers to userspace is. Performance might improve if we allow userspace to mmap the 16 buffers instead of reading from them.
That's a bit tricky. IIRC we can't do anything other than polling the hardware by checking the doorbell due to how it used to work before Intel restructured their firmware (@StollD can give you more details here, that's a part that I'm still not quite familiar with). The polling can happen either in user space (AFAIK that was chosen because it's easier to play around with) or in the kernel driver (which might be better for performance and user-friendliness, as that would essentially allow emulation of polling interfaces).
I agree that there are usually many better alternatives than polling, but unfortunately if the hardware doesn't give you a clue (i.e. some sort of interrupt) that there's new data available, polling is the only option. Although that 3% seems a bit much, for me iptsd isn't noticeable when idle (read 0.0% in htop, it's ~3% when I touch something).
@StollD can probably tell you more about all the wonderful details of the IPTS/ME hardware.
Regarding mmap: I think there always has to be one copy, either DMA buffer to mmapped buffer or DMA buffer to read buffer. I don't think that makes much difference. It might save a small bit of validation overhead for the kernel-to-user copy of the read buffer, but I wouldn't expect much.
> That's a bit tricky. IIRC we can't do anything other than polling the hardware by checking the doorbell due to how it used to work before Intel restructured their firmware (@StollD can give you more details here, that's a part that I'm still not quite familiar with).
Yeah, pretty much. The doorbell is basically a leftover of how IPTS worked with GuC submission. The doorbell is just a u32, but incrementing it triggers an interrupt in the GuC firmware, so that it can schedule the new data to be processed.
Since IPTS only gives us the doorbell, all we can do is poll. Either in the kernel or outside of it. And because I'd like to keep the driver as simple as possible, I moved the responsibility for polling out of it.
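As a rough sketch of what "polling the doorbell" means for the consumer: since the doorbell is a u32 counter, the number of new buffers is the difference between the current and the last seen value, modulo 2^32. The mapping from doorbell value to buffer index (doorbell modulo the 16 buffers) is an assumption made for illustration, and `pending_buffers` is a hypothetical name, not a function from the driver or iptsd.

```python
U32_MASK = 0xFFFFFFFF  # the doorbell is a u32 and wraps around
NUM_BUFFERS = 16       # buffer count mentioned earlier in this thread


def pending_buffers(last_doorbell, doorbell):
    """Return the buffer indices filled since last_doorbell, oldest first.

    Assumption: buffer index = doorbell value modulo NUM_BUFFERS.
    """
    new = (doorbell - last_doorbell) & U32_MASK
    # If more than NUM_BUFFERS arrived, the older ones were overwritten.
    new = min(new, NUM_BUFFERS)
    start = doorbell - new
    return [((start + i) & U32_MASK) % NUM_BUFFERS for i in range(new)]
```

The mask keeps the subtraction correct across the u32 wraparound, so a doorbell that goes from 0xFFFFFFFF to 1 still counts as two new buffers.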
iptsd tries to mitigate it a bit by using a high polling frequency only if it received data in the last 5 seconds. If no data comes in for longer than that it will lower the frequency and poll less.
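The backoff described here could look roughly like the following. This is a sketch, not iptsd's actual code: the two interval values are assumptions, only the 5 second threshold comes from the comment above, and `read_doorbell` and `handle_data` are hypothetical callbacks supplied by the caller.

```python
import time

ACTIVE_INTERVAL = 0.001  # assumed: poll every 1 ms while data is flowing
IDLE_INTERVAL = 0.2      # assumed: slower polling rate once idle
IDLE_TIMEOUT = 5.0       # from the thread: 5 seconds without data


def next_interval(now, last_data_time):
    """Pick the sleep time before the next doorbell check."""
    if now - last_data_time < IDLE_TIMEOUT:
        return ACTIVE_INTERVAL
    return IDLE_INTERVAL


def poll_loop(read_doorbell, handle_data, clock=time.monotonic,
              sleep=time.sleep, max_iterations=None):
    """Poll the doorbell, backing off after IDLE_TIMEOUT without data."""
    last_doorbell = read_doorbell()
    last_data_time = clock()
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        doorbell = read_doorbell()
        if doorbell != last_doorbell:
            handle_data(last_doorbell, doorbell)
            last_doorbell = doorbell
            last_data_time = clock()
        sleep(next_interval(clock(), last_data_time))
        iterations += 1
```

The trade-off is that the first touch after an idle period can be delayed by up to one idle interval, in exchange for far fewer wakeups while nothing is happening.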
I wanted to look into mmap in the past but haven't found time to play around with it yet.
> iptsd tries to mitigate it a bit by using a high polling frequency only if it received data in the last 5 seconds
Oh, I didn't notice that as I only looked through ipts-dbg.
> read 0.0% in htop, it's ~3% when I touch something
iptsd was 3% a few months ago. I just checked today in htop and it's down to 0% like you said.
I just implemented a backoff in my own driver and it's also down to 0% now. Unfortunately, it's not as optimized, so it still goes up to 12% when touched.
> If no data comes in for longer than that it will lower the frequency and poll less
I'm guessing that excludes the type 3 size 64 messages we get every second?