surface-pro-x
IO slows down everything else
The SPX, running the latest kernel (Linux surface-pro-x 6.6.8-1), is extremely slow when a lot of IO is being performed.
The device becomes incredibly unresponsive during read- or write-intensive operations such as:
- Network download
- Package update / System update
- Copying files between media
After a short analysis, I was able to narrow down the cause: the IO seems to be causing the whole system to hang.
To reproduce the issue, one can start an SSD benchmark (e.g. via the gnome-disks application) and look at the CPU usage / IO usage.
When such a test is performed, the disk is quite efficient (140MB/s in read), but the whole system starts lagging:
During testing, glxgears hangs (or is extremely slow), while the CPU is not under pressure and I/O is very high.
This results in a very slow system (everything freezes).
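For a number that is easier to compare than "glxgears looks slow", here is a rough sketch of a responsiveness probe. It times a short sleep over and over and prints how much longer than requested each one took. The helper names `overshoot_ms` and `probe` are made up for this example, and it assumes GNU coreutils (`date +%s%N`, fractional `sleep`, `dd iflag=direct`).

```shell
#!/bin/sh
# overshoot_ms and probe are hypothetical helper names, not existing tools.
# Assumes GNU coreutils: date +%s%N and a sleep that accepts fractions.

# Print how many milliseconds a sleep overshot its requested duration.
# $1 = start time (ns), $2 = end time (ns), $3 = requested sleep (ms)
overshoot_ms() {
    echo $(( ($2 - $1) / 1000000 - $3 ))
}

# Take $1 samples of a 100 ms sleep and print the overshoot of each one.
probe() {
    i=0
    while [ "$i" -lt "$1" ]; do
        start=$(date +%s%N)
        sleep 0.1
        end=$(date +%s%N)
        echo "overshoot: $(overshoot_ms "$start" "$end" 100) ms"
        i=$((i + 1))
    done
}

# Nothing is run automatically. Suggested usage:
#   terminal 1 (read-only load, needs root, device name assumed):
#     dd if=/dev/nvme0n1 of=/dev/null bs=1M count=4096 iflag=direct
#   terminal 2, after loading this file into the shell:
#     probe 50
```

On an idle system the overshoot should stay at a few milliseconds. If it jumps while the `dd` is running, that would match the lag described above, although this only probes scheduling of one small process and says nothing about where the stall comes from.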
The queue scheduler currently used is none:
$ cat /sys/block/nvme0n1/queue/scheduler
[none] mq-deadline kyber bfq
Switching to a different scheduler makes things even worse.
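For anyone who wants to repeat the scheduler experiment, a minimal sketch follows. The kernel marks the active scheduler with square brackets in that sysfs file. The helper name `active_scheduler` is made up for this example, and the device name `nvme0n1` is taken from the output above. A scheduler set this way does not survive a reboot.

```shell
#!/bin/sh
# active_scheduler is a hypothetical helper name, not an existing tool.
# It prints the bracketed entry from a line such as
# "[none] mq-deadline kyber bfq".
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Device name assumed from the output above; adjust for your system.
dev=nvme0n1
sched_file="/sys/block/$dev/queue/scheduler"

if [ -r "$sched_file" ]; then
    echo "active: $(active_scheduler "$(cat "$sched_file")")"
    # To try another scheduler (needs root, lost on reboot):
    #   echo mq-deadline | sudo tee "$sched_file"
fi
```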
Does anyone have any clue what's going on here? I don't think IO should cause a whole-system freeze.
Could it be a similar issue to the one with sc8280xp? We need to have arm64.nopauth as a boot parameter to avoid performance issues. I actually observed similar behaviour on the WDK without it. And @jhovold has said a bit about it in his X13s presentation. Complete cmdline parameters: pd_ignore_unused clk_ignore_unused arm64.nopauth efi=noruntime
For reference, I just dd'ed the internal NVMe to a new one (via a USB-C NVMe enclosure), and it gives a ~600MB/s transfer rate, with no stalling of other USB usage.
So, the arm64.nopauth didn't really change anything on my side - but it was a good pointer (pun intended) because it led me to the OpenSUSE page mentioning that.
After adding arm64.nopauth iommu.passthrough=0 iommu.strict=0 I managed to partially solve the problem and get 1.1+ GB/s of transfer with the SSD.
The SSD operations do not seem to block the whole system anymore. Thanks! I'll be curious to test if this affects the Wi-Fi too (#45) - but I don't have my super-fast Wi-Fi network right now, so I can't really test.
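To double-check after a reboot that these parameters actually reached the kernel, something like the sketch below can help. `has_param` is a made-up helper that does a plain whole-word match against `/proc/cmdline`. How the parameters get added in the first place (GRUB, systemd-boot, ...) depends on the distribution and is not shown.

```shell
#!/bin/sh
# has_param is a hypothetical helper name, not an existing tool.
# Succeeds if parameter $2 appears as a whole space-separated word
# in the command line string $1.
has_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Read the command line of the running kernel (empty if unavailable).
cmdline=$(cat /proc/cmdline 2>/dev/null || true)

# Parameter list taken from the comment above.
for p in arm64.nopauth iommu.passthrough=0 iommu.strict=0; do
    if has_param "$cmdline" "$p"; then
        echo "set:     $p"
    else
        echo "missing: $p"
    fi
done
```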
Yeah, arm64.nopauth has nothing to do with performance and is only needed to work around a bug in the Lenovo firmware which prevents the X13s from booting.
The iommu parameters were also only used as a workaround until the underlying issue, which turned out to be a display driver bug, was fixed.
Please make sure to check out my wiki at:
https://github.com/jhovold/linux/wiki/X13s
for the current mainline status for the X13s (and sc8280xp). It should always be more up to date than the distribution wikis that use it as a source.
So this looks like the surface-pro-x kernel has issues similar to those the X13s one had a while back. It may be worth comparing patches.
6.6 is kind of old by now, so hopefully things work better with 6.8. I tried updating the patches yesterday but a simple rebase broke the display... so I need a bit more time for that (hopefully I'll have that by next week though). And I'll also have a look at which of the sc8280xp/lenovo patches could be helpful.
Is there any plan to upstream our patches, so that we can run linux mainline in the future?
@denysvitali As soon as I find the time for it... I need to debug a (likely) uefisecapp crash first though... (that's already upstream but we think it seems to be a bit picky about using any memory for DMA with the trust-zone). And I think there's still a bit of clean-up required. But yeah, I hope that we can get all of this upstream eventually.
I have some preliminary patches for v6.8 at https://github.com/linux-surface/kernel/tree/spx/v6.8 if anyone wants to try those.