IO slows down everything else

Open denysvitali opened this issue 1 year ago • 8 comments

The SPX, running the latest kernel (Linux surface-pro-x 6.6.8-1), becomes extremely slow when a lot of IO is being performed.

The device becomes incredibly unresponsive during read- or write-intensive operations such as:

  1. Network download
  2. Package update / System update
  3. Copying files between media

After a brief analysis I was able to narrow down the cause: the IO seems to be causing the whole system to hang. To reproduce the issue, one can start an SSD benchmark (e.g. via the gnome-disks application) and look at the CPU usage / IO usage.

When such a test is performed, the disk is quite efficient (140 MB/s read), but the whole system starts lagging: [image]

During testing, glxgears hangs (or is extremely slow), the CPU is not under pressure and I/O is very high. This results in a very slow system (everything freezes).

The queue scheduler currently used is none:

$ cat /sys/block/nvme0n1/queue/scheduler
[none] mq-deadline kyber bfq

Switching this makes things even worse.
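For anyone who wants to repeat this check: the active scheduler is the entry in square brackets in the file shown above. Below is a minimal sketch for reading it, assuming the device is nvme0n1 as in this report. The active_scheduler helper is a hypothetical name used only for illustration.

```shell
#!/bin/sh
# active_scheduler LINE: print the bracketed (currently active) entry
# from a line in the format of /sys/block/<dev>/queue/scheduler.
# Hypothetical helper, not part of any existing tool.
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

dev=nvme0n1   # device name taken from the report above; adjust as needed
sched_file="/sys/block/$dev/queue/scheduler"

if [ -r "$sched_file" ]; then
    echo "active scheduler on $dev: $(active_scheduler "$(cat "$sched_file")")"
    # Switching needs root and only lasts until reboot, e.g.:
    #   echo mq-deadline | sudo tee "$sched_file"
fi
```

On a machine without that device the script prints nothing. The switch command is left commented out because it changes system state.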

Does anyone have any clue what's going on here? I don't think IO should cause a whole-system freeze.

denysvitali avatar Mar 19 '24 04:03 denysvitali

Could it be an issue similar to the one with sc8280xp? We need to have arm64.nopauth as a boot parameter to avoid performance issues. I actually observed similar behaviour on the WDK without it. And @jhovold has said a bit about it in his X13s presentation.

Complete cmdline parameters: pd_ignore_unused clk_ignore_unused arm64.nopauth efi=noruntime

For reference, I just dd'ed the internal nvme to a new one (via a USB-C nvme enclosure), and it gives a ~600 MB/s transfer rate, with no stalling of other USB usage.

jglathe avatar Mar 19 '24 11:03 jglathe

So, arm64.nopauth didn't really change anything on my side - but it was a good pointer (pun intended) because it led me to the openSUSE page mentioning it.

After adding arm64.nopauth iommu.passthrough=0 iommu.strict=0 to the kernel command line, I managed to partially solve the problem and get 1.1+ GB/s transfer rates with the SSD: [image]

The SSD operations do not seem to block the whole system anymore. Thanks! I'm curious to test whether this affects the Wi-Fi too (#45) - but I don't have my super-fast Wi-Fi network right now, so I can't really test.
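To confirm that these parameters actually reached the running kernel, one can look at /proc/cmdline. A small sketch follows; has_param is a hypothetical helper name, and the parameter list is simply the one from this comment.

```shell
#!/bin/sh
# has_param CMDLINE PARAM: succeed if PARAM appears as a whole,
# space-separated word in CMDLINE. Hypothetical helper for illustration.
has_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Command line of the currently running kernel (empty if unreadable).
cmdline=$(cat /proc/cmdline 2>/dev/null || true)

for p in arm64.nopauth iommu.passthrough=0 iommu.strict=0; do
    if has_param "$cmdline" "$p"; then
        echo "$p: present"
    else
        echo "$p: missing"
    fi
done
```

The whole-word match avoids false positives such as iommu.strict=0 matching inside a longer token.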

denysvitali avatar Mar 19 '24 15:03 denysvitali

Yeah, arm64.nopauth has nothing to do with performance and is only needed to work around a bug in the Lenovo firmware which prevents the X13s from booting.

The iommu parameters were also only used as a workaround until the underlying issue, which turned out to be a display driver bug, was fixed.

Please make sure to check out my wiki at:

https://github.com/jhovold/linux/wiki/X13s

for the current mainline status for the X13s (and sc8280xp). It should always be more up to date than the distribution wikis that use it as a source.

jhovold avatar Mar 19 '24 15:03 jhovold

So this looks like the surface-pro-x kernel has issues similar to the ones the X13s kernel had a while back. Maybe worth a look to compare patches.

jglathe avatar Mar 19 '24 19:03 jglathe

6.6 is kind of old by now, so hopefully things work better with 6.8. I tried updating the patches yesterday but a simple rebase broke the display... so I need a bit more time for that (hopefully I'll have that by next week though). And I'll also have a look at which of the sc8280xp/lenovo patches could be helpful.

qzed avatar Mar 19 '24 21:03 qzed

Is there any plan to upstream our patches, so that we can run linux mainline in the future?

denysvitali avatar Mar 19 '24 22:03 denysvitali

@denysvitali As soon as I find the time for it... I need to debug a (likely) uefisecapp crash first though... (that's already upstream, but we think it is a bit picky about using just any memory for DMA with the TrustZone). And I think there's still a bit of clean-up required. But yeah, I hope that we can get all of this upstream eventually.

qzed avatar Mar 19 '24 23:03 qzed

I have some preliminary patches for v6.8 at https://github.com/linux-surface/kernel/tree/spx/v6.8 if anyone wants to try those.

qzed avatar Mar 28 '24 02:03 qzed