Echo 11 Thunderbolt 4 ECHO-DK11-T4 heavily impacting computer's performance

Open wiktormowinski opened this issue 1 year ago • 37 comments

Component

other

Device

NovaCustom V54 14th Gen

Dasharo version

v0.9.0

Dasharo Tools Suite version

Test case ID

Brief summary

We are unable to reproduce the client's environment and get the same result they see.

How reproducible

not sure

How to reproduce

That's the issue: we don't know.

Expected behavior

The platform's performance is heavily impacted by the dock.

Actual behavior

Everything works smoothly.

Screenshots

No response

Additional context

The client complained about the performance of their computer while using the Echo 11 Thunderbolt 4 dock (ECHO-DK11-T4). However, while running stress tests to reproduce this behaviour, the machine's CPU clock and power remained roughly the same regardless of whether the dock was plugged in. Further investigation of this issue is therefore needed.

Solutions you've tried

The current approach only measured CPU performance; the problem may lie somewhere else entirely.

wiktormowinski avatar Aug 07 '24 11:08 wiktormowinski

I recall some guy on chat that also had performance issues when plugging a TB4 dock (Not the same one, but maybe they share a reference design/chips). IGP performance in a graphics benchmark halved by just having the dock plugged in.

zirblazer avatar Aug 07 '24 21:08 zirblazer

> I recall some guy on chat that also had performance issues when plugging a TB4 dock (Not the same one, but maybe they share a reference design/chips). IGP performance in a graphics benchmark halved by just having the dock plugged in.

It was I who reported this issue on Matrix. Here's what I know so far:

On my own TB4 docking station (Icy Box IB-DK8801-TB4 Thunderbolt™ 4 Dock with PD 100 W), having the notebook connected and powered results in a performance drop of roughly 50% compared to the barrel-jack power connector. The Icy Box IB-DK8801-TB4 is a rebrand of the Goodway DBD1330 Thunderbolt™ 4 / USB4 Dock Pro, which itself is based on the Goshen Ridge Thunderbolt controller afaik.

I believe the issue in question is related to power delivery. When both the dock and the barrel-jack power plug are connected, the notebook's performance is as expected.

PerAstraAdDeum avatar Aug 19 '24 13:08 PerAstraAdDeum

Results of mentioned tests are included here: https://pad.3mdeb.com/sheet/#/2/sheet/view/3qtlwsbhPvu3fo1ZgaTfwMUC2ia0eWj2nY21+lKTky8/

wiktormowinski avatar Sep 10 '24 10:09 wiktormowinski

> Results of mentioned tests are included here: https://pad.3mdeb.com/sheet/#/2/sheet/view/3qtlwsbhPvu3fo1ZgaTfwMUC2ia0eWj2nY21+lKTky8/

Note that you are including CPU results only. The benchmarks that were catastrophic (50% difference) were GPU related. You most likely want @PerAstraAdDeum to post his results and his test environment here.

zirblazer avatar Sep 10 '24 10:09 zirblazer

Valid point, here are the GPU logs: gpu-dock-stress-test.log gpu-stress-test.log

wiktormowinski avatar Sep 10 '24 11:09 wiktormowinski

Well, those do seem to be within the margin of error. So I suppose we need more info about the affected users' setups.

zirblazer avatar Sep 10 '24 11:09 zirblazer

yeah :c

wiktormowinski avatar Sep 10 '24 11:09 wiktormowinski

> Results of mentioned tests are included here: https://pad.3mdeb.com/sheet/#/2/sheet/view/3qtlwsbhPvu3fo1ZgaTfwMUC2ia0eWj2nY21+lKTky8/
>
> Note that you are including CPU results only. The benchmarks that were catastrophic (50% difference) were GPU related. You most likely want @PerAstraAdDeum to post his results and his test environment here.

Sure thing, here I am. I've done cross-testing with various drivers and kernel versions, these were my findings:

Screenshot_20240910_142839

Benchmarks were done with Unigine Superposition. I've experienced similar performance impacts with various games, always around 50% slower with docking station than without.

PerAstraAdDeum avatar Sep 10 '24 12:09 PerAstraAdDeum

After two more months I did another benchmark. Here's the result (last line):

Screenshot_20241109_131156

As you can see, the error is persistent. There's also a performance regression for the iGPU of roughly 8% that also occurs when powered by the barrel jack (4820 -> 4423, that's -8.24%). Don't know the cause of that yet.
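For anyone comparing their own Superposition scores, the relative change can be computed as below. This is only a small helper sketch; the function name is made up for illustration, and the two scores are the barrel-jack results quoted in this comment.

```python
def relative_change_percent(before: float, after: float) -> float:
    """Return the change from `before` to `after` as a percentage of `before`."""
    return (after - before) / before * 100.0


# Barrel-jack iGPU scores quoted in this comment (older run, newer run).
print(f"{relative_change_percent(4820, 4423):.2f}%")  # prints -8.24%
```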

On the bright side, it seems like only the iGPU is affected. Benchmarks with my eGPU are consistent between barrel-jack-powered and TB4-powered.

PerAstraAdDeum avatar Nov 09 '24 12:11 PerAstraAdDeum

Screenshot_20241125_132955

Today I tried the xe driver once more; with the release of kernel 6.12 it is rumored to be a lot more stable and reliable. So I switched drivers and ran some benchmarks, and, to my utter bewilderment, the performance gap between TB4 charging and barrel charging is gone!

I have absolutely no idea how this has happened, given the fact that this issue has been confirmed to be firmware-related in the Matrix chat room and there hasn't been any new firmware release since then. Anyway, everyone affected by this issue should give the xe driver a try and see if this solves the issue!

PerAstraAdDeum avatar Nov 25 '24 12:11 PerAstraAdDeum

Well, seems like we're back to where we started:

image

Now this is a benchmark done just a few minutes ago.

Coincidentally, I've been helping a dev test some new patches for MangoHud, and I marveled at the performance of the Intel Arc iGPU:

image

When I start the game now, my framerate is considerably lower:

image

As you can see from the time shown in the HUD, there are only about 90 minutes between these runs. So whatever impacted the performance must have happened in those 90 minutes. And indeed, something happened: the TB4 connection was interrupted, and I had to re-plug the cable. I'm pretty sure now that this is what halved the performance. Sadly, it persisted across a reboot. I'll try a cold boot next and see if the issue persists.

PerAstraAdDeum avatar Dec 03 '24 15:12 PerAstraAdDeum

A cold boot didn't solve the issue.

I was however able to trick the notebook like this:

  1. Plug both TB4 as well as barrel-jack charger into the notebook.
  2. Run benchmark/graphics-demanding application.
  3. Pull barrel-jack charger.

Surprisingly I'm now back to "full-power-mode", if you will:

image

This is obviously just a hack and no viable solution.

PerAstraAdDeum avatar Dec 03 '24 16:12 PerAstraAdDeum

@mkopec isn't that because EC just cuts the power by cropping PL4 to the max USB-PD power? When barrel jack is plugged, then PL4 is higher and allows to reach >4000 score. Just a hunch

miczyg1 avatar Dec 03 '24 17:12 miczyg1

> @mkopec isn't that because EC just cuts the power by cropping PL4 to the max USB-PD power? When barrel jack is plugged, then PL4 is higher and allows to reach >4000 score. Just a hunch

Is there any way of testing this?

PerAstraAdDeum avatar Jan 14 '25 12:01 PerAstraAdDeum

> @mkopec isn't that because EC just cuts the power by cropping PL4 to the max USB-PD power? When barrel jack is plugged, then PL4 is higher and allows to reach >4000 score. Just a hunch

That's what I thought too.

> Is there any way of testing this?

Check Limit reasons in ThrottleStop (on Windows).

The dock itself can deliver 90W, which is as good as the barrel jack charger. The question is, does it actually deliver that much? That can depend on the USB-C cable. A cable may be rated for 60W, 100W or 240W, so if the cable was changed at some point, that could be the culprit. @PerAstraAdDeum Can you check the maximum power draw in ThrottleStop while on the dock alone and while on the barrel jack charger?

If it's not the cable, then I suspect the EC is having trouble determining the USB-PD power limit.
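On Linux, one possible way to see what was negotiated over USB-PD is the kernel's power_supply class. The sketch below assumes the platform exposes a USB-C supply under /sys/class/power_supply with `online`, `voltage_max` and `current_max` attributes (as UCSI-backed systems typically do). Whether the V54 with Dasharo exposes such a node is not confirmed here, and the function names are illustrative only.

```python
from pathlib import Path


def read_int(path: Path):
    """Return the integer stored in a sysfs attribute, or None if unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def negotiated_watts(supply_dir: Path):
    """Upper bound of the negotiated contract in watts, or None if unknown.

    The power_supply class reports voltage in microvolts and current in microamps.
    """
    microvolts = read_int(supply_dir / "voltage_max")
    microamps = read_int(supply_dir / "current_max")
    if microvolts is None or microamps is None:
        return None
    return microvolts * microamps / 1e12


def report(root: Path = Path("/sys/class/power_supply")) -> dict:
    """Map each online supply name to its negotiated power in watts."""
    result = {}
    if not root.is_dir():
        return result
    for supply in sorted(root.iterdir()):
        if read_int(supply / "online") != 1:
            continue  # skip supplies that are not currently providing power
        watts = negotiated_watts(supply)
        if watts is not None:
            result[supply.name] = watts
    return result


if __name__ == "__main__":
    for name, watts in report().items():
        print(f"{name}: up to {watts:.1f} W")
```

Running it once on the dock alone and once with the barrel jack attached would show whether the reported contract differs between the two cases, assuming the attributes exist at all.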

mkopec avatar Jan 14 '25 13:01 mkopec

Hey, thanks for the swift reply!

Sadly I'm not running Windows. Is there a way to read those values in Linux? (Arch to be precise)

Also, about the cable: a few months ago I swapped the dock's TB4 cable for this one: Cable Matters [Intel Certified] 40Gbps Braided Active Thunderbolt 4 Cable 2 m with 100W Charging Power.

The issue is the same with both cables though.
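As a rough Linux counterpart to the ThrottleStop limits, the intel-rapl powercap interface lists the package power limits the kernel currently sees. This is a sketch under assumptions: it expects the CPU package zone at /sys/class/powercap/intel-rapl:0, and the `peak_power` constraint (which corresponds to PL4) only appears on platforms where the kernel exposes it. The function name is illustrative.

```python
from pathlib import Path


def read_rapl_limits(zone: Path) -> dict:
    """Return {constraint name: limit in watts} for one powercap zone."""
    limits = {}
    for name_file in sorted(zone.glob("constraint_*_name")):
        # constraint_N_name pairs with constraint_N_power_limit_uw (microwatts).
        limit_file = zone / name_file.name.replace("_name", "_power_limit_uw")
        try:
            name = name_file.read_text().strip()
            limits[name] = int(limit_file.read_text().strip()) / 1e6
        except (OSError, ValueError):
            continue  # attribute missing or not readable
    return limits


if __name__ == "__main__":
    zone = Path("/sys/class/powercap/intel-rapl:0")  # assumed CPU package zone
    for name, watts in read_rapl_limits(zone).items():
        print(f"{name}: {watts:.1f} W")
```

Comparing the output on TB4 power alone with the output on barrel-jack power would show whether any of the limits (long_term, short_term, peak_power) are lowered when only the dock is connected.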

PerAstraAdDeum avatar Jan 14 '25 13:01 PerAstraAdDeum

@mkopec , I gave this some more thought and I'm sure it's neither the cable nor the docking station that is at fault here. As previously stated both are very well able to deliver 90 Watts (see the benchmark results). This is only possible if the notebook has been "tricked"; e.g. by plugging in the barrel jack charger, running a benchmark and pulling the barrel jack charger again.

I believe the EC having trouble determining the USB-PD power limit is the most likely culprit.

PerAstraAdDeum avatar Jan 14 '25 14:01 PerAstraAdDeum

Might have been fixed by https://github.com/Dasharo/coreboot/pull/612 , need to retest on the latest code

mkopec avatar Mar 13 '25 11:03 mkopec

> Might have been fixed by Dasharo/coreboot#612 , need to retest on the latest code

That would indeed be great! I'd be happy to test a beta release if there's one.

PerAstraAdDeum avatar Mar 13 '25 19:03 PerAstraAdDeum

Also, while talking about voltage:

Image

The V54's iGPU is constantly throttling, apparently because of reaching the voltage limit. Sadly I don't know what voltage the iGPU is currently drawing, even though that should be implemented in MangoHud already:

Image

I assume this is because of missing https://github.com/Dasharo/dasharo-issues/issues/820 ?

PerAstraAdDeum avatar Mar 13 '25 19:03 PerAstraAdDeum

When gathering reference performance values for this issue, I had trouble benchmarking properly with the Echo 11 dock. This is somewhat related to https://github.com/Dasharo/dasharo-issues/issues/1081, but manifests across OSes in exactly opposite conditions.

Initial benchmarks of the v54tu with and without the dock were similar but inconclusive. However, connecting the USB-C to HDMI display makes the benchmark barely respond and often crashes the whole system (Ubuntu 24.10 with the 6.11.0-19 generic kernel). This seems suspicious in the context of what happens on Windows, where the dock's USB ports and Ethernet connection randomly drop unless the USB-C to HDMI display is connected.

I will recreate the setup and try to gather dmesg logs, if the machine is responsive at all.

SebastianCzapla avatar Mar 19 '25 11:03 SebastianCzapla

Voltage / current throttling should be fixed as of https://github.com/Dasharo/coreboot/pull/612

> Sadly I don't know what voltage the iGPU is currently drawing

You can check detailed voltages and throttle reasons in HWInfo64

mkopec avatar Mar 19 '25 15:03 mkopec

> Voltage / current throttling should be fixed as of Dasharo/coreboot#612
>
> > Sadly I don't know what voltage the iGPU is currently drawing
>
> You can check detailed voltages and throttle reasons in HWInfo64

Happy to hear! I don't know exactly what the linked issue fixes though? Also, I'm not on Windows, so HWInfo64 isn't available. Is there any way of checking the voltage in Linux?

PerAstraAdDeum avatar Mar 19 '25 15:03 PerAstraAdDeum

> I don't know exactly what the linked issue fixes though?

That PR was initially meant to fix some other throttling issues on -TU models; in that case I narrowed it down to current / voltage throttling. If this is the same throttling, then the issue might be fixed in the next release for this platform.

> Is there any way of checking the voltage in Linux?

I'm not aware of one, sadly.
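This does not give the voltage, but on Linux the i915 driver publishes per-GT throttle reason flags in sysfs, which may at least show which limit is being hit. The sketch below assumes the flags live in a directory such as /sys/class/drm/card0/gt/gt0 as `throttle_reason_*` files containing 0 or 1. The card index can differ, the xe driver uses a different layout, and the function name is illustrative.

```python
from pathlib import Path


def active_throttle_reasons(gt_dir: Path) -> list:
    """Return the names of throttle_reason_* attributes that currently read 1."""
    active = []
    for attr in sorted(gt_dir.glob("throttle_reason_*")):
        try:
            if attr.read_text().strip() == "1":
                active.append(attr.name.removeprefix("throttle_reason_"))
        except OSError:
            continue  # attribute not readable on this kernel or driver
    return active


if __name__ == "__main__":
    gt = Path("/sys/class/drm/card0/gt/gt0")  # assumed path, adjust the card index
    print(active_throttle_reasons(gt) or "no throttle reasons reported")
```

Polling this while a benchmark runs, once on the dock and once on the barrel jack, could help tell a power-limit throttle apart from a voltage or current one.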

mkopec avatar Mar 19 '25 16:03 mkopec

dmesg-unigine-superposition-at-817.log

Here is a dmesg log captured while testing this issue with an Echo 11 dock. The issue, starting at ~817 s, occurs whenever the benchmark window is moved even slightly. This does not happen at all without the dock attached.

[  817.723135] workqueue: delayed_fput hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND
[  831.332198] workqueue: delayed_fput hogged CPU for >10000us 5 times, consider switching to WQ_UNBOUND
[  844.034255] workqueue: delayed_fput hogged CPU for >10000us 7 times, consider switching to WQ_UNBOUND

Disconnecting dock makes the issue go away immediately. Could this be related to lost performance? It definitely is a weird issue only specific to this dock.
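To compare logs from different machines, the hog warnings can be pulled out of a saved dmesg file with a short script. This is a sketch that assumes the message format shown in the lines above; the function name is illustrative.

```python
import re

# Matches lines such as:
# [  817.723135] workqueue: delayed_fput hogged CPU for >10000us 4 times, ...
HOG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+workqueue: (?P<wq>\S+) hogged CPU "
    r"for >\d+us (?P<count>\d+) times"
)


def summarize_hogs(dmesg_text: str) -> dict:
    """Return {workqueue: (first timestamp in seconds, highest reported count)}."""
    summary = {}
    for m in HOG_RE.finditer(dmesg_text):
        ts, count = float(m["ts"]), int(m["count"])
        first, highest = summary.get(m["wq"], (ts, 0))
        summary[m["wq"]] = (min(first, ts), max(highest, count))
    return summary


if __name__ == "__main__":
    import sys

    # Usage: python hogs.py saved-dmesg.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for wq, (first, highest) in summarize_hogs(log.read()).items():
            print(f"{wq}: first seen at {first:.1f}s, count reached {highest}")
```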

SebastianCzapla avatar Mar 25 '25 13:03 SebastianCzapla

Tested with Insyde firmware on V540TU & V560TNE:

Windows 11: no reproduction

Ubuntu 24:

  • V540TU: issue exists, but no specific traces in dmesg
  • V560TNE: issue exists:
[ 1338.764924] workqueue: delayed_fput hogged CPU for >10000us 7 times, consider switching to WQ_UNBOUND
[ 1340.320926] workqueue: delayed_fput hogged CPU for >10000us 11 times, consider switching to WQ_UNBOUND

v540tu _dmesg.txt

v560tne_dmesg_hog.txt

matmacieje avatar Apr 07 '25 10:04 matmacieje

Tested with Dasharo 1.0.0-rc1 on V540TU & V560TNE:

Windows 11: no reproduction

Ubuntu 24:

  • V540TU: issue exists, but no hog traces in dmesg
  • V560TNE: issue exists:
[  101.329433] pcieport 0000:00:06.0: AER: Correctable error message received from 0000:02:00.0
[  101.329450] nvme 0000:02:00.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
[  101.329453] nvme 0000:02:00.0:   device [144d:a80a] error status/mask=00000001/0000e000
[  101.329457] nvme 0000:02:00.0:    [ 0] RxErr                  (First)
[  122.966192] workqueue: pm_runtime_work hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND
[  138.182996] pcieport 0000:00:06.0: AER: Correctable error message received from 0000:02:00.0
[  138.183011] nvme 0000:02:00.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
[  138.183014] nvme 0000:02:00.0:   device [144d:a80a] error status/mask=00000001/0000e000
[  138.183017] nvme 0000:02:00.0:    [ 0] RxErr                  (First)
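The AER messages can be tallied per device to see how often they occur with and without the dock. The sketch below assumes the "PCIe Bus Error" line format shown above; the function name is illustrative. Note that in this log the reporting device is the NVMe drive at 0000:02:00.0, so whether these errors are connected to the dock at all is not established.

```python
import re
from collections import Counter

# Matches lines such as:
# nvme 0000:02:00.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, ...
AER_RE = re.compile(
    r"\S+ (?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"PCIe Bus Error: severity=(?P<severity>\w+), type=(?P<type>[^,]+),"
)


def count_aer_errors(dmesg_text: str) -> Counter:
    """Count PCIe Bus Error lines per (device address, severity, error type)."""
    return Counter(
        (m["bdf"], m["severity"], m["type"]) for m in AER_RE.finditer(dmesg_text)
    )
```

Calling `count_aer_errors()` on the text of two saved logs, one with and one without the dock, gives counters that can be compared directly.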

vt560tne_dmesg_dasharo_issue.txt

v540tu_dmesg_dasharo.txt

matmacieje avatar Apr 07 '25 16:04 matmacieje

hogged CPU reproduction on V560TNE:

Image

matmacieje avatar Apr 08 '25 10:04 matmacieje

Execution results on V560TNE, Dasharo 1.0.0-rc1:

Image

NOTE: incorrect GPU name

matmacieje avatar Apr 08 '25 10:04 matmacieje

hogged CPU reproduction on V540TU:

Image

NOTE: Unigine produces black screenshots on V540TU, on both FWs and with no Sonnet dock connected.

matmacieje avatar Apr 08 '25 11:04 matmacieje