dxvk
[Feature request] Better performance on eGPUs
System information
Asus TUF F15 FX516PM
- GPU: ASRock AMD Radeon RX 7800 XT 16GB
- Driver: Mesa RADV 23.3-oibaf PPA
- Wine version: 8.16
- DXVK version: 2.3
- eGPU: Razer Core X
- Distributions and kernels: Ubuntu 23.04 and 23.10, Arch Linux; linux-next kernel
Vulkan Drivers tested
- AMDVLK
- Mesa RADV
- AMDGPU PRO
Workarounds tested
RADV_PERFTEST=dmashaders, RADV_PERFTEST=nosam, DXVK_ASYNC=1 (no changes)
Summary
It's pretty much a known fact that eGPUs on Linux are a total nightmare, especially with newer cards. It looks like it was not noticeable enough on older ones like the RX 550. I also tested with an NVIDIA card (RTX 3060) and the issues are pretty much the same.
The most noticeable thing about this, other than the really bad fps and low GPU usage, is single-core CPU usage spiking to 100% (not always the same core) when DXVK is used.
However, DXVK on Windows with the same setup doesn't seem to affect performance that much in such games.
Slightly OT: there is also something fishy with Vulkan-only games, as they seem to have performance issues even when DXVK is not used (e.g. Baldur's Gate 3 on Vulkan, which performs decently on Windows even with the eGPU, and on the laptop's built-in discrete NVIDIA GPU even on Linux). So I am also tracking this issue on the Mesa and kernel driver side, where it still has to be confirmed whether there are problems with the actual PCIe speed detection, at least for AMD cards. However, that's not always the case, since some games perform as well as on Windows; DOOM Eternal, for example, works very nicely.
I must also say that, to make the eGPU run at apparently full PCIe 3.0 x4 speed, a kernel module option must be added: options amdgpu pcie_gen_cap=0x40000
Forcing the limit to PCIe 1.0 x4 speeds doesn't seem to change much in my case.
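For anyone wanting to reproduce this, the option above would typically go into a modprobe configuration file. This is only a sketch: the file name below is an assumption (any `.conf` file under `/etc/modprobe.d/` should be read), and depending on the distribution the initramfs may need to be regenerated before the option takes effect.

```
# /etc/modprobe.d/amdgpu.conf   (file name is an assumption, not from this thread)
# Value taken verbatim from the report above.
options amdgpu pcie_gen_cap=0x40000
```

The same module parameter can usually also be passed on the kernel command line as `amdgpu.pcie_gen_cap=0x40000`.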
The following is a comparison of Cyberpunk 2077 between forcing PCIe 1.0 x4 and 3.0 x4 speeds:
https://gitlab.freedesktop.org/mesa/mesa/uploads/a431d19be845fd639f7a5989578e2aa0/immagine.png
Using an OpenCL benchmarking tool, I have apparently confirmed that, at least with compute workloads, the PCIe bandwidth is correct and changes according to that kernel parameter.
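As a reference point for such bandwidth measurements, here is a rough sketch of the theoretical one-direction link bandwidth for the configurations mentioned in this thread. It uses only the standard PCIe signaling rates and line encodings and ignores packet/protocol overhead and any Thunderbolt tunneling limits, so real measurements should come out lower.

```python
# Per-lane raw signaling rate (GT/s) and line-encoding efficiency per PCIe generation.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),      # 8b/10b encoding
    2: (5.0, 8 / 10),      # 8b/10b encoding
    3: (8.0, 128 / 130),   # 128b/130b encoding
    4: (16.0, 128 / 130),  # 128b/130b encoding
}


def pcie_bandwidth_gb_per_s(gen: int, lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s, before protocol overhead."""
    rate_gt, efficiency = PCIE_GENERATIONS[gen]
    return rate_gt * efficiency * lanes / 8  # bits -> bytes


if __name__ == "__main__":
    for gen, lanes in [(1, 4), (3, 4), (3, 16)]:
        print(f"PCIe {gen}.0 x{lanes}: {pcie_bandwidth_gb_per_s(gen, lanes):.2f} GB/s")
```

By this estimate PCIe 3.0 x4 has roughly four times the raw bandwidth of PCIe 1.0 x4, which is the gap a bandwidth benchmark would be expected to show if the forced link speed is really applied.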
Coming back to Linux, the issue is very much present even in older games like Half-Life 2, Far Cry 5 and Killing Floor 2, no matter the distribution, kernel, Mesa drivers, DXVK version, etc.
The only thing that helped was using WineD3D instead of DXVK, and only with its OpenGL backend. The WineD3D Vulkan backend shows the same performance impact.
Software information
- Half-Life 2: no more than ~130-140 fps, even on low settings and low resolution, without a frame limiter
- Killing Floor 2: 35 fps when things are busy on screen, even on low resolution and low settings
- Far Cry 5: 40 fps, even on low resolution and low settings
- Cyberpunk 2077: the built-in benchmark doesn't show more than 39-40 fps, no matter the video settings
Ramping up the settings in these games does not change the results at all. CPU usage was also higher overall, even in Vulkan-only games.
Apitrace file(s)
Just in case, I made one for HL2, but it doesn't make sense to post it.
Why open this issue on the dxvk tracker? You wrote a whole lengthy post yourself to point out that this issue isn't exclusive to dxvk and doesn't happen with dxvk on windows drivers.
Because:
- this was already discussed on Discord 🐸
- all DXVK games are affected; I pointed out just one exception
If windows dxvk isn't affected then this seems something about linux in general
The only thing that helped was using WINED3D instead of DXVK when OpenGL was used. WineD3D Vulkan backend has the same impact.
Likely something with the vulkan drivers/presentation
It was discussed a bit on Discord* and there are some things DXVK could probably do better with regard to eGPUs (as far as I understood the devs; correct me if I'm wrong). So it should be fine to track the issue here.
Edit: meant Discord and not GitHub
I spent an entire month testing and it's still strange, because it happens 99% of the time when DXVK is involved. The performance drop on Windows was just not as severe as on Linux, considering the NVIDIA card benefited less overall from Vulkan. On Discord we were talking about how much impact DXVK has on PCIe bandwidth.
Also, I may have been CPU-bottlenecked when the eGPU was used, since some games (BG3 on Vulkan, for example, being one exception, still only on Linux) were on the edge. I don't play new games in general, so I was not able to put together a decent series of tests. VKD3D was affected as well, anyway.
Sorry, I misclicked "Close with comment" again. I'm typing and reading from my phone in the most uncomfortable way ever 🤦‍♂️
I had a bit of free time and decided to really get into figuring out the issue. There seem to be a lot of different opinions on it, especially on Discord, so I'll just show the tests I made.
As I said, I managed to get an RTX 3060 and it shows exactly the same behavior: the same games show the same performance issues, with the same exceptions, DOOM Eternal working surprisingly well. Overall, DXVK/VKD3D games are much more affected than native Vulkan ones, even though BG3 shows much worse performance with both DXVK and Vulkan, still only on Linux.
Out of curiosity, I also tried a desktop with the AMD card and even forced PCIe 1.0 x4 speeds (also re-forced through the kernel module option), and the performance is far better than anything tried so far. Games are playable there, compared to Linux with the TB3 eGPU.
For comparison, Cyberpunk 2077 with the RX 7800 XT on the desktop:
- ~103 fps, Ultra settings, PCIe 3.0 x16
- ~80 fps, Ultra settings, PCIe 1.0 x4
- ~110 fps, Low settings, PCIe 1.0 x4
What's also really strange is that, compared to the eGPU, on the desktop with limited PCIe bandwidth changing the video settings actually affected performance a lot.
I also tried disabling/enabling Resizable BAR and Above 4G Decoding in the BIOS, and it didn't change much.
Cyberpunk 2077 on the laptop (results already posted above):
- ~24 fps, Ultra settings, PCIe 3.0 x4
- ~24 fps, Ultra settings, PCIe 1.0 x4
- ~30 fps, Low settings, PCIe 1.0 x4
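To make the comparison easier to read, here is a small sketch that turns the Cyberpunk 2077 figures quoted above into relative changes. The numbers are the approximate fps values from this comment. The dictionary keys and the `pct_change` helper are names made up for illustration. Note that the "fast link" is PCIe 3.0 x16 on the desktop but 3.0 x4 on the laptop.

```python
# Approximate fps figures quoted in this comment (Cyberpunk 2077, RX 7800 XT).
desktop = {"ultra_fast_link": 103, "ultra_pcie1_x4": 80, "low_pcie1_x4": 110}
egpu = {"ultra_fast_link": 24, "ultra_pcie1_x4": 24, "low_pcie1_x4": 30}


def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    for name, fps in (("desktop", desktop), ("laptop eGPU", egpu)):
        link = pct_change(fps["ultra_fast_link"], fps["ultra_pcie1_x4"])
        settings = pct_change(fps["ultra_pcie1_x4"], fps["low_pcie1_x4"])
        print(f"{name}: link downgrade {link:+.1f}%, Ultra -> Low {settings:+.1f}%")
```

On the desktop the link downgrade costs about 22% at Ultra, while the eGPU figure does not move at all, which suggests the eGPU setup is limited by something other than the negotiated link speed.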
ReBAR is not available in the laptop's BIOS options, and neither is enabling or disabling Above 4G Decoding.
The only noticeable difference was higher CPU usage on both machines when lower PCIe speeds were set.
My laptop has an i7-11375H while the desktop has a much better CPU (Ryzen 5 5600), but still, even Windows with DXVK runs much better.
Maybe the BIOS doesn't really downgrade the PCIe speed, even though I confirmed the downgrade by looking at lspci.
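For reference, a check like this can be scripted by pulling the negotiated link speed and width out of the `LnkSta` line that `lspci -vv` prints for a device. The sketch below only parses text. The sample line is illustrative and not taken from this thread, and the function name is made up for this example.

```python
import re

# Matches e.g. "LnkSta: Speed 8GT/s (ok), Width x4 (ok)" from `lspci -vv` output.
LNKSTA_RE = re.compile(r"LnkSta:\s*Speed\s+([\d.]+)GT/s[^,]*,\s*Width\s+x(\d+)")


def parse_link_status(lspci_output: str):
    """Return (speed in GT/s, lane count) from the first LnkSta line, or None."""
    match = LNKSTA_RE.search(lspci_output)
    if match is None:
        return None
    return float(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    # Illustrative sample only; real input would come from `sudo lspci -vv -s <device>`.
    sample = "LnkSta:\tSpeed 2.5GT/s (downgraded), Width x4 (ok)"
    print(parse_link_status(sample))
```

A speed of 2.5 GT/s corresponds to PCIe 1.0 and 8 GT/s to PCIe 3.0, so the parsed value can be compared directly against the forced setting.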
How could it be? I'm speechless at this point. I don't even know how to verify the bottleneck here; maybe there really is something wrong with the Thunderbolt driver itself rather than with everything else considered so far.
Maybe you can try downgrading the PCIe speeds and share the results on a weaker CPU?
Just to confirm, when using the eGPU, is the monitor plugged directly into the eGPU? Also, when on Linux, which DE are you using, and is it on Xorg or Wayland?
I'm not sure if that's specific to SnowRunner, but enabling/disabling d3d11.dcSingleUseMode has a huge effect on my eGPU setup (TB3 + Sonnet Breakaway Box 650 + RX 6700 XT on a Framework 13 with 13th Gen Intel, Ubuntu 23.10, GNOME Wayland running on the iGPU, game running on an external monitor plugged into the eGPU):
- d3d11.dcSingleUseMode=False (it was set to that in dxvk.conf by itself for some reason) on Low settings: around 20 fps with 40% GPU load (screenshot)
- d3d11.dcSingleUseMode=False on Ultra settings: around 15 fps with 50% GPU load (screenshot)
- d3d11.dcSingleUseMode=True on Low settings: around 45 fps with 70% GPU load (screenshot)
- d3d11.dcSingleUseMode=True on Ultra settings: around 38 fps with 70% GPU load (screenshot)
I'm not sure what this setting changes exactly, but it has an observable effect on both GPU load and game FPS.
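For anyone wanting to repeat this comparison, the override would look roughly like the following in a `dxvk.conf` file. This is a sketch that assumes the usual DXVK configuration mechanism, where the file sits next to the game executable or is pointed to by the `DXVK_CONFIG_FILE` environment variable.

```
# dxvk.conf
# Override the per-game default discussed above; set to False to compare.
d3d11.dcSingleUseMode = True
```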
That dcSingleUseMode=False is slow is expected, but it has nothing to do with this issue. It is set to false by default only for 5 games in total, so those probably have been found to rely on it at some point and will break without it.
I have noticed that the problem occurs mostly when the game uses an excessive number of draw calls (more than 4000-5000 draw calls per second).
The worst case I have seen is on Civilization VI. During the graphic benchmark, peaks of 14000 draw calls per second (!) are reached. With an Nvidia RTX 2060 Super eGPU I barely reach 15 fps using DXVK (even on Windows) compared to 70 fps with native DX11.
Can't DXVK be optimized to cope with these situations of large numbers of draw calls saturating the limited eGPU bandwidth?
Thank you.
Something tells me that on Windows, the GPU driver knows that an eGPU is being used, and optimizes certain things for this case.
I don't have time (and don't want to install Windows on my hardware) to test this, but I think what can be done to verify this is:
1. Run the game on Windows running on bare metal, with the eGPU connected via Thunderbolt.
2. Run the game on Windows running in a KVM virtual machine with the eGPU passed through. This will still be the same driver, but now it will not know whether it's an eGPU or not.
3. Run the same game with DXVK on Linux with the eGPU.
In my tests, 2 and 3 are pretty much the same, and I did not measure 1. Something tells me that's the case though.
If that's the case, I don't think much can be done on DXVK's side, it should be Mesa's problem.
I want to chime in as this has been bothering me for a few years now.
I use a Razer Core X Chroma eGPU enclosure. I've gone through multiple GPUs - GTX 1080Ti, RX 580, RTX 3090, and currently - RTX 4090.
I've always had issues under Linux. The kernel version didn't matter, and the GPU driver version didn't matter. The performance is just worse, except for a few games and some lightweight ones that don't demand much performance.
The problem might be coming from the Thunderbolt stack in the kernel and not from DXVK, as discussed here: https://gitlab.freedesktop.org/drm/amd/-/issues/2885#note_2295154
Relevant Bugzilla thread: https://bugzilla.kernel.org/show_bug.cgi?id=218525
Even if it isn't related, here's my data in case it's of any use. I currently have around 700GB dedicated to a Windows partition, so here are my tests:
- Baldur's Gate 3
  - Windows: 120+ fps on max settings, GPU power draw is close to 200W, sometimes more; DXVK/Vulkan
  - Linux: 15-20 fps on low/mid/high/max settings, GPU power draw is around 80-90W at most; DXVK/Vulkan
- Last Epoch
  - Windows: 100+ fps on max settings
  - Linux: 20-ish fps on medium/high settings; Native OpenGL/DXVK
- Guild Wars 2
  - Windows: 30-100 fps varying by zone on high/ultra settings; GPU power draw around 100-150W
  - Linux: 12-15 fps on medium/high/ultra settings; GPU power draw around 80W; DXVK
- World of Warcraft: Dragonflight
  - Windows: 30-150 fps varying by zone on max settings
  - Linux: 30-40 fps on max settings; however, after toggling the in-game setting for Triple Buffering, even if vsync is off, the game window refreshes and starts running at the same-ish fps as on Windows; DXVK
- Final Fantasy XIV
  - Windows: 80-120 fps varying by zone on max settings; GPU power draw at 120+ W
  - Linux: 40-80 fps with a lot of fps drops and fps caps at 40 in some zones, max settings; GPU power draw unknown; DXVK
- Phasmophobia
  - Windows: 100+ fps
  - Linux: 10-15 fps; DXVK
One thing that ran close to Windows' performance while running on OpenGL natively on Linux was Unigine Heaven benchmark. The results are posted in the Bugzilla thread linked above.
Tested with Windows 10, Windows 11, Gentoo, Debian 12, Fedora 39, OpenSUSE Tumbleweed, Nobara 39, ArchLinux. The same results were observed on all Linux distributions. Kernel versions varied between 5.15 and 6.7.9. Nvidia driver versions varied between 515 and 550.
I use Lutris and Steam to play games, with the exception of FFXIV which now has a native Linux launcher obtained either via Flatpak or native compilation.