vkd3d-proton
CPU optimization exploration tracker
The idea of this issue is to explore CPU optimizations in vkd3d-proton. For a game to be considered here it should be CPU bound with significant API overhead, i.e., we can meaningfully improve game performance through perf tuning on our end.
Information needed:
- Game name (AppID)
- How to reproduce CPU heavy scene -> minimum effort required to get to it from a fresh game. Screenshots are helpful.
Monster Hunter: Rise (1446780)
Details TBD
Monster Hunter: Rise saw large improvements recently with the descriptor copy optimizations (90 -> 100 fps), and there are further gains with the descriptor punchthrough path (100 -> 110 fps). However, it is still CPU-limited. The most CPU-limited area is right after you start the first hunt, when looking into the distance:
Screenshot
To get to it, start a new game and mash through the tutorial in the village until the quest-giver allows you to take the first hunt.
DEATH STRANDING is also a good candidate to look into. It is usually CPU-bound with descriptor copies high in perf top, especially in later areas, but also right after starting the game, particularly at lower resolutions:
Screenshot
(Savegames for this game are not shareable, and it takes forever to reach the later, more CPU-bound areas.)
Control (870780)
Almost everywhere. I first see it in the first scene with the janitor, but it almost never goes below 50 fps.
Screenshots
What exactly is the issue with Control? CPU limit doesn't necessarily mean that our code runs into an obvious bottleneck and performance should generally be good in that game, assuming reasonable hardware and a non-borked wine configuration.
I did some tests on Windows and there is a huge difference compared to vkd3d-proton in some scenes. On Windows, even with a low configuration and a render resolution of 960x540, performance was always limited by the GPU. On Linux with the same config, fps is 2 times lower, with 45% GPU load.
Windows dx12
Proton dx12
Proton dx11
Try VKD3D_CONFIG=no_upload_hvv maybe. Differences that huge are normally not caused by optimization issues.
Also, please mention your hardware when complaining about performance...
VKD3D_CONFIG=no_upload_hvv has no visible effect on performance. My GPU (RX 590) has 8 GB of VRAM.
I see a direct correlation between fps and CPU frequency in this scene.
With maximum frequency (3.3 GHz) - 79 fps (143 fps on Windows with the same maximum frequency but the default governor)
With frequency fixed at 3.0 GHz - 71 fps
With frequency fixed at 2.5 GHz - ~50 fps
And just for comparison, the DX11 version with frequency fixed at 1.2 GHz - 120 fps with 100% GPU load (DX12 - 25 fps)
Tests were conducted with the performance governor:
cpupower frequency-set --governor performance
cpupower frequency-set -f <freq>
Setting the governor to the default (schedutil) leads to low, unstable fps between 40 and 53.
It's definitely CPU-bound, and the problem doesn't appear on Windows or with the DX11 version under Proton. I can check performance on Windows with limited CPU frequency if it helps.
UPDATE: On Windows, the minimum render resolution available is 720p.
With balanced power settings (max freq 3.3 GHz) - 143 fps
With CPU frequency fixed at 1.2 GHz - 65 fps
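To make the correlation between clock and frame rate easier to judge, here is a minimal sketch that divides each vkd3d-proton DX12 measurement quoted in this comment by its CPU frequency. It assumes the frequencies are in GHz and takes the approximate "~50 fps" sample as exactly 50; the names `VKD3D_DX12`, `fps_per_ghz`, and `spread` are made up for illustration. If the scene is CPU-bound, fps per GHz should stay roughly constant across the sweep.

```python
# Measurements quoted in this comment for vkd3d-proton (DX12):
# (CPU frequency in GHz, fps). The 2.5 GHz value was reported as "~50".
VKD3D_DX12 = [(3.3, 79), (3.0, 71), (2.5, 50), (1.2, 25)]


def fps_per_ghz(samples):
    """Return fps divided by clock for each sample, rounded to one decimal."""
    return [round(fps / ghz, 1) for ghz, fps in samples]


def spread(values):
    """Relative spread (max - min) / max; small means near-proportional scaling."""
    return (max(values) - min(values)) / max(values)


if __name__ == "__main__":
    ratios = fps_per_ghz(VKD3D_DX12)
    print("fps per GHz:", ratios)
    print(f"relative spread: {spread(ratios):.0%}")
```

With these numbers the ratio stays between roughly 20 and 24 fps per GHz over nearly a 3x range of clock speeds, which is what one would expect if CPU time dominates the frame.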
I have similar issues to @kermeat with Control; DXVK appears to give much better performance than VKD3D. Certain areas of the map seem to be CPU bound with VKD3D, dropping GPU usage down to 40 - 50% (FPS drops to 40-50 accordingly). I don't get this with DXVK.
Both screenshots below were captured using GE-Proton7-37, with exactly the same graphical options set, at native 1440p.
Launch options for VKD3D: PROTON_ENABLE_NVAPI=1 VKD3D_CONFIG=dxr11 mangohud %command% -skipStartScreen -dx12
Launch options for DXVK: PROTON_ENABLE_NVAPI=1 mangohud %command% -skipStartScreen -dx11
Spider-Man: Remastered (1817070)
Spider-Man: Miles Morales (1817190)
Those two seem to be by far the most CPU heavy games with vkd3d-proton, at least on Nvidia GPUs.
Test case
Sitting on a lantern in the middle of Times Square in Miles Morales. Settings: maxed out, including ray tracing, except that the RT distance is kept at the middle setting, which is the default. Resolution doesn't matter, as it's CPU-limited in all cases.
Results
- 37 fps with VK_EXT_mutable_descriptor_type
- 37 fps with VK_EXT_descriptor_buffer, with vkGetDescriptorEXT going through the Wine syscall path
- 47 fps with VK_EXT_descriptor_buffer, with vkGetDescriptorEXT using a direct call
- 42 fps with vkd3d-proton master on Windows (with VK_EXT_mutable_descriptor_type)
- 61 fps on Windows
Unfortunately Windows is 1.3x as fast as the fastest result I got on Linux.
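For CPU-bound cases it can be easier to compare frame times than fps. The sketch below converts the figures from the results list into milliseconds per frame. It assumes the reported fps values are steady averages for the same scene, and that the plain "61 fps on Windows" entry refers to the native D3D12 path; the dictionary keys and function names are labels invented here, not vkd3d-proton identifiers.

```python
# fps figures from the results list above (Miles Morales, Times Square test case).
# Keys are illustrative labels only.
RESULTS = {
    "mutable_descriptor_type": 37,
    "descriptor_buffer_syscall": 37,
    "descriptor_buffer_direct": 47,
    "vkd3d_proton_on_windows": 42,
    "windows_native": 61,  # assumption: the plain "61 fps on Windows" entry
}


def frame_ms(fps):
    """Average frame time in milliseconds for a given frame rate."""
    return 1000.0 / fps


def saved_ms(slow_fps, fast_fps):
    """Frame time saved per frame when going from slow_fps to fast_fps."""
    return frame_ms(slow_fps) - frame_ms(fast_fps)


if __name__ == "__main__":
    for name, fps in RESULTS.items():
        print(f"{name:28s} {fps:3d} fps  {frame_ms(fps):5.2f} ms/frame")
    direct = RESULTS["descriptor_buffer_direct"]
    syscall = RESULTS["descriptor_buffer_syscall"]
    native = RESULTS["windows_native"]
    print(f"syscall path -> direct call: {saved_ms(syscall, direct):.2f} ms saved per frame")
    print(f"direct call -> Windows:      {saved_ms(direct, native):.2f} ms still missing")
    print(f"Windows / best Linux result: {native / direct:.2f}x")
```

Under those assumptions, avoiding the Wine syscall path for vkGetDescriptorEXT is worth a bit under 6 ms per frame in this scene, and a gap of roughly 5 ms per frame remains to the Windows result.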
VKD3D profiling result: milesprofiling.txt
As text, sorted by ticks: milesprofiling.txt