
[d3d9] Performance regression in Dawn of War DE with Ubershader

Open Blisto91 opened this issue 2 months ago • 2 comments

Dawn of War DE has had quite a large performance regression since the Ubershader was enabled. See the screenshots below for a comparison with the default config and with the feature disabled.

Screenshots

No config Image

With DXVK_CONFIG="d3d9.ffUbershaderVS = False; d3d9.ffUbershaderFS = False" Image

Disabling just VS on its own improves it from roughly 79 fps to 145 fps in that scene, while disabling just FS on its own improves it from roughly 79 fps to 108 fps.

Software information

Warhammer 40,000: Dawn of War - Definitive Edition Game version 2.4.0 and highest settings at 1080p

System information

  • GPU: RX 7900 XTX
  • Driver: Day old mesa-git (9ebda88e)
  • Wine version: Proton Bleeding Edge
  • DXVK version: Master

Apitrace file(s)

https://drive.proton.me/urls/VE0XFXQSG8#zvxxrP8LUPFJ

Blisto91 avatar Oct 22 '25 20:10 Blisto91

Just noting here in the same issue that I also found the game previously had quite a big performance drop with commit https://github.com/doitsujin/dxvk/commit/60b6e98529b2dba0de600e5ec5cdefa9c597aac7, which was included in the 2.3 release. This was tested with the same scene as above.

Blisto91 avatar Oct 23 '25 07:10 Blisto91

There are two problems here:

Depth prepass hits the feedback loop detection.

The game binds the depth buffer as texture 0 for the depth prepass. That depth prepass is done using a D3DFMT_NULL texture, so color values are irrelevant. We don't currently remove the fragment shader for such passes but we also cannot do that here because the game also happens to enable alpha testing for this pass. It also uses a complicated fixed function configuration with 7 active texture stages. I assume it's the same configuration as when it does a regular color draw. In practice, the value of texture 0 is not used because it does the following for alpha:

Texture stage 0: Load texture value into CURRENT.

Texture stage 1: SELECTARG2 with Arg2 being CURRENT, store in CURRENT.

Texture stage 2: CONSTANT, store in TEMP.

Texture stage 3: SELECTARG2 with Arg2 being TEMP, store in CURRENT. So now the loaded texture value has been completely replaced by the constant.
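The four alpha stages above can be traced with a small stand-alone model. This is only a sketch and not DXVK code: the `AlphaStage` and `StageInputs` types and the `evalAlpha` and `depthPrepassAlpha` functions are hypothetical, stage 0's "load" and stage 2's "CONSTANT" are modelled as SELECTARG1 (an assumption), and CURRENT is assumed to start at 1.0.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified model of the alpha half of a fixed function texture stage.
enum class Arg { Texture, Current, Temp, Constant };
enum class Op { SelectArg1, SelectArg2 };
enum class Dest { Current, Temp };

struct AlphaStage {
  Op op;
  Arg arg1;
  Arg arg2;
  Dest dest;
};

// Per-stage inputs: the sampled texture alpha and the stage constant alpha.
struct StageInputs {
  float texture;
  float constant;
};

// Runs the stages in order and returns the alpha left in CURRENT.
float evalAlpha(const std::vector<AlphaStage>& stages,
                const std::vector<StageInputs>& inputs) {
  float current = 1.0f;  // assumption: initial CURRENT alpha
  float temp = 0.0f;
  for (std::size_t i = 0; i < stages.size(); i++) {
    auto read = [&](Arg a) {
      switch (a) {
        case Arg::Texture:  return inputs[i].texture;
        case Arg::Current:  return current;
        case Arg::Temp:     return temp;
        case Arg::Constant: return inputs[i].constant;
      }
      return 0.0f;
    };
    const float value = stages[i].op == Op::SelectArg1
      ? read(stages[i].arg1)
      : read(stages[i].arg2);
    if (stages[i].dest == Dest::Temp)
      temp = value;
    else
      current = value;
  }
  return current;
}

// The alpha setup described above. Arguments that the selected op ignores
// are filled with placeholders.
float depthPrepassAlpha(float texture0Alpha, float stage2Constant) {
  const std::vector<AlphaStage> stages = {
    {Op::SelectArg1, Arg::Texture,  Arg::Current, Dest::Current},  // stage 0
    {Op::SelectArg2, Arg::Texture,  Arg::Current, Dest::Current},  // stage 1
    {Op::SelectArg1, Arg::Constant, Arg::Current, Dest::Temp},     // stage 2
    {Op::SelectArg2, Arg::Texture,  Arg::Temp,    Dest::Current},  // stage 3
  };
  const std::vector<StageInputs> inputs = {
    {texture0Alpha, 0.0f}, {0.0f, 0.0f}, {0.0f, stage2Constant}, {0.0f, 0.0f},
  };
  return evalAlpha(stages, inputs);
}
```

Whatever value is passed for `texture0Alpha`, `depthPrepassAlpha` returns `stage2Constant`, which is the point made above: the alpha sampled from texture 0 never reaches the alpha test.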

I think if we also look at color values (rgb), the game actually uses the texture value but this is a depth-only pass after all.

Our feedback loop detection code isn't smart enough to figure this out, so we flag that first pass as a feedback loop and insert a barrier after every draw. This makes us end up with like 220 barriers for the depth prepass.
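For illustration, one way a detector could "figure this out" is a backwards liveness walk over the stage cascade that records which stage textures can still influence the final CURRENT value. The sketch below is hypothetical and is not how DXVK implements feedback loop detection. It covers a single channel (alpha here) and only three ops, and the names `Stage`, `liveTextures` and `depthPrepassAlphaStages` are made up for this example.

```cpp
#include <bitset>
#include <cstddef>
#include <vector>

// Hypothetical single-channel stage description, up to 8 stages.
enum class Reg { Texture, Current, Temp, Constant };
enum class Op { SelectArg1, SelectArg2, Modulate };

struct Stage {
  Op op;
  Reg arg1;
  Reg arg2;
  Reg dest;  // Reg::Current or Reg::Temp
};

// Returns a mask with bit i set if the texture sampled at stage i can
// affect the value left in CURRENT after the last stage.
std::bitset<8> liveTextures(const std::vector<Stage>& stages) {
  bool currentLive = true;  // the final result is read from CURRENT
  bool tempLive = false;
  std::bitset<8> used;

  // Walk the stages from last to first.
  for (std::size_t i = stages.size(); i-- > 0;) {
    const Stage& s = stages[i];
    bool& destLive = s.dest == Reg::Temp ? tempLive : currentLive;
    if (!destLive)
      continue;        // this result is overwritten before anything reads it
    destLive = false;  // the stage fully overwrites its destination

    auto markRead = [&](Reg r) {
      if (r == Reg::Texture) used.set(i);
      else if (r == Reg::Current) currentLive = true;
      else if (r == Reg::Temp) tempLive = true;
    };
    if (s.op != Op::SelectArg2) markRead(s.arg1);
    if (s.op != Op::SelectArg1) markRead(s.arg2);
  }
  return used;
}

// The alpha cascade of the depth prepass as described above; ignored
// arguments are placeholders.
std::vector<Stage> depthPrepassAlphaStages() {
  return {
    {Op::SelectArg1, Reg::Texture,  Reg::Current, Reg::Current},
    {Op::SelectArg2, Reg::Texture,  Reg::Current, Reg::Current},
    {Op::SelectArg1, Reg::Constant, Reg::Current, Reg::Temp},
    {Op::SelectArg2, Reg::Texture,  Reg::Temp,    Reg::Current},
  };
}
```

With this model, `liveTextures(depthPrepassAlphaStages()).test(0)` is false, so a detector built on it would not need to treat the depth buffer bound as texture 0 as a hazard for the alpha test. A real implementation would also have to account for the color channels and every other way a bound texture can be read.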

The game uses the most complex fixed function setups I've seen so far.

The game uses fixed function configurations with more than 4 active texture stages for seemingly the majority of the screen. It even uses configurations with all 8 texture stages activated. We only optimize draws with up to 4 active texture stages because the assumption was that games draw almost all of their geometry using configurations like that. That assumption has held so far: in all games I've looked at, 95-100% of draws were optimized.
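As a rough sketch of the cutoff described above: in D3D9 fixed function, a stage whose color op is D3DTOP_DISABLE disables itself and every stage after it, so the active stage count is the index of the first disabled stage. The code below is not taken from DXVK. `ColorOp` is a reduced stand-in for the D3D9 enum, `kOptimizedStageLimit`, `activeStageCount` and `takesOptimizedPath` are hypothetical names, and any other condition that may disable a stage is ignored.

```cpp
#include <array>
#include <cstddef>

// Reduced stand-in for the D3D9 color op enum (assumption: only the
// disabled/enabled distinction matters here).
enum class ColorOp { Disable, SelectArg1, Modulate };

constexpr std::size_t kMaxStages = 8;
// Limit taken from the comment above; the constant name is hypothetical.
constexpr std::size_t kOptimizedStageLimit = 4;

// Counts the stages before the first disabled one.
std::size_t activeStageCount(const std::array<ColorOp, kMaxStages>& ops) {
  std::size_t n = 0;
  while (n < ops.size() && ops[n] != ColorOp::Disable)
    n++;
  return n;
}

// True if a draw with this stage setup would fall under the stage limit.
bool takesOptimizedPath(const std::array<ColorOp, kMaxStages>& ops) {
  return activeStageCount(ops) <= kOptimizedStageLimit;
}
```

Under this model, the 7-stage configuration from the depth prepass and the full 8-stage configurations both fall outside the limit, which matches the observation that most of the screen in this game is drawn without the optimization.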

K0bin avatar Oct 23 '25 12:10 K0bin