dxwrapper

Fix Windows 10 awful performance with software vertex processing

Open mirh opened this issue 4 years ago • 6 comments

See:
https://github.com/Nucleoprotein/OneTweakNG/issues/1#issuecomment-568756645
https://docs.microsoft.com/en-us/windows/win32/direct3d9/d3dcreate
https://social.msdn.microsoft.com/Forums/en-US/a84dce94-49f4-4118-9e68-fe412c909ee4/directx-9-program-runs-terribly-after-win10-update-1607

Perhaps switching to mixed or hardware vertex processing would already be a solution of sorts (idk, I'm not a dev), but if the original game devs didn't think of it, my uneducated guess is that there was a legit reason.

mirh avatar Jan 26 '21 01:01 mirh

Interesting. It looks like changing the vertex buffer to use system memory and switching the vertex processing to mixed mode solved the performance issues. Mixed mode processing is not normally recommended by Microsoft. But I suppose adding an option for this would be good.

elishacloud avatar Feb 28 '21 02:02 elishacloud

I mean, to be fair, the last link could just as well be AMD-specific; they have been skimping on the d3d9 driver in recent years, after all. But the option is nice to have.

Assuming software processing was only ever there for backwards compatibility, I wonder whether it would be safe to force all vertex processing onto hardware on, say, all >DX10 or >SM3 GPUs?
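To make the question concrete, here is a minimal sketch of what such a capability check could look like. It is not dxwrapper code: `CapsSubset` and `CanForceHardwareVertexProcessing` are hypothetical names, the struct only mirrors the two `D3DCAPS9` fields that matter here, and treating "hardware T&L plus vertex shader 3.0 or newer" as "safe" is an assumption rather than anything Microsoft documents. The constants are copied from the Direct3D 9 headers so the snippet builds without the SDK.

```cpp
#include <cstdint>

// D3DDEVCAPS_HWTRANSFORMANDLIGHT, as defined in d3d9caps.h.
constexpr std::uint32_t kDevCapsHwTransformAndLight = 0x00010000u;

// Same encoding as the D3DVS_VERSION(major, minor) macro in d3d9types.h.
constexpr std::uint32_t VsVersion(std::uint32_t major, std::uint32_t minor) {
    return 0xFFFE0000u | (major << 8) | minor;
}

// Hypothetical stand-in for the two D3DCAPS9 fields consulted below.
struct CapsSubset {
    std::uint32_t DevCaps;
    std::uint32_t VertexShaderVersion;
};

// Assumed policy: only force hardware vertex processing when the device
// reports hardware transform and lighting and at least vertex shader 3.0.
constexpr bool CanForceHardwareVertexProcessing(const CapsSubset& caps) {
    return (caps.DevCaps & kDevCapsHwTransformAndLight) != 0 &&
           caps.VertexShaderVersion >= VsVersion(3, 0);
}
```

In a real wrapper the two fields would come from `IDirect3D9::GetDeviceCaps`; whether this threshold is actually sufficient for every game is exactly the open question.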

mirh avatar Feb 28 '21 22:02 mirh

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9451/diffs Not exactly linked to the problem here, but I guess you might find these behind-the-scenes factlets interesting.

mirh avatar Mar 14 '21 14:03 mirh

Mhh, so, I finally got some testing done (yes, sorry for being a lazy ass), but ForceMixedVertexProcessing doesn't really help much, while ForceSystemMemVertexCache totally nukes the framerate.

Maybe forcing hardware processing could do it?

mirh avatar May 25 '22 14:05 mirh

Forcing hardware processing should do it, but it may also cause graphical issues if the hardware cannot support some things. Though that should be less of an issue with d3d9, I think. There is already a ForceMixedVertexProcessing option. I could create a new ForceHardwareVertexProcessing option.
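As an illustration of what "forcing" a mode amounts to at the API level, here is a rough sketch of rewriting the behavior flags a game passes to `IDirect3D9::CreateDevice`. It is not taken from dxwrapper: `ForceVertexProcessing` is a hypothetical helper, and the constants are copied from `d3d9.h` so the snippet compiles without the SDK. The three vertex processing flags are mutually exclusive, so the game's choice has to be cleared before the forced one is set.

```cpp
#include <cstdint>

// Behavior flag values as defined in d3d9.h.
constexpr std::uint32_t kSoftwareVP = 0x00000020u; // D3DCREATE_SOFTWARE_VERTEXPROCESSING
constexpr std::uint32_t kHardwareVP = 0x00000040u; // D3DCREATE_HARDWARE_VERTEXPROCESSING
constexpr std::uint32_t kMixedVP    = 0x00000080u; // D3DCREATE_MIXED_VERTEXPROCESSING
constexpr std::uint32_t kAllVP      = kSoftwareVP | kHardwareVP | kMixedVP;

// Hypothetical helper: drop whichever vertex processing mode the game asked
// for and substitute the forced one, leaving all other flags untouched.
constexpr std::uint32_t ForceVertexProcessing(std::uint32_t behaviorFlags,
                                              std::uint32_t forcedMode) {
    return (behaviorFlags & ~kAllVP) | forcedMode;
}
```

For example, `ForceVertexProcessing(flags, kHardwareVP)` turns a software-processing request into a hardware one while preserving flags such as `D3DCREATE_MULTITHREADED`.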

elishacloud avatar May 25 '22 16:05 elishacloud

Not really sure what the point of forcing modes could be if not to improve performance (also, was there any specific reason behind ForceSystemMemVertexCache?), but maybe you could have a single option with an integer value rather than a boolean.

EDIT: never mind, over at dxwnd they report quite some success with the flags. It's just that they are a drop in the bucket compared to all the other changes.
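The single integer-valued option suggested above could be sketched roughly as follows. Everything here is hypothetical: the enum, the 0/1/2 numbering, and the function names are illustrative and do not reflect how dxwrapper names or parses its settings. Only the two flag values are real, copied from `d3d9.h`.

```cpp
#include <cstdint>

// Hypothetical integer setting: 0 = leave the game's choice alone,
// 1 = force mixed, 2 = force hardware vertex processing.
enum class VertexProcessingMode { Unchanged = 0, Mixed = 1, Hardware = 2 };

// Map the raw config integer to a mode; unknown values fall back to Unchanged.
constexpr VertexProcessingMode ParseMode(int value) {
    return value == 1 ? VertexProcessingMode::Mixed
         : value == 2 ? VertexProcessingMode::Hardware
                      : VertexProcessingMode::Unchanged;
}

// Translate a mode to the matching D3DCREATE_* flag value (0 means "no override").
constexpr std::uint32_t ForcedFlag(VertexProcessingMode mode) {
    return mode == VertexProcessingMode::Mixed    ? 0x00000080u  // D3DCREATE_MIXED_VERTEXPROCESSING
         : mode == VertexProcessingMode::Hardware ? 0x00000040u  // D3DCREATE_HARDWARE_VERTEXPROCESSING
                                                  : 0u;
}
```

A single setting like this makes it impossible to request two contradictory modes at once, which two independent booleans would allow.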

mirh avatar May 25 '22 17:05 mirh

Good god, 1607 was just the beginning: "modern" W10 is an absolute trainwreck in CPU-limited scenarios (be it because the big ass game is actually legitimately heavy, or because you are running a VM on a ULV laptop and you are giving it some extra help by forcing fewer cores).

I set up a number of VMware 16 VMs with different Windows versions while hunting down another regression, so I figured I might as well check this graphics mystery for the lulz (the virtual SVGA device isn't exactly comparable to native, but it does expose a legit d3d9 driver to the system). Since the first time I noticed something was amiss happened to be in Mass Effect on my i7-6500U+950M old bean, I also decided that would be my go-to benchmark.

And all was well, all was good: compared to W7 I could find barely half as much fps right in the title screen. ... Except this was on 1703, not 1607 as everybody else seemed to report above (and for once I couldn't just dismiss people as "imprecise", since all kinds of developers were certain about that version, even precisely naming some slowed-down functions). So I scavenged for something that could show any difference in 1607, since otherwise the issue I reported in the OP would be left untested.

Welcome to the NVIDIA DirectX samples. After some tinkering to find the VS2003 redistributable, I was delighted that with many of them (both for the hardware and software vertex processing devices) I could indeed report a slight but constant ~15% performance impact between W7 and 1607. Thus I was almost calling it a day with the 1703 VM, but right before nuking it to crosscheck 1511 (which turned out to be fine, btw) I decided to give it a go with the microbenchmarks. ... And then I noticed it. In 1703, I could find at least one case where the reported framerate was easily A THIRD of where you'd expect it to be.

I don't really have any other actual technical insight on the matter, but since even the source code of the affected applications is available, I believe it should be pretty easy to (if not fix) at least pinpoint what's fucked up.

EDIT: I also had some luck with older OGRE demos (in particular: crowd, dot3bump, grass, lighting and water).

p.s. The Spectre and Meltdown performance hit can also be ruled out, because I have mitigations disabled on the host, and anything before 2018 should be unaffected.

mirh avatar Nov 15 '22 00:11 mirh