Z24S8 surface upload/download slow on mesa radeonsi

Open CallumDev opened this issue 3 years ago • 10 comments

Profiling Azurik: Rise of Perathia under Linux with an AMD GPU shows a very large amount of time spent in the function _mesa_texstore_s8_z24, called when a Z24S8 surface is updated with glTexImage2D/glTexSubImage2D.

AMD GPUs don't support Z24S8 formats directly in hardware, so the OpenGL driver has to convert them, and that conversion is very slow. This is causing slowdowns to 1-2 fps on my Ryzen 7 3700U.

Possible solutions:

  1. Be more aggressive in trying to skip Depth/Stencil download and uploads.
  2. Use some form of shader to convert Z24S8 to a hardware-supported format instead of letting the GL driver convert the format on the CPU.

OR

  1. Copy a different format to RAM and hope for the best (probably won't work)

CallumDev avatar May 12 '21 12:05 CallumDev

Other games affected

  • Baldur's Gate: Dark Alliance II
  • Legacy of Kain: Defiance

Triticum0 avatar Jun 30 '21 21:06 Triticum0

Notes on the possible shader solution:

CPU->GPU, upload as 32-bit uint texture then render to FBO with depthstencil attachment.

Writing stencil values: https://www.khronos.org/registry/OpenGL/extensions/AMD/AMD_shader_stencil_export.txt This extension is supported on Mesa. We would need to add a codepath that disables this shader when the extension is not supported. Depth values are written to gl_FragDepth.

GPU->CPU, need to render to 32-bit uint texture then download.

Use depth texture + stencil view tex (seems to require OpenGL 4.4): https://stackoverflow.com/questions/27535727/opengl-create-a-depth-stencil-texture-for-reading
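For the GPU->CPU direction, the value rendered into the 32-bit uint texture would have to combine depth and stencil in the same layout that GL_UNSIGNED_INT_24_8 uses: depth in the upper 24 bits, stencil in the lower 8. Below is a minimal C sketch of that packing, meant only as a CPU-side reference for what a download shader would need to compute. The name `pack_z24s8` is made up for illustration and is not part of xemu or Mesa.

```c
#include <stdint.h>

/* Hypothetical helper (not from xemu or Mesa): packs a normalized depth
 * value and an 8-bit stencil value into one 32-bit word, with depth in
 * the upper 24 bits and stencil in the lower 8 bits. */
uint32_t pack_z24s8(float depth, uint8_t stencil)
{
    /* Clamp to [0, 1] so out-of-range input cannot overflow 24 bits. */
    double d = depth;
    if (!(d > 0.0)) {
        d = 0.0; /* also maps NaN to 0 */
    }
    if (d > 1.0) {
        d = 1.0;
    }

    /* Scale to the 24-bit integer range and round to nearest. */
    uint32_t z24 = (uint32_t)(d * 16777215.0 + 0.5);

    return (z24 << 8) | stencil;
}
```

For example, `pack_z24s8(1.0f, 0xFF)` yields `0xFFFFFFFF`, the far-plane depth with every stencil bit set.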

CallumDev avatar Sep 19 '21 16:09 CallumDev

@CallumDev This isn't only the case on AMD hardware. This storage format conversion also happens with Nvidia, etc., and is also very expensive. It definitely needs to be made faster, because in some cases synchronization cannot be avoided. It's been on my radar for a while, but if you would like to explore and work on this you are welcome to; otherwise I will get to it eventually.

mborgerson avatar Sep 19 '21 20:09 mborgerson

More notes:

  • For mesa, downloading the surface seems to be just as quick as downloading any other, upload is the problem.

I've had success uploading stencil data by setting the GL state so that the stencil op is GL_REPLACE, GL_REPLACE, GL_REPLACE, disabling the color mask, and using this shader with a full screen quad to sample from an R32UI texture (min and mag filters must also be set to GL_NEAREST). gl_FragDepth is untested; that's just a guess at this point. As for integrating into xemu, I'm concerned about inadvertently trampling all the GL state.

GL_ARB_shader_stencil_export is also required for this, and it only seems to be supported on AMD and Intel, not Nvidia. However, with reports of Azurik running OK on Nvidia, perhaps the performance hit is much smaller for them?

#version 440
#extension GL_ARB_shader_stencil_export : require

in vec2 uv;
uniform usampler2D depthstencil_tex;

void main(void) {
    uint sval = texture(depthstencil_tex, uv).r;
    gl_FragStencilRefARB = int(sval & 0xFFu);
    gl_FragDepth = float(sval >> 8) / 16777215.0;
}

Using the shader to upload avoids the huge FPS drop (6 fps for one 1024x768 surface in my test case).
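The bit layout the shader above assumes (stencil in the low 8 bits, depth in the high 24 bits, normalized by 2^24 - 1) can be mirrored on the CPU to sanity-check values read back from the FBO. This is only a sketch. The function names are hypothetical and not taken from xemu.

```c
#include <stdint.h>

/* Hypothetical CPU-side mirror of the shader's unpacking. */

/* Stencil is the low 8 bits of the packed word. */
uint8_t z24s8_stencil(uint32_t word)
{
    return (uint8_t)(word & 0xFFu);
}

/* Depth is the high 24 bits, normalized to [0, 1]. */
float z24s8_depth(uint32_t word)
{
    return (float)(word >> 8) / 16777215.0f;
}
```

A packed word of `0xFFFFFF80` should therefore come out as depth 1.0 and stencil 0x80.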

CallumDev avatar Sep 20 '21 07:09 CallumDev

Also affects:

  • https://xemu.app/titles/43430002/#Steel-Battalion
  • https://xemu.app/titles/43430009/#Steel-Battalion-Line-of-Contact
  • https://xemu.app/titles/41560009/#Rally-Fusion-Race-of-Champions

EDIT: As for the Nvidia comment, I can confirm that there is definitely a negative performance impact on Nvidia. It's not nearly as bad as on AMD, but it's still bad. Azurik, Steel Battalion, and Rally Fusion run worse on AMD than on Nvidia, but they still don't run nearly as well as on original hardware with either vendor, on either Windows or Linux.

HadetTheUndying avatar May 27 '22 00:05 HadetTheUndying

In the clear case, we can be smarter about not uploading if we are about to do a full surface clear
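One way to picture that check, as a rough sketch only: the upload can be skipped when the upcoming clear overwrites every depth and stencil value of the surface. All types and names below are hypothetical and are not xemu's actual structures. Partial write masks (for example a stencil mask that is not 0xFF) would also have to be ruled out in practice.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical types for illustration; these are not xemu's structures. */
typedef struct {
    uint32_t width;
    uint32_t height;
    bool upload_pending; /* guest RAM holds newer data than the GL texture */
} ZetaSurface;

typedef struct {
    uint32_t x, y, width, height; /* clear rectangle */
    bool clears_depth;
    bool clears_stencil;
} ClearOp;

/* True when the clear overwrites every depth and stencil value. */
bool clear_overwrites_surface(const ZetaSurface *s, const ClearOp *c)
{
    bool covers_area = c->x == 0 && c->y == 0 &&
                       c->width >= s->width && c->height >= s->height;
    return covers_area && c->clears_depth && c->clears_stencil;
}

/* Decide whether the pending upload still has to happen before the clear. */
bool needs_upload_before_clear(const ZetaSurface *s, const ClearOp *c)
{
    return s->upload_pending && !clear_overwrites_surface(s, c);
}
```

A depth-only clear of a Z24S8 surface still needs the upload, because the stencil bits in guest RAM would otherwise be lost.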

mborgerson avatar May 27 '22 00:05 mborgerson

Issue present in: https://xemu.app/titles/45410042/#007-Everything-or-Nothing

ghost avatar Jun 01 '22 01:06 ghost

This issue may be resolved by https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18484?commit_id=4da147a02b541311e8dc231b30dd36fafea820ff

TODO: Test when this makes it into a stable mesa release (22.3)

CallumDev avatar Sep 15 '22 03:09 CallumDev

Wonder if this might also help #777 ?

dekay avatar Sep 15 '22 14:09 dekay

I suppose one of us could try building with this commit and see if it resolves it. I might mess with it Monday. It would definitely be great if it does, because the number of games affected by this issue has increased over time.

HadetTheUndying avatar Sep 23 '22 23:09 HadetTheUndying

I can confirm with 22.3-devel that the performance of at least Azurik: Rise of Perathia is much improved (previously it was <1fps). In less complex areas it will hit 30fps.

CPU: 11th Gen Intel(R) Core(TM) i7-11370H @ 3.30GHz
OS_Version: Fedora Linux 36 (Workstation Edition)
GL_VENDOR: Intel
GL_RENDERER: Mesa Intel(R) Xe Graphics (TGL GT2)
GL_VERSION: 4.6 (Core Profile) Mesa 22.3.0-devel
GL_SHADING_LANGUAGE_VERSION: 4.60

CallumDev avatar Sep 26 '22 06:09 CallumDev

On Mesa 22.3.0-devel, Halo 2 splitscreen is no longer 1 fps on my machine. However, the graphics are messed up, so splitscreen is not playable yet. See #1237

gamrXerus avatar Oct 02 '22 14:10 gamrXerus

Built Mesa 22.0.5 with the "speed up glTexImage" patch on Ubuntu 22.04. I tried Panzer Dragoon Orta, which previously ran below 10fps, and now it is almost locked at 60fps on my Xeon E5-1270 v3 system. Very impressive stuff.

resadent avatar Oct 21 '22 19:10 resadent

I just tested Panzer Dragoon Orta with mesa 22.3.0-1 on arch and the speedup is crazy good, just as reported by @resadent. I suggest closing this and #777 as well.

dekay avatar Dec 12 '22 03:12 dekay

This shouldn't be closed until mesa-22.3.1 is in release since that's considered the first stable release.

EDIT: It doesn't fix the issues on Intel or Nvidia either.


HadetTheUndying avatar Dec 12 '22 04:12 HadetTheUndying

The issue is still present in Steel Battalion: Line of Contact even with the latest mesa-22.3.2. I think this issue should remain open until all affected games are confirmed working. It's possible the issue with LoC is no longer related to this one, but I don't have time to profile it right now.

EDIT: Also, as of now (Jan 21st), with the latest Mesa HEAD the issue is still present, and Panzer Dragoon still has major slowdowns, dropping all the way down to 7 FPS.

HadetTheUndying avatar Jan 01 '23 03:01 HadetTheUndying