librashader icon indicating copy to clipboard operation
librashader copied to clipboard

macOS metal backend producing differing changing output

Open xzn opened this issue 9 months ago • 9 comments

https://github.com/user-attachments/assets/3d6c7cde-a557-4ee6-81dc-de3b2405e991 https://github.com/user-attachments/assets/cad64e72-1762-4c2e-ab81-57621a0583a2

Sorry for videos, but it shows the issue more clearly than images.

Using the included sf2.png in test folder,

Shaders are

anti-aliasing/smaa.slangp edge-smoothing/nnedi3/nnedi3-nns64-2x-nns32-4x-nns16-8x-rgb.slangp

respectively.

I'm using the objctest project from this repo, modified to load a texture, which is then filtered every frame in the rendering loop.

For the smaa shader, the issue can be checked and shows up in the cli as well.

For the nnedi3 one, the issue only shows up because the texture is mipmapped. If no mipmaps are generated for the texture then the filter works like in other platforms.

xzn avatar Apr 03 '25 04:04 xzn

Does the SMAA issue occur on other platforms as well?

chyyran avatar Apr 03 '25 04:04 chyyran

No, both of the shaders shown above works on Linux, Windows, on Vulkan, OpenGL4, and D3D11 backends.

I tried getting Vulkan backend to work on macOS with MoltenVK, many of the shaders give fragmented pool error during preset initialization however but I can look again if it helps.

xzn avatar Apr 03 '25 04:04 xzn

I'll take a look in the Metal debugger and see if there's anything odd going on.

chyyran avatar Apr 03 '25 05:04 chyyran

I figured it out (by looking at the xcode shader debugger..)

smaa uses discard and i guess metal takes it literally.

Changing discard to return all zeros or changing the metal backend to load with clear color all zeros and the problem is fixed. (don't know if this is a good idea but if i have more time tomorrow i can take a look at how retroarch does it to make sure, that project is licensed under gplv3 however...)

Other backends should probably be changed to handle this the same way as well.

Side note:

Still don't know why the nnedi3 shader bugs out when source texture has mipmaps generated. At least that can be worked around for now..

xzn avatar Apr 04 '25 04:04 xzn

Okay TIL, branches in shader with texture sampling will have undefined mipmap selection, which is why the shader gets weird when input texture has mipmap generated. I also checked the readme of this project and realized it mentions mipmap is never used for the first input texture anyway, so that's an usage error on my part.

xzn avatar Apr 05 '25 00:04 xzn

Sorry for another comment: I just checked retroarch and smaa is visibly bugged there as well on macos/metal. As smaa is not the only shader using discard, either the shaders using discard should be updated, or the filter code updated to accommodate them. (so far smaa is the only one glitching, so maybe just update that?)

xzn avatar Apr 05 '25 01:04 xzn

This would be considered an upstream bug. I'll report it to the shader maintainers. Ideally libretro would be bug for bug compatible although I do make arbitrary exceptions like the D3D12 runtime. In this case I'm wary about changing how discard works for one runtime as the other runtimes also work with discard.

chyyran avatar Apr 07 '25 00:04 chyyran

As for the mipmaps, I can add some docs to indicate that the input texture should not have mipmaps.

chyyran avatar Apr 07 '25 01:04 chyyran

Good idea.

Just to clarify it appears to be how Apple Silicon's GPUs react to discard (Vulkan + MoltenVK glitches the same way).

Also I managed to get MoltenVK to work without VK_ERROR_FRAGMENTED_POOL by setting MVK_CONFIG_USE_METAL_ARGUMENT_BUFFERS=0 (despite what MoltenVK's doc says using argument buffer actually made things worse here it seems.)

xzn avatar Apr 07 '25 02:04 xzn

SPIR-V cross maps a GLSL discard (or really SPIR-V OpKill) to discard_fragment(); return out;, see https://github.com/KhronosGroup/SPIRV-Cross/pull/2436

It seems this is fixed by returning zeroes in the GLSL, see https://discord.com/channels/184109094070779904/584075942276628481/1358614139907342407

I could write a pass to change OpKill to OpReturnValue but this lowering pass is non trivial... ideally this would be fixed upstream.

chyyran avatar Jun 20 '25 04:06 chyyran