[Bisected] Commit has broken graphics on Intel XE (MESA)

Open vanfanel opened this issue 1 year ago • 20 comments

Platform / OS / Hardware: Debian Linux (Testing and Stable)

Flycast version: GIT code; the bug concerns a specific commit

Hardware: Intel XE graphics integrated on i5-1235U

Description of the Issue

This commit has broken graphics on Intel XE hardware (MESA, latest stable): https://github.com/flyinghead/flycast/commit/4d73cc8e13424072c7458fd2e965c7fda9a5425a

For example:

- Daytona USA 2001: VMU menu heavily garbled; in-game graphics with missing elements such as car parts.
- DOA2: missing hair on some characters during the intro/demo.
- Jet Grind Radio: runs very slowly and the main character is missing.

NOTE: Use pixel-precise alpha-sorting and full framebuffer emulation to match my configuration and see the same results.

vanfanel avatar Oct 13 '24 21:10 vanfanel

So it's okay without full framebuffer emulation?

MastaG avatar Oct 13 '24 21:10 MastaG

Are you using standalone or libretro?

The mentioned commit uses a new vulkan extension when available to avoid unnecessary work. However it should only really matter when using flat shading, which isn't used by many games, and not the ones listed here. Pretending the extension exists without enabling it doesn't affect Daytona USA for example. So this could indicate a MESA issue.
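To illustrate why the provoking vertex only matters for flat shading: with flat shading, a whole triangle takes its color from a single "provoking" vertex, and VK_EXT_provoking_vertex lets an application choose between the first and the last vertex. The sketch below is a minimal, self-contained illustration of that idea. The names `ProvokingVertex`, `Vertex`, `Triangle` and `flatShadedColor` are hypothetical and are not flycast or Vulkan identifiers.

```cpp
#include <array>
#include <cstdint>

// Hypothetical names for illustration only; not taken from flycast or Vulkan.
enum class ProvokingVertex { First, Last };

struct Vertex {
    float x, y;
    std::uint32_t color; // packed color value
};

using Triangle = std::array<Vertex, 3>;

// With flat shading, the whole triangle is drawn with the color of one vertex.
// Which vertex that is depends on the provoking vertex convention.
std::uint32_t flatShadedColor(const Triangle& tri, ProvokingVertex mode)
{
    return mode == ProvokingVertex::First ? tri.front().color
                                          : tri.back().color;
}
```

With smooth (interpolated) shading all three vertex colors contribute, so the convention makes no visible difference, which is consistent with the remark that only flat-shaded geometry should be affected.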

flyinghead avatar Oct 14 '24 08:10 flyinghead

@flyinghead I'm using libretro. MESA is latest stable but I have also tried with latest MESA gitlab code, with the same results. If someone tests this on Intel graphics (ANV) they will see the same for sure.

@MastaG Full Framebuffer Emulation OFF causes Daytona to be invisible, no graphics at all. That's the only difference it makes for the games I tested.

vanfanel avatar Oct 14 '24 09:10 vanfanel

Can you test standalone? I spotted a new validation error due to this commit on the libretro core (causing retroarch to crash at start up here). But it only affects the core.

flyinghead avatar Oct 14 '24 09:10 flyinghead

@flyinghead I have noticed that the mentioned problems don't happen if I set "Alpha Sorting" to "Per Triangle (Normal)" instead of "Per Pixel", but then Daytona has missing parts on the player car, the Soul Calibur intro has bad clouds, etc. (i.e., all the problems expected from the lack of per-pixel alpha sorting).

As for standalone, I will do a build and test tonight, now I have to depart for work.

vanfanel avatar Oct 14 '24 09:10 vanfanel

I fixed the error that was crashing retroarch at start up. I also removed the use of the new extension where it's not necessary. This could help the issue.

flyinghead avatar Oct 14 '24 15:10 flyinghead

@flyinghead I did a standalone build with the latest GIT code (including your latest commits, of course) and the problem is present there, too. So, yes: standalone is equally affected. And your latest commit didn't fix it, sadly.

vanfanel avatar Oct 14 '24 21:10 vanfanel

Could you try to disable the use of the new extension completely? To do this, modify line 467 in core/rend/vulkan/vulkan_context.cpp to read:

			// Enable VK_EXT_provoking_vertex if available
			provokingVertexSupported = false;

This is for standalone. For the libretro core, modify line 181 in core/rend/vulkan/vulkan_context_lr.cpp:

		// Enable VK_EXT_provoking_vertex if available
		VulkanContext::Instance()->provokingVertexSupported = false;

flyinghead avatar Oct 15 '24 08:10 flyinghead

@flyinghead I did the changes you pointed me to, and the same problems persist, sorry.

vanfanel avatar Oct 15 '24 09:10 vanfanel

Tested latest build with mesa (nouveau, ubuntu 20 distrib) and no such issue (except that it's extremely slow but no surprise here).

flyinghead avatar Oct 15 '24 17:10 flyinghead

I don't have NVIDIA hardware, the issue only happens on MESA + Intel XE, as far as I know.

vanfanel avatar Oct 15 '24 19:10 vanfanel

@flyinghead Did you try setting Alpha Sorting to "Per Pixel"? With the other Alpha Sorting settings, the problem doesn't happen. So, could you try with MESA + Alpha Sorting set to "Per Pixel", please?

vanfanel avatar Oct 15 '24 19:10 vanfanel

yes, and also tried per triangle.

flyinghead avatar Oct 15 '24 19:10 flyinghead

Then it's specific to Intel or Intel XE, apparently.

vanfanel avatar Oct 15 '24 19:10 vanfanel

I guess so too, but I fail to see what could be the root cause. When forcing not to use the new provoking_vertex extension (like you did in the code), there's really not much left that has changed. I looked through the remaining code again and can't find anything suspicious.

flyinghead avatar Oct 15 '24 19:10 flyinghead

But if I go back to https://github.com/flyinghead/flycast/commit/5b343562b98b222b3aeec2494613d36c15f83a3a, the resulting version works perfectly fine.

Then, if I build with https://github.com/flyinghead/flycast/commit/4d73cc8e13424072c7458fd2e965c7fda9a5425a, things break very clearly.

vanfanel avatar Oct 15 '24 19:10 vanfanel

One more test: can you comment out line 59 in core/rend/vulkan/vmallocator.cpp ?

	// Top-out at vulkan 1.1
	//allocatorInfo.vulkanApiVersion = (physicalDevice.getProperties().apiVersion >= VK_API_VERSION_1_1) ? VK_API_VERSION_1_1 : VK_API_VERSION_1_0;
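For readers unfamiliar with the line being commented out, here is a minimal sketch of what the ternary computes. It does not use the Vulkan headers: `makeVersion` reproduces the bit layout of Vulkan's `VK_MAKE_VERSION` macro, and `kApi10`, `kApi11` and `clampApiVersion` are hypothetical stand-ins for `VK_API_VERSION_1_0`, `VK_API_VERSION_1_1` and the quoted expression.

```cpp
#include <cstdint>

// Same bit layout as Vulkan's VK_MAKE_VERSION macro:
// major starts at bit 22, minor at bit 12, patch occupies the low 12 bits.
constexpr std::uint32_t makeVersion(std::uint32_t major,
                                    std::uint32_t minor,
                                    std::uint32_t patch)
{
    return (major << 22) | (minor << 12) | patch;
}

// Hypothetical stand-ins for VK_API_VERSION_1_0 and VK_API_VERSION_1_1.
constexpr std::uint32_t kApi10 = makeVersion(1, 0, 0);
constexpr std::uint32_t kApi11 = makeVersion(1, 1, 0);

// Hypothetical helper mirroring the quoted ternary: report 1.1 to the
// allocator when the device supports at least 1.1, otherwise report 1.0.
constexpr std::uint32_t clampApiVersion(std::uint32_t deviceApiVersion)
{
    return deviceApiVersion >= kApi11 ? kApi11 : kApi10;
}
```

Under this sketch, a device reporting Vulkan 1.3 is still presented to the allocator as 1.1. Commenting the line out leaves the field at whatever the allocator info structure was initialized with, so this test changes what the allocator is told rather than what the device supports.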

flyinghead avatar Oct 15 '24 19:10 flyinghead

@flyinghead Yes, of course. I made that change, but the resulting core crashes immediately when run (it aborts on a failed assertion in VulkanMemoryAllocator). Here is a GDB session with a backtrace so you can see it:

retroarch: /root/src/libretro/flycast/core/deps/VulkanMemoryAllocator/include/vk_mem_alloc.h:14443: void VmaAllocator_T::ValidateVulkanFunctions(): Assertion `m_VulkanFunctions.vkGetBufferMemoryRequirements2KHR != nullptr' failed.

Thread 1 "retroarch" received signal SIGABRT, Aborted.
__pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, 
    no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
warning: 44     ./nptl/pthread_kill.c: No such file or directory
(gdb) bt
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, 
    no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
#1  0x00007ffff77a4ebf in __pthread_kill_internal (threadid=<optimized out>, signo=6)
    at ./nptl/pthread_kill.c:78
#2  0x00007ffff7750c82 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007ffff77394f0 in __GI_abort () at ./stdlib/abort.c:79
#4  0x00007ffff7739418 in __assert_fail_base (
    fmt=0x7ffff78bdca0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
    assertion=assertion@entry=0x7ffff4863a78 "m_VulkanFunctions.vkGetBufferMemoryRequirements2KHR != nullptr", 
    file=file@entry=0x7ffff485ecf8 "/root/src/libretro/flycast/core/deps/VulkanMemoryAllocator/include/vk_mem_alloc.h", line=line@entry=14443, 
    function=function@entry=0x7ffff48636b8 "void VmaAllocator_T::ValidateVulkanFunctions()") at ./assert/assert.c:94
#5  0x00007ffff7749592 in __assert_fail (
    assertion=0x7ffff4863a78 "m_VulkanFunctions.vkGetBufferMemoryRequirements2KHR != nullptr", 
    file=0x7ffff485ecf8 "/root/src/libretro/flycast/core/deps/VulkanMemoryAllocator/include/vk_mem_alloc.h", line=14443, 
    function=0x7ffff48636b8 "void VmaAllocator_T::ValidateVulkanFunctions()")
    at ./assert/assert.c:103
#6  0x00007ffff436b5f6 in VmaAllocator_T::ValidateVulkanFunctions (this=0x5555563f1d40)
    at /root/src/libretro/flycast/core/deps/VulkanMemoryAllocator/include/vk_mem_alloc.h:14443
#7  0x00007ffff436a5f6 in VmaAllocator_T::ImportVulkanFunctions (this=0x5555563f1d40, 
    pVulkanFunctions=0x7fffffffb290)
    at /root/src/libretro/flycast/core/deps/VulkanMemoryAllocator/include/vk_mem_alloc.h:14216
#8  0x00007ffff436a071 in VmaAllocator_T::VmaAllocator_T (this=0x5555563f1d40, 
    pCreateInfo=0x7fffffffb360)
    at /root/src/libretro/flycast/core/deps/VulkanMemoryAllocator/include/vk_mem_alloc.h:14113
#9  0x00007ffff436fa4c in vmaCreateAllocator (pCreateInfo=0x7fffffffb360, 
    pAllocator=0x7ffff54bb720 <theVulkanContext+32>)
    at /root/src/libretro/flycast/core/deps/VulkanMemoryAllocator/include/vk_mem_alloc.h:16067
#10 0x00007ffff43735eb in VMAllocator::Init (this=0x7ffff54bb720 <theVulkanContext+32>, 
    physicalDevice=..., device=..., instance=...)
    at /root/src/libretro/flycast/core/rend/vulkan/vmallocator.cpp:73
#11 0x00007ffff4399ccd in VulkanContext::init (this=0x7ffff54bb700 <theVulkanContext>, 
    retro_render_if=0x555555eb7fb8)
    at /root/src/libretro/flycast/core/rend/vulkan/vk_context_lr.cpp:346
#12 0x00007ffff31bac0d in retro_vk_context_reset ()
    at /root/src/libretro/flycast/shell/libretro/libretro.cpp:1854
#13 0x00005555555de83a in drivers_init ()
#14 0x00005555555e04a0 in retroarch_main_init ()
#15 0x0000555555603485 in content_load ()
#16 0x0000555555604ab7 in task_load_content_internal.constprop ()
#17 0x00005555555e165d in rarch_main ()
#18 0x00007ffff773ad68 in __libc_start_call_main (main=main@entry=0x5555555d2d50 <main>, 
    argc=argc@entry=4, argv=argv@entry=0x7fffffffe298)
    at ../sysdeps/nptl/libc_start_call_main.h:58
#19 0x00007ffff773ae25 in __libc_start_main_impl (main=0x5555555d2d50 <main>, argc=4, 
    argv=0x7fffffffe298, init=<optimized out>, fini=<optimized out>, 
    rtld_fini=<optimized out>, stack_end=0x7fffffffe288) at ../csu/libc-start.c:360
#20 0x00005555555d2e01 in _start ()

vanfanel avatar Oct 15 '24 19:10 vanfanel

I should at least have tried it on my machine... Unfortunately it's a bit more complicated than that. But I have the feeling this could be the problem.

flyinghead avatar Oct 15 '24 20:10 flyinghead

@flyinghead Whatever changes you want me to try, just ask and I will gladly help.

vanfanel avatar Oct 15 '24 20:10 vanfanel

@flyinghead I have updated the issue description since it only happens with per-pixel alpha sorting (needed for Daytona USA, Rez, and many others)

vanfanel avatar Oct 31 '24 17:10 vanfanel

@flyinghead Should I take this to MESA? If so, can you please explain your theory about what's wrong with Per Pixel Alpha Sorting on Intel XE so I can tell them?

vanfanel avatar Nov 09 '24 10:11 vanfanel

I don't have a theory unfortunately. I initially thought it could be related to gpu memory allocation but it doesn't seem to be the case.

flyinghead avatar Nov 09 '24 18:11 flyinghead

@flyinghead Do you have Intel graphics hardware to test?

vanfanel avatar Nov 09 '24 20:11 vanfanel

yes, but not running linux

flyinghead avatar Nov 10 '24 11:11 flyinghead

> yes, but not running linux

Ok, but it may be sufficient: what intel GFX hardware do you have exactly? If it's of the XE series, maybe we can compare how it works on the Windows driver vs MESA and report to the MESA guys if it fares differently on the Windows driver.

vanfanel avatar Nov 10 '24 18:11 vanfanel

I have an Intel Iris+ 640 gpu on one machine and Intel HD Graphics 620 on another one.

flyinghead avatar Nov 10 '24 18:11 flyinghead

@flyinghead Is per pixel alpha sorting working correctly on both of them?

vanfanel avatar Nov 10 '24 18:11 vanfanel

I just tested the Intel HD Graphics 620 with ubuntu 22.04 (mesa 23.2.1?) and could not reproduce the issue. (Vulkan, per pixel, Full Framebuffer Emulation) Will try with a more recent OS/mesa version.

flyinghead avatar Nov 10 '24 18:11 flyinghead

No issue with mesa 24.0.9 either.

flyinghead avatar Nov 10 '24 22:11 flyinghead