[Runtime Bug]: RTX Remix hangs on startup

Open Nukem9 opened this issue 1 year ago • 4 comments

Describe the bug

RTX Remix games hang with a black screen on startup.

How do you reproduce the bug?

  • Install the latest version of Portal RTX or Portal Prelude RTX.
  • Install Nvidia driver version 560.81.
  • Replace the existing RTX Remix runtimes with files from dxvk-remix-3c9c2ed-609-debugoptimized and bridge-remix-2d8aa92-87-debugoptimized.
  • Launch Portal RTX and/or Portal Prelude RTX.
  • Both games hang on startup with a black screen. I terminate the processes after 5 minutes.

(For this issue I'm using the latest driver and CI builds for symbols, but it's reproducible on older 560.XX drivers and older Remix runtimes. This is not exclusive to Portal either.)

What is the expected behavior?

RTX Remix games should reach the main menu after a few seconds.

Version

0.5.4+3c9c2ed9

Logs

hl2_d3d9.log d3d9.log NvRemixBridge.log

Crash dumps

NvRemixBridge.dmp stackdump.txt

Note: this isn't a crash. The attached dump was captured manually with WinDbg.

Media

No response

Nukem9 avatar Aug 08 '24 19:08 Nukem9

I can't seem to repro this; trying it myself, I can load Portal RTX just fine. Based on the dump you provided, though, it seems to be stuck in the Vulkan driver while creating a render target image view for the first time, in response to a request to clear the screen. That assumes it's actually stuck here and this isn't just a red herring:

 	ntdll.dll!NtWaitForAlertByThreadId()	Unknown
 	ntdll.dll!RtlAcquireSRWLockExclusive()	Unknown
 	nvoglv64.dll!00007ff9c5409ae7()	Unknown
 	nvoglv64.dll!00007ff9c5de8410()	Unknown
 	nvoglv64.dll!00007ff9c5de79ad()	Unknown
 	d3d9.dll!dxvk::DxvkImageView::createView(VkImageViewType type, unsigned int numLayers) Line 360	C++
 	d3d9.dll!dxvk::DxvkImageView::DxvkImageView(const dxvk::Rc<dxvk::vk::DeviceFn> & vkd, const dxvk::Rc<dxvk::DxvkImage> & image, const dxvk::DxvkImageViewCreateInfo & info) Line 297	C++
 	d3d9.dll!dxvk::DxvkDevice::createImageView(const dxvk::Rc<dxvk::DxvkImage> & image, const dxvk::DxvkImageViewCreateInfo & createInfo) Line 280	C++
 	d3d9.dll!dxvk::D3D9CommonTexture::CreateView(unsigned int Layer, unsigned int Lod, unsigned int UsageFlags, bool Srgb) Line 591	C++
 	[Inline Frame] d3d9.dll!dxvk::D3D9Subresource<IDirect3DSurface9>::GetRenderTargetView(bool) Line 93	C++
 	d3d9.dll!dxvk::D3D9DeviceEx::Clear::__l2::<lambda>(unsigned int alignment, VkOffset3D offset, VkExtent3D extent) Line 1642	C++
>	d3d9.dll!dxvk::D3D9DeviceEx::Clear(unsigned long Count, const _D3DRECT * pRects, unsigned long Flags, unsigned long Color, float Z, unsigned long Stencil) Line 1664	C++

Are there any other details about your environment, such as whether this is Windows 10 or 11? That might make some difference.

anon-apple avatar Aug 09 '24 17:08 anon-apple

I'll add in advance - I'm always hesitant to open these types of issues because there's a nonzero chance you'll be chasing down ghosts. Hopefully this helps anyway.

If you take a peek at stackdump.txt you'll notice two active critical sections (CS locks): 0000025fb0bf8080 and 00007ff9cd64d750:

  • Thread ID 2A00 ("dxvk-shader") holds both of the aforementioned CS locks. At the same time, this thread is waiting on an exclusive SRW lock.
  • Thread ID 3208 ("dxvk-submit") is stuck in RtlpEnterCriticalSectionContended. Presumably it's waiting on one of the two CS locks. I don't know which.
  • For one reason or another, said SRW lock is never released, preventing thread 2A00 from making forward progress. Thread 3208 therefore can't make progress either, which leads to a cascading effect.
  • I'm assuming thread 3208 holds that SRW lock and it's a total deadlock in driver code, but I can't definitively prove this.
  • Every other thread is likely a red herring, including the Clear call.
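
For anyone following along, the suspected pattern above is a classic lock-order inversion. Below is a minimal, portable C++ sketch of that pattern only. It is not DXVK or driver code: `simulate_lock_inversion`, `InversionResult`, and both mutexes are hypothetical stand-ins, and it assumes (as the bullets do, without proof) that the second thread holds the SRW lock. Timed lock attempts replace the real blocking waits so the demo reports the stall instead of hanging.

```cpp
#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical stand-ins; none of these names come from DXVK or the driver.
struct InversionResult {
    bool shader_got_second_lock;  // "dxvk-shader" analog: holds CS, wants SRW
    bool submit_got_second_lock;  // "dxvk-submit" analog: holds SRW (assumed), wants CS
};

InversionResult simulate_lock_inversion() {
    std::timed_mutex critical_section;  // plays the role of a CS lock
    std::timed_mutex srw_lock;          // plays the role of the exclusive SRW lock
    std::atomic<int> holding_first{0};
    std::atomic<int> finished{0};
    InversionResult result{false, false};

    auto worker = [&](std::timed_mutex& first, std::timed_mutex& second,
                      bool& got_second) {
        first.lock();
        // Wait until both threads hold their first lock.
        holding_first.fetch_add(1);
        while (holding_first.load() < 2) std::this_thread::yield();

        // A real deadlock would block forever here; the timeout lets us observe it.
        got_second = second.try_lock_for(std::chrono::milliseconds(50));
        if (got_second) second.unlock();

        // Keep the first lock until both attempts are over, so neither can succeed.
        finished.fetch_add(1);
        while (finished.load() < 2) std::this_thread::yield();
        first.unlock();
    };

    std::thread shader([&] {
        worker(critical_section, srw_lock, result.shader_got_second_lock);
    });
    std::thread submit([&] {
        worker(srw_lock, critical_section, result.submit_got_second_lock);
    });
    shader.join();
    submit.join();
    return result;
}
```

Calling `simulate_lock_inversion()` should return with both flags `false`: each thread holds the lock the other one needs, so neither second acquisition can succeed. With plain blocking waits instead of timeouts, that is a permanent hang, which matches the symptom but does not confirm the cause.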

Environment:

  • Windows Server 2022 (Windows 10 Version 20348 MP (12 procs) Free x64 in attached .dmp)
  • Intel i7-6850K
  • RTX 2080 Ti

Other notes:

  • Having background programs open reduces the chance of NvRemixBridge hanging.
  • Deleting the Vulkan shader cache (GLCache) reduces the chance of NvRemixBridge hanging.
  • Running NvRemixBridge under a debugger eliminates the hangs entirely.
  • Setting NvRemixBridge's CPU affinity to 1 CPU core eliminates the hangs entirely.
  • Adding a Sleep() call before the first frame is presented eliminates the hangs entirely. This allows pipeline creation to finish beforehand.

Nukem9 avatar Aug 09 '24 19:08 Nukem9

Yeah, based on that it feels more like a driver-side issue to me, since that's where it's getting stuck (even if we were doing something wrong, a deadlock in the driver isn't really desired behavior). We'll look into it, though, and report back if we can get a better picture of what's going on.

anon-apple avatar Aug 16 '24 19:08 anon-apple

REMIX-3534 for tracking

NV-LL avatar Sep 11 '24 22:09 NV-LL