
[Runtime Bug]: Frequently occurring deadlock since RTX Remix v1.0.0

Open onnoj opened this issue 11 months ago • 5 comments

Describe the bug

Since RTX Remix v1.0.0, there seems to be a deadlock affecting at least Deus Ex and the Echelon Renderer.

The d3d9 Present() call never returns and the application hangs. The hang appears to be a deadlock inside the RTX Remix runtime.

dxvk-cs and dxvk-submit seem to be blocked on the same lock, although I don't know which thread holds it. I suspect that the main thread is waiting for dxvk-submit to finish.

How do you reproduce the bug?

  1. Install Deus Ex from Steam
  2. Install https://github.com/onnoj/DeusExEchelonRenderer/releases/tag/v0.3.15 (see readme.md for instructions)
  3. (Optional, but strongly recommended) Install Deus Exe (also linked on the renderer's GitHub page)

The bug seems to be easier to reproduce if there is user input. ~The bug doesn't seem to happen if rtx.initializer.asyncShaderFinalizing and/or rtx.initializer.asyncShaderPrewarming are set to false, but I haven't been able to verify this fully yet.~ This is not the case: the issue still occurs with these features turned off. However, the issue takes much longer to appear if rtx.opacityMicromap.enable is set to false.

What is the expected behavior?

The runtime should not stall/block on Present. Perhaps a device-lost kind of mechanism can be added in the future, so that the game can recover and re-create the renderer.
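
To illustrate the detection half of such a mechanism, here is a minimal sketch. It is not how RTX Remix or DXVK work internally; `PresentWatchdog` and its method names are hypothetical. It assumes Present() stays on the render thread while a separate monitoring thread polls `isStalled()`. It can only detect a stalled call, not unblock it.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper; not part of d3d9, DXVK, or the RTX Remix runtime.
class PresentWatchdog {
public:
    // Call on the render thread immediately before Present().
    void beginPresent() { startedAtMs_.store(nowMs(), std::memory_order_release); }

    // Call on the render thread right after Present() returns.
    void endPresent() { startedAtMs_.store(kIdle, std::memory_order_release); }

    // Call from a monitoring thread. Returns true if a Present() call has
    // been in flight for longer than `timeout`.
    bool isStalled(std::chrono::milliseconds timeout) const {
        assert(timeout.count() >= 0);
        const std::int64_t started = startedAtMs_.load(std::memory_order_acquire);
        return started != kIdle && nowMs() - started > timeout.count();
    }

private:
    static constexpr std::int64_t kIdle = -1;  // No Present() in flight.

    static std::int64_t nowMs() {
        using namespace std::chrono;
        return duration_cast<milliseconds>(
                   steady_clock::now().time_since_epoch()).count();
    }

    std::atomic<std::int64_t> startedAtMs_{kIdle};
};
```

A monitoring thread could poll `isStalled()` with a timeout of a few seconds and, once it returns true, write a log entry or start the application's own recovery path.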

Version

v1.0.0

Logs

(later run)

Crash dumps

The dump file seems to be too big (it fails to upload). I will be happy to email a link!

(later run)

Media

No response

Mar 13 '25 19:03 onnoj

Thanks for reaching out! We've filed REMIX-4002 for internal tracking.

Mar 13 '25 20:03 NV-LL

Hi,

In the meantime, it seems that this is actually a device hang, and the aforementioned deadlock is just collateral damage.

  • The issue seems to happen less often when rtx.opacityMicromap.enable is set to false
  • I've attached an Aftermath dmp
  • I've attached log files captured with Sysinternals dbgview.exe while running the game with Vulkan validation layers enabled.

Mar 14 '25 19:03 onnoj

~It's a mediocre workaround, but the issue no longer seems to occur with a 1 ms sleep added before the Present call.~ ~That version is v0.3.16~

~The version with the original issue (and without any workarounds) is v0.3.15~ Never mind, the issue still occurs, same as with v0.3.15.

Mar 16 '25 12:03 onnoj

I noticed that when the deadlock occurs, various messages are written to the Windows event log:

  • System (source: nvlddmkm)

The description for Event ID 153 from source nvlddmkm cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer. If the event originated on another computer, the display information had to be saved with the event. The following information was included with the event: \Device\0000010f Error occurred on GPUID: d00 The message resource is present but the message was not found in the message table

  • Application (several entries like the following):

Fault bucket , type 0 Event Name: LiveKernelEvent Response: Not available Cab Id: 0

Problem signature: P1: 141 P2: ffffac0dd6436010 P3: fffff80492eb4580 P4: 0 P5: ffffac0dce404080 P6: 10_0_26100 P7: 0_0 P8: 256_1 P9: P10:

Fault bucket , type 0 Event Name: LiveKernelEvent Response: Not available Cab Id: 0 Problem signature: P1: 141 P2: ffff920c9e9ac1d0 P3: fffff80628b663e0 P4: 0 P5: ffff920c9fb25080 P6: 10_0_26100 P7: 0_0 P8: 256_1 P9: P10:

The associated dumps show a fault in nvlddmkm.sys. I have hardware-accelerated GPU scheduling turned on, as well as HVCI.
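
The problem signatures above can be split into fields for easier comparison across events. The following is a minimal sketch: `parseProblemSignature` and `bugCheckCode` are hypothetical helper names, and it assumes that P1 holds the bug check code in hexadecimal. Under that assumption, 141 reads as 0x141, which Microsoft documents as VIDEO_ENGINE_TIMEOUT_DETECTED. That would be consistent with a device hang rather than a pure CPU-side deadlock.

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

using SignatureFields = std::map<std::string, std::string>;

// Splits "P1: 141 P2: ffff... P9: P10:" into {"P1": "141", ...}.
// Fields without a value (P9 and P10 in the events above) map to "".
// Tokens before the first "Pn:" key, such as "Problem signature:", are ignored.
inline SignatureFields parseProblemSignature(const std::string& text) {
    SignatureFields fields;
    std::istringstream in(text);
    std::string token;
    std::string key;
    while (in >> token) {
        // A key looks like 'P', one or more digits, then ':'.
        const bool isKey =
            token.size() >= 3 && token.front() == 'P' && token.back() == ':' &&
            token.find_first_not_of("0123456789", 1) == token.size() - 1;
        if (isKey) {
            key = token.substr(0, token.size() - 1);
            fields.emplace(key, "");  // Create the entry even if no value follows.
        } else if (!key.empty()) {
            fields[key] = token;
        }
    }
    return fields;
}

// Assumption: P1 is the bug check code written in hexadecimal.
// Returns 0 if P1 is missing or empty.
inline std::uint32_t bugCheckCode(const SignatureFields& fields) {
    const auto it = fields.find("P1");
    if (it == fields.end() || it->second.empty()) return 0;
    return static_cast<std::uint32_t>(std::stoul(it->second, nullptr, 16));
}
```

Comparing the parsed fields of the two events shows that P1, P4, and P6 through P8 are identical, while P2, P3, and P5 differ between occurrences.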

Mar 21 '25 18:03 onnoj

nvkernelcrashes.zip

Mar 21 '25 18:03 onnoj