radeon_gpu_detective

100% repro TDR but without crash analysis

Open omd24 opened this issue 1 month ago • 5 comments

Hi, I have a 100% reproducible TDR, but interestingly, I can't reproduce it when Crash Analysis is connected. Do you have any idea what could cause this difference? Could Crash Analysis somehow be masking a GPU synchronization or timing issue?

omd24 Nov 05 '25 13:11

Hi @omd24,

Is this behavior reproducible with Hardware Crash Analysis disabled?

Radeon Developer Panel -> Crash Analysis -> Analysis options -> uncheck the "Enable hardware crash analysis" checkbox to disable the feature before attempting to capture.

AmitBM Nov 06 '25 04:11

Even with the hardware crash analysis option disabled, the crash doesn't happen. I can only repro it when Crash Analysis is not connected.

omd24 Nov 06 '25 13:11

This behavior is something we haven’t observed during our internal testing. In fact, we’d expect the opposite: when Crash Analysis mode is enabled in the driver, it should trigger even non-fatal page faults that might not appear in production scenarios. It’s possible that the additional logic activated by Crash Analysis mode is inadvertently resolving a timing-related issue in the driver or application code, but that’s just speculation.

AmitBM Nov 10 '25 20:11

Yeah, I also suspect it's timing-related. I tried toggling options in Driver Experiments as well, but only disabling shader optimizations prevents the crash. Neither the debug layer nor DRED provides any actionable info. Interestingly, enabling GPU-based validation also makes the issue disappear (again hinting at a timing issue). So I'm basically stuck flipping individual systems and shaders on/off to see if I can narrow down what's causing it...

omd24 Nov 10 '25 20:11
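
The "flip systems and shaders on/off" approach described above can be done as a bisection instead of one toggle at a time. The following is only a rough sketch under strong assumptions: the repro is deterministic (the thread reports 100% repro without Crash Analysis), exactly one system is responsible, and the application can run with an arbitrary subset of systems enabled. The names `bisect_culprit` and `reproduces` are hypothetical; `reproduces` stands in for "launch the app with only these systems enabled and report whether the TDR occurs".

```python
from typing import Callable, Optional, Sequence, Set


def bisect_culprit(
    items: Sequence[str],
    reproduces: Callable[[Set[str]], bool],
) -> Optional[str]:
    """Return the single item whose presence triggers the failure, or None.

    Assumes exactly one culprit that fails independently of the other items.
    """
    candidates = list(items)
    # Sanity check: the failure must reproduce with everything enabled.
    if not reproduces(set(candidates)):
        return None
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if reproduces(set(half)):
            # The culprit is in the first half.
            candidates = half
        else:
            # Under the single-culprit assumption it must be in the rest.
            candidates = candidates[len(half):]
    return candidates[0]


# Hypothetical usage with a fake repro function standing in for a real run.
passes = ["shadows", "ssao", "bloom", "tonemap", "ui"]
culprit = bisect_culprit(passes, lambda enabled: "ssao" in enabled)
```

If the hang depends on an interaction between two systems, or disabling a system changes timing enough to hide the problem, the single-culprit assumption breaks and the result should not be trusted.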

Update: After changing the Windows TDR settings in the registry to make them less tolerant, I get occasional blue-screen crashes when running with Crash Analysis enabled. Specifically, I set TdrLevel = 1 and TdrDebugMode = 0. Still, no dump is created by Crash Analysis. One more thought: given that disabling shader optimizations makes the TDR disappear, it could also be a driver issue...

omd24 Nov 15 '25 14:11
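
For anyone comparing setups, here is a small sketch that reads the two TDR values mentioned above and prints their symbolic names. It assumes the values live under `HKLM\SYSTEM\CurrentControlSet\Control\GraphicsDrivers` and that the value-to-name mapping matches Microsoft's TDR registry key documentation; both should be checked against the current docs. The helper names `describe` and `read_tdr_settings` are made up for this example. On non-Windows systems the read is skipped.

```python
import sys
from typing import Dict, Optional

# Assumed registry location of the TDR values (under HKEY_LOCAL_MACHINE).
TDR_KEY = r"SYSTEM\CurrentControlSet\Control\GraphicsDrivers"

# Symbolic names as recalled from Microsoft's TDR documentation (verify there).
NAMES = {
    "TdrLevel": {
        0: "TdrLevelOff",
        1: "TdrLevelBugcheck",
        2: "TdrLevelRecoverVGA",
        3: "TdrLevelRecover",
    },
    "TdrDebugMode": {
        0: "TDR_DEBUG_MODE_OFF",
        1: "TDR_DEBUG_MODE_IGNORE_TIMEOUT",
        2: "TDR_DEBUG_MODE_RECOVER_NO_PROMPT",
        3: "TDR_DEBUG_MODE_RECOVER_UNCONDITIONAL",
    },
}


def describe(name: str, value: Optional[int]) -> str:
    """Format one TDR setting with its symbolic name."""
    if value is None:
        return f"{name} is not set (system default applies)"
    label = NAMES[name].get(value, "unknown value")
    return f"{name}={value} ({label})"


def read_tdr_settings() -> Dict[str, Optional[int]]:
    """Read TdrLevel and TdrDebugMode; returns {} when not on Windows."""
    if sys.platform != "win32":
        return {}
    import winreg  # Windows-only standard library module

    settings: Dict[str, Optional[int]] = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, TDR_KEY) as key:
        for name in NAMES:
            try:
                value, _value_type = winreg.QueryValueEx(key, name)
            except FileNotFoundError:
                value = None  # value not present in the registry
            settings[name] = value
    return settings


if __name__ == "__main__":
    for name, value in read_tdr_settings().items():
        print(describe(name, value))
```

With the settings from the update above, `describe("TdrLevel", 1)` reports `TdrLevelBugcheck`, which is consistent with the blue screens: at that level a GPU timeout triggers a bug check instead of a driver recovery.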