renderdoc
Compute shader results from other threads
Description
This might be related to https://github.com/baldurk/renderdoc/issues/1038. It seems that compute shader threads that depend on values written by other threads can't be debugged properly. For instance, if I have a compute shader that sets up a groupshared value like so:
groupshared uint gs_variable1 = 0;
[numthreads(8, 1, 1)]
void MyShader(uint groupIndex : SV_GroupIndex)
{
    if (groupIndex == 0)
    {
        gs_variable1 = g_Buffer.Load( // ...
    }
    GroupMemoryBarrierWithGroupSync(); // Wait
    // Other threads do things using that value
}
The value of gs_variable1 is correct for thread (0, 0, 0) but not for the rest of the threads, which don't go through that code path, so the rest of the execution can't be debugged.
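To make the failure mode concrete, here is a minimal C++ model of the shader above. It assumes g_Buffer can be treated as a plain array and that the elided Load() reads element 0 (both assumptions, since the original call is truncated), and the function names are hypothetical. It contrasts simulating one thread in isolation with simulating the whole group, where every thread finishes the pre-barrier code before any thread continues.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical CPU model of the HLSL shader above: [numthreads(8, 1, 1)]
// with one groupshared uint. g_Buffer is modelled as a plain vector, and the
// elided Load() address is assumed to be element 0.
constexpr int kGroupSize = 8;

struct GroupShared {
  uint32_t gs_variable1 = 0;
};

// Code before GroupMemoryBarrierWithGroupSync(): only thread 0 writes.
void BeforeSync(int groupIndex, GroupShared& gs, const std::vector<uint32_t>& g_Buffer) {
  assert(groupIndex >= 0 && groupIndex < kGroupSize);
  if (groupIndex == 0) gs.gs_variable1 = g_Buffer.at(0);
}

// Code after the barrier: each thread reads the shared value.
uint32_t AfterSync(const GroupShared& gs) { return gs.gs_variable1; }

// Single-thread simulation: the other threads never execute, so thread 0's
// write is only visible when thread 0 itself is the one being debugged.
uint32_t SimulateOneThread(int groupIndex, const std::vector<uint32_t>& g_Buffer) {
  GroupShared gs;
  BeforeSync(groupIndex, gs, g_Buffer);
  return AfterSync(gs);
}

// Whole-group simulation: every thread completes the pre-barrier code
// before any thread runs the post-barrier code, as the barrier requires.
std::array<uint32_t, kGroupSize> SimulateWholeGroup(const std::vector<uint32_t>& g_Buffer) {
  GroupShared gs;
  for (int t = 0; t < kGroupSize; ++t) BeforeSync(t, gs, g_Buffer);
  std::array<uint32_t, kGroupSize> results{};
  for (int t = 0; t < kGroupSize; ++t) results[t] = AfterSync(gs);
  return results;
}
```

With g_Buffer = {42}, SimulateOneThread(3, ...) returns 0 while SimulateWholeGroup(...)[3] returns 42, which matches the symptom: only thread (0, 0, 0) sees the loaded value.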
Environment
- RenderDoc build: 1.1
- Operating System: Windows 7 64-bit
- API: D3D11
Yeah, as I mentioned in #1038, the simulation of compute shaders is done with only a single thread running, so you'll never see the results of any other thread's execution.
The main problem is that simulating every thread multiplies the simulation time by the number of threads in the group. Even if I multithreaded the execution, that would only be a small improvement.
Potentially it's something I could add behind an opt-in toggle for people who want to pay the cost, but I'm unsure how useful it would be if it takes ages to run. I don't like adding options that let users shoot themselves in the foot.
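As a rough, idealized illustration of why the cost multiplies: if simulating one thread takes time T, simulating a whole group takes about groupSize × T, and spreading that over P cores can at best divide it by min(P, groupSize). The sketch assumes perfect parallelism and ignores barrier, atomic and thread start-up overhead, so real numbers would be worse; GroupSize and MinSlowdown are hypothetical helpers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Number of threads in a group declared as [numthreads(x, y, z)].
uint64_t GroupSize(uint32_t x, uint32_t y, uint32_t z) {
  return uint64_t(x) * y * z;
}

// Idealized lower bound on the slowdown relative to single-thread simulation:
// groupSize units of work spread over at most min(cores, groupSize) workers,
// rounded up to whole rounds of parallel work. Synchronization cost is ignored.
uint64_t MinSlowdown(uint64_t groupSize, uint64_t cores) {
  assert(groupSize > 0 && cores > 0);
  const uint64_t parallel = std::min(cores, groupSize);
  return (groupSize + parallel - 1) / parallel;
}
```

For example, MinSlowdown(GroupSize(8, 1, 1), 8) is 1, but a 1024-thread group on 8 cores is still at least 128 times slower than simulating a single thread.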
It would be useful to me, that's for sure. However I understand it might not be a very commonly requested feature.
As a workaround, is there a way we could edit the value of a register to fake the other threads' execution? I could jump to where I want to debug, set the register I know is supposed to contain a certain value, and proceed from there. It would be a bit manual and only realistic for certain scenarios, but definitely better than not being able to debug at all.
Someone has asked for that before, to be able to test out shader fixes. The difficulty, as described in the other bug, is that the full simulation is run at once rather than in incremental steps, so it's not easy to inject new values.
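The difference between running the whole simulation at once and stepping it incrementally can be sketched with a toy interpreter. Everything here (the three-instruction ISA, SteppingSimulator, SetRegister) is hypothetical and is not RenderDoc's DXBC debugger; it only shows why a step-wise design makes it straightforward to patch a register mid-run, which is the workaround requested above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical toy ISA, used only to illustrate incremental stepping.
enum class Op { Mov, Add, Mul };

struct Instr {
  Op op;
  int dst;       // destination register
  int a;         // first source register (Add/Mul)
  int b;         // second source register (Add/Mul)
  uint32_t imm;  // immediate value (Mov)
};

class SteppingSimulator {
 public:
  SteppingSimulator(std::vector<Instr> program, size_t numRegs)
      : program_(std::move(program)), regs_(numRegs, 0) {}

  bool Finished() const { return pc_ >= program_.size(); }

  // Executes exactly one instruction, so the caller can inspect or patch
  // state between steps instead of only seeing the final result.
  void Step() {
    assert(!Finished());
    const Instr& in = program_[pc_++];
    switch (in.op) {
      case Op::Mov: regs_.at(in.dst) = in.imm; break;
      case Op::Add: regs_.at(in.dst) = regs_.at(in.a) + regs_.at(in.b); break;
      case Op::Mul: regs_.at(in.dst) = regs_.at(in.a) * regs_.at(in.b); break;
    }
  }

  void RunToEnd() { while (!Finished()) Step(); }

  // Overwrite a register mid-simulation, e.g. to fake another thread's write.
  void SetRegister(int index, uint32_t value) { regs_.at(index) = value; }
  uint32_t Register(int index) const { return regs_.at(index); }

 private:
  std::vector<Instr> program_;
  std::vector<uint32_t> regs_;
  size_t pc_ = 0;
};
```

With the program r0 = 0 (the groupshared value as a non-zero thread sees it), r1 = 2, r2 = r0 * r1, running straight through leaves r2 at 0; stepping once, forcing r0 to 42, and then continuing gives r2 == 84.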
Right now the higher priority for time spent in this area is supporting shader debugging at all on the new APIs.
Yeah, I agree that should be the focus. Thanks a lot for taking the time to clarify in any case :)
When I needed to debug a shader that uses groupshared memory, I made an implementation that simulates all the threads in the thread group across all available CPU cores.
It can take a while with a large group size, but it's time well spent if you're trying to understand what your compute shader is doing!
It only simulates a single thread group, so if threads in another group can write to a location that the simulated group reads then that won't be reflected. Ideally it would detect cases that won't be simulated correctly, but it doesn't at the moment.
It works by blocking at each thread sync instruction until all threads have arrived, and by executing atomic instructions inside critical sections.
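A minimal sketch of that synchronization scheme, under simplifying assumptions: one OS thread per simulated shader thread, a hand-written reusable barrier standing in for GroupMemoryBarrierWithGroupSync(), and a single mutex guarding every simulated atomic. The shader body is reduced to the example from the issue plus a hypothetical InterlockedAdd on a groupshared counter; GroupBarrier and SimulateGroup are illustrative names, not the actual implementation.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Reusable barrier: each caller blocks until `count` threads have arrived.
class GroupBarrier {
 public:
  explicit GroupBarrier(size_t count) : count_(count) {}
  void ArriveAndWait() {
    std::unique_lock<std::mutex> lock(m_);
    const size_t gen = generation_;
    if (++arrived_ == count_) {
      arrived_ = 0;
      ++generation_;  // release everyone waiting on this generation
      cv_.notify_all();
    } else {
      cv_.wait(lock, [&] { return generation_ != gen; });
    }
  }

 private:
  std::mutex m_;
  std::condition_variable cv_;
  size_t count_;
  size_t arrived_ = 0;
  size_t generation_ = 0;
};

struct GroupResult {
  std::vector<uint32_t> seen;  // gs_variable1 as read by each thread after the sync
  uint32_t counter;            // final value of the simulated atomic counter
};

GroupResult SimulateGroup(size_t groupSize, uint32_t loadedValue) {
  assert(groupSize > 0);
  uint32_t gs_variable1 = 0;  // groupshared value from the shader in the issue
  uint32_t gs_counter = 0;    // hypothetical InterlockedAdd target
  std::mutex atomicMutex;     // critical section for every simulated atomic
  GroupBarrier barrier(groupSize);
  std::vector<uint32_t> seen(groupSize, 0);

  std::vector<std::thread> workers;
  for (size_t idx = 0; idx < groupSize; ++idx) {
    workers.emplace_back([&, idx] {
      if (idx == 0) gs_variable1 = loadedValue;  // stands in for g_Buffer.Load(...)
      barrier.ArriveAndWait();                   // GroupMemoryBarrierWithGroupSync()
      seen[idx] = gs_variable1;
      std::lock_guard<std::mutex> lock(atomicMutex);
      gs_counter += 1;                           // simulated InterlockedAdd
    });
  }
  for (auto& w : workers) w.join();
  return {seen, gs_counter};
}
```

SimulateGroup(8, 42) reports 42 for every thread and a counter of 8. One design caveat: a blocking barrier needs every simulated thread to be runnable at the same time, so a fixed pool with fewer workers than shader threads would deadlock at the first sync unless simulated threads can be suspended and resumed.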
It uses deferred contexts, which turned out to be a bit of a nuisance because I had to turn off the single-threaded device flag (D3D11_CREATE_DEVICE_SINGLETHREADED).
It still shows the cancel dialog once it runs past a certain number of instructions. Ideally I think it would have a cancel button next to the "thinking" indicator that you could press at any time, but I haven't done that.
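One possible way to support cancelling at any time is cooperative cancellation: the UI sets a shared atomic flag and each simulation worker checks it between instructions, rather than only after a fixed instruction budget. The sketch assumes that design; CancelToken and RunSimulation are hypothetical names, and the loop body stands in for executing one instruction.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical cancellation hook: a cancel button would call Cancel(),
// and every worker polls IsCancelled() between simulated instructions.
struct CancelToken {
  std::atomic<bool> cancelled{false};
  void Cancel() { cancelled.store(true, std::memory_order_relaxed); }
  bool IsCancelled() const { return cancelled.load(std::memory_order_relaxed); }
};

// Stand-in for a simulation loop: "executes" up to maxSteps instructions and
// returns how many ran before completion or cancellation.
uint64_t RunSimulation(uint64_t maxSteps, const CancelToken& token) {
  uint64_t executed = 0;
  while (executed < maxSteps && !token.IsCancelled()) {
    ++executed;  // a real simulator would execute one instruction here
  }
  return executed;
}
```

A token that is never cancelled lets RunSimulation(1000, token) run all 1000 steps, while a token cancelled beforehand stops it at 0. Relaxed ordering should be enough here because the flag does not publish any other data.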
Thanks to @bredbored for the implementation; it was very helpful for debugging my compute shader :-) Are there any official plans to support it?