Allow inspecting groupshared memory in D3D11 shader debugging

Open redorav opened this issue 7 years ago • 7 comments

Description

I don't know whether this is already implemented and I just don't know how to use it, or whether it's a genuine bug, so it's more of a question-bug.

The first issue is seeing values such as vThreadID or vThreadGroupID in the compute shader debugger. I think other semantics aren't easily visible either (e.g. the instance id), although they are more easily deduced from context. Hovering over the name or seeing it in Variables (or both) would be very useful.

The other issue is also related to data visibility: seeing groupshared memory as a compute shader runs. I understand that maybe the contents of memory from the full run can't be inspected (as in, it wouldn't show things like race conditions, etc.) but at least the value that the current thread is writing would be quite useful. Perhaps it could appear in the Variables window like the registers, or in Constants & Resources.

One other related question is how to use the Watch window. Is there a way to add anything other than registers? For instance, adding g0[10] for the groupshared memory to be able to observe how that particular entry evolves?

Environment

  • RenderDoc build: 1.0
  • Operating System: Windows 7
  • API: D3D11

redorav avatar Jul 10 '18 09:07 redorav

About vThreadID and friends, those weren't omitted for any good reason. I had skipped them since I'd thought of them as constants - e.g. in compute shaders you choose which group and thread to debug. I realise that's not very useful though, so I think the recent revamp to allow HLSL debugging should now display them properly, although I think I might still have a bug or two in that area.

With groupshared memory it's more tricky. To clarify; the compute debugging in RenderDoc is currently limited to running a single thread at once, so groupshared memory is simulated internally but it's kind of trivial because there aren't any reads/writes from other threads. In most cases that means debugging shaders that use groupshared will rapidly turn into nonsense. You can use the 'run to load/sample/gather' button which will also pick up loads from groupshared, but it'll only have the correct value if no other thread was expected to write to it.
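To illustrate why a single-thread simulation can go wrong, here is a minimal toy model in C++. It is not RenderDoc code: the group size, the function names and the "write your own slot, then read your neighbour's slot" pattern are all assumptions made for the sketch.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical thread group size for the toy model.
constexpr int kGroupSize = 4;

// What the whole group computes: every thread writes its slot, then
// (after a barrier) the debugged thread reads its neighbour's slot.
inline uint32_t RunWholeGroup(int debuggedThread) {
  assert(debuggedThread >= 0 && debuggedThread < kGroupSize);
  std::array<uint32_t, kGroupSize> shared{};
  for (int tid = 0; tid < kGroupSize; ++tid)
    shared[tid] = 10u * static_cast<uint32_t>(tid + 1);
  // A group barrier would sit here on real hardware.
  return shared[(debuggedThread + 1) % kGroupSize];
}

// What a single-thread simulation sees: only the debugged thread's
// write ever happens, so the neighbour's slot is still zero.
inline uint32_t RunSingleThread(int debuggedThread) {
  assert(debuggedThread >= 0 && debuggedThread < kGroupSize);
  std::array<uint32_t, kGroupSize> shared{};
  shared[debuggedThread] = 10u * static_cast<uint32_t>(debuggedThread + 1);
  return shared[(debuggedThread + 1) % kGroupSize];
}
```

In this model thread 0 reads 20 when the whole group runs but 0 when it runs alone, which is the kind of divergence described above. A thread that only reads back its own slot would see the same value either way.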

Displaying the contents in the UI is the challenge. The debugging works by running the full simulation and snapshotting the state at each step. That's not too bad if you're only talking about a dozen or so registers, but if you include up to 32 KB of groupshared in every snapshot then it gets far worse.
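As a rough back-of-the-envelope estimate of that cost, the sketch below multiplies the per-step state size by the number of steps. The 16-byte register size (four 32-bit components), the 10,000-step run and the function name are assumptions for illustration, not figures taken from RenderDoc.

```cpp
#include <cstdint>

// Assumed size of one register: four 32-bit components.
constexpr uint64_t kRegisterBytes = 16;

// Total bytes needed if every step stores a full copy of the state.
constexpr uint64_t SnapshotBytes(uint64_t steps, uint64_t registers,
                                 uint64_t groupsharedBytes) {
  return steps * (registers * kRegisterBytes + groupsharedBytes);
}

// 12 registers only: 192 bytes per step.
static_assert(SnapshotBytes(10000, 12, 0) == 1920000, "about 1.9 MB");
// 12 registers plus 32 KB of groupshared: 32,960 bytes per step.
static_assert(SnapshotBytes(10000, 12, 32 * 1024) == 329600000, "about 330 MB");
```

Under these assumptions each snapshot grows by a factor of roughly 170, so a run that fit in a couple of megabytes needs hundreds of megabytes.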

Changing to a more iterative approach where simulation only happens as needed would solve that, but then you have performance issues from running the same work over again. If you cache to solve that, you start going back into the same problem. It also becomes harder to support stepping backwards, which I think is a very useful feature, especially for shaders.

I think probably the best solution is to instead store each step as a delta (either a two-way delta or with regular snapshots to allow rewind) since each individual step doesn't change much. That's going to need a very different representation though. Since I'm hoping to look at Vulkan shader debugging later in the year, I'll need to refactor a lot of this code anyway to support both and it would make sense to re-examine this at that time. For the moment though I think groupshared is probably going to remain opaque.
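One way the two-way delta idea could look is sketched below. This is a hypothetical standalone illustration, not RenderDoc's representation: the `Delta` and `DeltaHistory` names, the flat array of 32-bit slots and the `Record`/`StepBack`/`StepForward` interface are all invented for the example. Each step stores only the slots it changed, with both the old and the new value, so the history can be replayed in either direction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// One changed 32-bit slot, with both values so it can be undone or redone.
struct Delta {
  size_t index;
  uint32_t before;  // value restored when stepping backwards
  uint32_t after;   // value applied when stepping forwards
};

class DeltaHistory {
public:
  explicit DeltaHistory(size_t slots) : state_(slots, 0) {}

  // Record a new step at the end of the history and apply its writes.
  void Record(const std::vector<std::pair<size_t, uint32_t>> &writes) {
    assert(cursor_ == steps_.size());  // only append at the end
    std::vector<Delta> step;
    for (const auto &w : writes) {
      assert(w.first < state_.size());
      step.push_back({w.first, state_[w.first], w.second});
      state_[w.first] = w.second;
    }
    steps_.push_back(std::move(step));
    ++cursor_;
  }

  bool StepBack() {
    if (cursor_ == 0) return false;
    const std::vector<Delta> &step = steps_[--cursor_];
    // Undo in reverse so repeated writes to one slot restore correctly.
    for (auto it = step.rbegin(); it != step.rend(); ++it)
      state_[it->index] = it->before;
    return true;
  }

  bool StepForward() {
    if (cursor_ == steps_.size()) return false;
    for (const Delta &d : steps_[cursor_]) state_[d.index] = d.after;
    ++cursor_;
    return true;
  }

  uint32_t Read(size_t i) const {
    assert(i < state_.size());
    return state_[i];
  }

private:
  std::vector<uint32_t> state_;            // current simulated memory
  std::vector<std::vector<Delta>> steps_;  // one delta list per step
  size_t cursor_ = 0;                      // number of steps applied
};
```

With 8192 slots (32 KB) a step that writes one value stores a single `Delta` rather than a full copy. Jumping to a distant step still means replaying every delta in between, which is where the regular snapshots mentioned above would help.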

baldurk avatar Jul 10 '18 13:07 baldurk

I see you've addressed the thread id/group quite quickly so thanks for that!

Regarding the groupshared, thanks for the explanation. I can now better understand how tricky it would be to implement for the entire shader run in the current implementation. Hopefully when you get to Vulkan debugging it can be piggybacked on that task.

redorav avatar Jul 11 '18 21:07 redorav

Oh groupshared, how I long for thee. The doctor of rendering I beseech.

Temaran avatar Sep 12 '18 03:09 Temaran

Even being able to edit registers and the groupshared memory manually to simulate some shader state would be extremely useful tbh, regardless of more complex solutions.

Temaran avatar Sep 12 '18 03:09 Temaran

Editing registers and groupshared memory would be much more work on top of everything mentioned above and opens a lot of other UI pains: if you step forward, edit a value, then step back past where you edited and run again - does it use the edited value or the actual value? It's a sort of microcosm of the kind of semantic difficulties for state editing.

You can hack in groupshared contents if you know what they should be. In D3D11DebugManager::CreateShaderGlobalState each groupshared declaration creates a memory buffer that's initially empty:

https://github.com/baldurk/renderdoc/blob/c9c51b5a3eb16c8fe07234fe7feedc58b6153bc1/renderdoc/driver/d3d11/d3d11_shaderdebug.cpp#L600-L615

You could fill that with data at the start if you know what it should be for other threads maybe. Or else you can poke into it in the main loop of D3D11Replay::DebugThread if you know which step or instruction you want to have the results appear:

https://github.com/baldurk/renderdoc/blob/c9c51b5a3eb16c8fe07234fe7feedc58b6153bc1/renderdoc/driver/d3d11/d3d11_shaderdebug.cpp#L2354-L2374
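As a rough picture of what "filling that with data" could mean, here is a standalone sketch. The `GroupsharedBuffer` struct and the helper functions are hypothetical stand-ins invented for this example; they are not RenderDoc's actual types, and the real fields at the linked lines may differ. The sketch only shows the idea of writing known values into a zero-initialised byte buffer before the simulated thread reads from it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for one simulated groupshared declaration.
struct GroupsharedBuffer {
  uint32_t byteStride;        // bytes per element
  uint32_t count;             // number of elements
  std::vector<uint8_t> data;  // backing store, initially all zero
};

// Create an empty (zero-filled) buffer, like a fresh declaration.
inline GroupsharedBuffer MakeGroupshared(uint32_t byteStride, uint32_t count) {
  return GroupsharedBuffer{
      byteStride, count,
      std::vector<uint8_t>(static_cast<size_t>(byteStride) * count, 0)};
}

// Write the value another thread would have stored into one element.
inline void SeedElement(GroupsharedBuffer &buf, uint32_t element, uint32_t value) {
  assert(buf.byteStride >= sizeof(value));
  assert(element < buf.count);
  std::memcpy(buf.data.data() + static_cast<size_t>(element) * buf.byteStride,
              &value, sizeof(value));
}

// Read back the first 32 bits of an element, as a simulated load would.
inline uint32_t LoadElement(const GroupsharedBuffer &buf, uint32_t element) {
  assert(buf.byteStride >= sizeof(uint32_t));
  assert(element < buf.count);
  uint32_t value = 0;
  std::memcpy(&value,
              buf.data.data() + static_cast<size_t>(element) * buf.byteStride,
              sizeof(value));
  return value;
}
```

For example, seeding element 10 of a 64-element buffer with 42 before the simulation starts means a later load of that element returns 42 instead of the default 0.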

baldurk avatar Sep 13 '18 15:09 baldurk

Oh, that is pretty sweet! With editing I meant directly modifying the one and only state of the shader simulator, so I wouldn't expect the shader to behave as expected if I started rewinding it etc. This could easily be indicated by marking the shader state as dirty as soon as you elect to edit, disabling the backwards button for example. It would be a super powerful and simple way to do more creative debugging, especially with groupshared stuff.

I will most definitely look into D3D11DebugManager::CreateShaderGlobalState though, thanks for the tip!!

Temaran avatar Sep 13 '18 20:09 Temaran

> With groupshared memory it's more tricky. To clarify; the compute debugging in RenderDoc is currently limited to running a single thread at once

Does the same apply to pixel shaders? Thinking of derivative instructions, visualizing the value of a variable across invocations etc.

Trass3r avatar Sep 04 '23 11:09 Trass3r