DirectXShaderCompiler
DirectXShaderCompiler copied to clipboard
[SPIR-V] Presence of vk::RawBufferLoad turns RWStructuredBuffer into AppendConsumerBuffer
This is a bit of a strange one, but I've been able to narrow it down to a very simple case. If I have vk::RawBufferLoad<> present in the shader then any RWStructureBuffer is transformed into an AppendConsumeBuffer. Repro case:
struct PS_OUTPUT
{
float4 vColor : SV_Target0;
};
RWStructuredBuffer < uint > g_Test ;
PS_OUTPUT MainPs ( void )
{
PS_OUTPUT o;
bool bTest = vk::RawBufferLoad<bool>( 0 );
g_Test[ 0 ] = 1;
o.vColor = float4( 0, 0, 0, 0 );
return o;
}
dxc.exe -spirv -T ps_6_1 -E MainPs -fspv-target-env=vulkan1.2 test.hlsl produces:
; SPIR-V
; Version: 1.5
; Generator: Google spiregg; 0
; Bound: 41
; Schema: 0
OpCapability Shader
OpCapability Int64
OpCapability PhysicalStorageBufferAddresses
OpExtension "SPV_KHR_physical_storage_buffer"
OpMemoryModel PhysicalStorageBuffer64 GLSL450
OpEntryPoint Fragment %MainPs "MainPs" %out_var_SV_Target0 %g_Test %counter_var_g_Test
OpExecutionMode %MainPs OriginUpperLeft
OpSource HLSL 610
OpName %type_RWStructuredBuffer_uint "type.RWStructuredBuffer.uint"
OpName %g_Test "g_Test"
OpName %type_ACSBuffer_counter "type.ACSBuffer.counter"
OpMemberName %type_ACSBuffer_counter 0 "counter"
OpName %counter_var_g_Test "counter.var.g_Test"
OpName %out_var_SV_Target0 "out.var.SV_Target0"
OpName %MainPs "MainPs"
OpName %PS_OUTPUT "PS_OUTPUT"
OpMemberName %PS_OUTPUT 0 "vColor"
OpDecorate %out_var_SV_Target0 Location 0
OpDecorate %g_Test DescriptorSet 0
OpDecorate %g_Test Binding 0
OpDecorate %counter_var_g_Test DescriptorSet 0
OpDecorate %counter_var_g_Test Binding 1
OpDecorate %_runtimearr_uint ArrayStride 4
OpMemberDecorate %type_RWStructuredBuffer_uint 0 Offset 0
OpDecorate %type_RWStructuredBuffer_uint Block
OpMemberDecorate %type_ACSBuffer_counter 0 Offset 0
OpDecorate %type_ACSBuffer_counter Block
%ulong = OpTypeInt 64 0
%ulong_0 = OpConstant %ulong 0
%uint = OpTypeInt 32 0
%uint_0 = OpConstant %uint 0
%uint_1 = OpConstant %uint 1
%int = OpTypeInt 32 1
%int_0 = OpConstant %int 0
%float = OpTypeFloat 32
%float_0 = OpConstant %float 0
%v4float = OpTypeVector %float 4
%19 = OpConstantComposite %v4float %float_0 %float_0 %float_0 %float_0
%_runtimearr_uint = OpTypeRuntimeArray %uint
%type_RWStructuredBuffer_uint = OpTypeStruct %_runtimearr_uint
%_ptr_StorageBuffer_type_RWStructuredBuffer_uint = OpTypePointer StorageBuffer %type_RWStructuredBuffer_uint
%type_ACSBuffer_counter = OpTypeStruct %int
%_ptr_StorageBuffer_type_ACSBuffer_counter = OpTypePointer StorageBuffer %type_ACSBuffer_counter
%_ptr_Output_v4float = OpTypePointer Output %v4float
%void = OpTypeVoid
%24 = OpTypeFunction %void
%PS_OUTPUT = OpTypeStruct %v4float
%25 = OpTypeFunction %PS_OUTPUT
%_ptr_Function_PS_OUTPUT = OpTypePointer Function %PS_OUTPUT
%bool = OpTypeBool
%_ptr_Function_bool = OpTypePointer Function %bool
%_ptr_PhysicalStorageBuffer_uint = OpTypePointer PhysicalStorageBuffer %uint
%_ptr_StorageBuffer_uint = OpTypePointer StorageBuffer %uint
%_ptr_Function_v4float = OpTypePointer Function %v4float
%g_Test = OpVariable %_ptr_StorageBuffer_type_RWStructuredBuffer_uint StorageBuffer
%counter_var_g_Test = OpVariable %_ptr_StorageBuffer_type_ACSBuffer_counter StorageBuffer
%out_var_SV_Target0 = OpVariable %_ptr_Output_v4float Output
%32 = OpConstantComposite %PS_OUTPUT %19
%MainPs = OpFunction %void None %24
%33 = OpLabel
%34 = OpVariable %_ptr_Function_v4float Function
%35 = OpVariable %_ptr_Function_v4float Function
%36 = OpVariable %_ptr_Function_bool Function
%37 = OpBitcast %_ptr_PhysicalStorageBuffer_uint %ulong_0
%38 = OpLoad %uint %37 Aligned 4
%39 = OpINotEqual %bool %38 %uint_0
OpStore %36 %39
%40 = OpAccessChain %_ptr_StorageBuffer_uint %g_Test %int_0 %uint_0
OpStore %40 %uint_1
OpStore %35 %19
OpStore %34 %19
OpStore %out_var_SV_Target0 %19
OpReturn
OpFunctionEnd
Note the ACSBuffer_counter is generated even though g_Test is a RWSTructuredBuffer. Now, if I take the same shader and simply remove the vk::RawBufferLoad<> call like so:
struct PS_OUTPUT
{
float4 vColor : SV_Target0;
};
RWStructuredBuffer < uint > g_Test ;
PS_OUTPUT MainPs ( void )
{
PS_OUTPUT o;
// REMOVED
//bool bTest = vk::RawBufferLoad<bool>( 0 );
g_Test[ 0 ] = 1;
o.vColor = float4( 0, 0, 0, 0 );
return o;
}
The the RWSTructureBuffer doesn't have the ACSBuffer_counter as expected:
; SPIR-V
; Version: 1.5
; Generator: Google spiregg; 0
; Bound: 22
; Schema: 0
OpCapability Shader
OpMemoryModel Logical GLSL450
OpEntryPoint Fragment %MainPs "MainPs" %out_var_SV_Target0 %g_Test
OpExecutionMode %MainPs OriginUpperLeft
OpSource HLSL 610
OpName %type_RWStructuredBuffer_uint "type.RWStructuredBuffer.uint"
OpName %g_Test "g_Test"
OpName %out_var_SV_Target0 "out.var.SV_Target0"
OpName %MainPs "MainPs"
OpDecorate %out_var_SV_Target0 Location 0
OpDecorate %g_Test DescriptorSet 0
OpDecorate %g_Test Binding 0
OpDecorate %_runtimearr_uint ArrayStride 4
OpMemberDecorate %type_RWStructuredBuffer_uint 0 Offset 0
OpDecorate %type_RWStructuredBuffer_uint Block
%uint = OpTypeInt 32 0
%uint_1 = OpConstant %uint 1
%int = OpTypeInt 32 1
%int_0 = OpConstant %int 0
%uint_0 = OpConstant %uint 0
%float = OpTypeFloat 32
%float_0 = OpConstant %float 0
%v4float = OpTypeVector %float 4
%14 = OpConstantComposite %v4float %float_0 %float_0 %float_0 %float_0
%_runtimearr_uint = OpTypeRuntimeArray %uint
%type_RWStructuredBuffer_uint = OpTypeStruct %_runtimearr_uint
%_ptr_StorageBuffer_type_RWStructuredBuffer_uint = OpTypePointer StorageBuffer %type_RWStructuredBuffer_uint
%_ptr_Output_v4float = OpTypePointer Output %v4float
%void = OpTypeVoid
%18 = OpTypeFunction %void
%_ptr_StorageBuffer_uint = OpTypePointer StorageBuffer %uint
%g_Test = OpVariable %_ptr_StorageBuffer_type_RWStructuredBuffer_uint StorageBuffer
%out_var_SV_Target0 = OpVariable %_ptr_Output_v4float Output
%MainPs = OpFunction %void None %18
%20 = OpLabel
%21 = OpAccessChain %_ptr_StorageBuffer_uint %g_Test %int_0 %uint_0
OpStore %21 %uint_1
OpStore %out_var_SV_Target0 %14
OpReturn
OpFunctionEnd
The presence of vk::RawBufferLoad<> should not change the type of the RWStructuredBuffer into an AppendConsumeBuffer.
I'm out of my element, but have also been looking into using vk::RawBufferLoad. I wonder if this is because using the "SPV_KHR_physical_storage_buffer" and the PhysicalStorageBufferAddresses capability promotes the addressing model to PhysicalStorageBuffer64?
I will take a look at this shortly. If someone else starts looking at it, please let me know.
At first glance, I noticed that while the ACS counter is "present" in the first SPIR-V, it is not used. There is no code that reads or writes it, it is not a member of the entrypoint interface and it does not have a descriptor set or binding assigned to it.
While I totally agree this is incorrect and should be fixed, I am curious how this is causing you a problem so that I know when it is fixed.
I am zeroing in on the problem. DXC (no legalization) is generating the ACS counter for both versions of the shader, putting it in the interface and assigning descriptor sets and binding although it is not read or written in either version of the shader.
For the shader without the RawBufferLoad, the counter is completely optimized away by spirv-opt legalization. For the shader with the RawBufferLoad, some of the counter code is eliminated, and some is left.
So it looks like there is some issue in spirv-opt, probably ADCE, caused by the presence of the RawBufferLoad. Still need to verify that DXC (no legalization) is completely doing the right thing.
There seems to be two solutions: 1) change DXC (no legalization) to not create the counter at all, or 2) fix spirv-opt so that all traces of the counter variable are removed by legalization when it is not used, even in the presence of RawBufferLoad.
I think 1) is the slightly better solution, so I will pursue that.
While I totally agree this is incorrect and should be fixed, I am curious how this is causing you a problem so that I know when it is fixed.
Thanks, @greg-lunarg. Append/Consume buffers are strange in that each consumes two descriptors: one for the buffer and one for the counter. The problem for us is as follows:
- When we are creating an append/consume buffer, we create a separate buffer for the counter and bind it in our descriptor set.
- We know whether an append/consume buffer is used by doing reflection on the SPIR-V (there is a convention to look for "counter.var." and that associates the counter with the ACSBuffer.
- In this case - we have SPIR-V reflection telling us "this is an Append/Consume Buffer" but our runtime code does not allocate it as such. So our code that would normally bind the hidden counter to the separate buffer doesn't execute, and we end up with trying to do vkUpdateDescriptorSets without a valid buffer for the counter descriptor, causing a crash.
More succintly: our code requires that if you use an Append/ConsumeBuffer you allocate it with a special flag. This code generation problem makes the two disagree.
Either of the solution you propose would work.
Hi @greg-lunarg, I can confirm your patch fixes the issue. Thanks!
Some additional notes: DXC as it is today always creates a counter for RWStructuredBuffer. As seen above, this screws up descriptor allocation and reflection for those apps that do not use the counter.
However, there are two methods for RWStructuredBuffer, Increment/DecrementCounter() which require a counter to be created.
So I am going to need to add logic to DXC which provisionally creates a counter based on these conditions.
Currently arrays of RWStructuredBuffer are not allowed for spirv output for (I assume) this exact reason. The reason stated in another issue was that the creation and tracking for the counters for these arrays rwstructuredbuffers would be very complex and thats why it was not implemented initially. Yet i would assume that many dont use these counters at all, so it would be a lot better when the counters would not be created when increment and decrement are not used. I would also be fine with them beeing straight up not allowed for spirv output (on array bindings for RWStructuredBuffer) for the time beeing, but enabling using RWStructuredBuffer s in descriptor arrays for spirv output in general. Also this issue is related to this: https://github.com/microsoft/DirectXShaderCompiler/issues/3281.
My code now defers creating the counter until Inc/DecrementCounter methods are invoked. If the methods are not invoked, the counter is not created.
I have not tested if this helps #3281. I suspect the solution for that can build on this one. I am imagining that the solution could be to allow arrays of RWStructuredBuffer as long as no Inc/DecrementCounter is called.